How to run a safe first AI pilot in security, safety or resilience

Many organisations are now interested in AI, but far fewer are clear on how to start well.

That is especially true in security, safety and resilience. These are not environments where it is sensible to begin with big claims, vague experimentation or uncontrolled rollout. The consequences are often too high, the workflows too important, and the cost of confusion too real.

That does not mean teams should avoid AI. It means the first step should be practical, limited and safe. A good first pilot is not about proving that AI can do everything. It is about testing whether it can help with a specific operational task in a way that is genuinely useful, controlled and workable in practice.

A simple starting scenario

A head of safety wants to explore whether AI could help with RAMS review. The team handles a high volume of documents from contractors. Reviews are repetitive, time-consuming and important. Some submissions are strong. Some are incomplete. Some contain gaps, weak controls or inconsistencies that require back-and-forth. The process takes time from already busy people and can slow activity on the ground.

The organisation is interested in AI, but cautious. It does not want a tool that makes decisions on its own and introduces new risk. It wants to know whether AI can help reviewers work more quickly and consistently without weakening standards. That is exactly the kind of problem a first pilot should focus on.

The safest first pilots are narrow and practical

A first pilot works best when it starts with a clear operational task rather than a broad ambition. “Let’s use AI in the control room” is too vague. “Let’s test whether AI can help us assess RAMS against our expected standards, highlight missing content and produce a draft review summary for human checking” is much better. The difference matters because a safe pilot needs boundaries.

It should be clear what task is being tested, what documents or data are in scope, what good performance looks like, where human review remains essential, and what the organisation will do with the output.

The narrower the task, the easier it is to assess whether the pilot is genuinely helping.

Start with a problem that already exists

The best first pilots usually begin with a workflow that is already painful, repetitive or slow.

That might include:

reviewing RAMS or contractor submissions
retrieving the right procedure under pressure
triaging incoming reports or messages
producing first-draft summaries or handovers
identifying repeat issues across incidents or records

These are often good candidates because they involve clear operational work, visible friction and measurable value. They also allow organisations to test AI in a way that supports people rather than trying to replace judgement altogether.

A first pilot should solve a real problem that operators already recognise, not a problem created by the technology.

Choose a task with clear limits

Not every use case is a good place to start.

A safe first pilot should involve a task that can be limited sensibly. That usually means the:

inputs can be defined
source material can be approved
workflow can be described clearly
outputs can be checked by humans
consequences of error are manageable within the pilot design

This is one reason document evaluation is often a stronger first step than more open-ended use cases. It is easier to define what is being assessed, what standard is being used, what the output should look like, and how human review fits in.

By contrast, a broad ambition such as “let AI help us manage incidents” may sound exciting, but can be too loose for a first controlled test.

Decide what “safe” means before you begin

A safe pilot is not just a small pilot. It is a pilot with clear rules. Before starting, organisations should be able to answer some basic questions.

What information is the AI allowed to use?
What information is it not allowed to use?
Is the source material approved and current?
Who reviews the output before action is taken?
What kinds of output are acceptable?
What kinds of decisions remain fully human?
How will errors, uncertainty or weak outputs be handled?
What record will be kept of what the system produced and how it was used?

These questions matter because the main risk in early AI adoption is often not dramatic failure. It is the quieter risk of ambiguity. People start using a tool without clear boundaries, trust grows unevenly, and the organisation cannot properly explain what the system should or should not have been doing.
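One way to make these questions harder to skip is to write the answers down in a structured form and check that none is blank before the pilot starts. The following is a minimal sketch under stated assumptions: the `PilotCharter` class, its field names and the example answers are hypothetical and are not part of any particular tool.

```python
from dataclasses import dataclass, fields


@dataclass
class PilotCharter:
    """Hypothetical record of the rules agreed before a pilot starts."""
    allowed_information: str = ""
    excluded_information: str = ""
    source_material_owner: str = ""  # who confirms sources are approved and current
    output_reviewer: str = ""
    acceptable_outputs: str = ""
    human_only_decisions: str = ""
    error_handling: str = ""
    record_keeping: str = ""


def unanswered(charter: PilotCharter) -> list[str]:
    """Return the names of the charter questions that are still blank."""
    return [f.name for f in fields(charter) if not getattr(charter, f.name).strip()]


# Illustrative answers only; a real charter would be agreed by the pilot owners.
charter = PilotCharter(
    allowed_information="Approved RAMS review criteria and contractor submissions",
    output_reviewer="Named safety reviewer",
    human_only_decisions="Accepting or rejecting a submission",
)
missing = unanswered(charter)
ready_to_start = not missing  # stays False until every question has an answer
```

The point of the sketch is the check rather than the data structure: a pilot with unanswered questions is visibly not ready, instead of quietly starting with ambiguity built in.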

Keep the first pilot away from open-ended autonomy

A first pilot should not aim for autonomy. That does not mean the system cannot be useful. It means the goal should be support, not substitution.

In higher-consequence environments, a safer starting point is usually one where AI helps people retrieve, assess, sort, summarise or draft, while human operators remain responsible for review and decisions.

That allows the organisation to test usefulness without creating unnecessary exposure. The question is not whether AI can produce an answer on its own. The question is whether it helps the team do the work better while staying within acceptable operational limits.

Measure more than speed

Many organisations focus first on time saved. That matters, but it is not enough on its own. A good first pilot should also examine whether the system improves consistency, visibility and usability.

For example:

Does it help people find relevant material faster?
Does it reduce missed issues in repetitive review work?
Does it make triage more structured?
Does it help produce clearer records?
Does it reduce avoidable back-and-forth?
Does it highlight document gaps or recurring weak points?
Does it help supervisors or reviewers see why an output was produced?

These measures are often more useful than a single headline claim about efficiency. In operational environments, value often comes from a combination of speed, consistency, defensibility and reduced friction.
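As an illustration, several of these measures can be calculated from the same simple review log, so that speed is reported alongside missed issues and rework rather than on its own. This is a sketch only: the `ReviewRecord` fields are assumptions about what a team might record, and any figures fed into it would need to come from the team's own baseline and pilot reviews.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReviewRecord:
    """One document review. All fields are illustrative assumptions."""
    minutes_taken: float
    issues_found: int
    issues_missed: int  # issues only picked up by a later second check
    rework_rounds: int  # rounds of back-and-forth with the submitter


def summarise(reviews: list[ReviewRecord]) -> dict[str, float]:
    """Summarise speed, missed issues and rework for a set of reviews."""
    if not reviews:
        raise ValueError("At least one review is needed")
    found = sum(r.issues_found for r in reviews)
    missed = sum(r.issues_missed for r in reviews)
    total = found + missed
    return {
        "median_minutes": median(r.minutes_taken for r in reviews),
        "missed_issue_rate": missed / total if total else 0.0,
        "mean_rework_rounds": sum(r.rework_rounds for r in reviews) / len(reviews),
    }
```

Running `summarise` once on reviews done without AI support and once on reviews done with it gives a like-for-like comparison across three measures instead of a single efficiency figure.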

Test it in the conditions where the work actually happens

A pilot should not only be tested in ideal conditions. If the workflow normally involves time pressure, inconsistent inputs, incomplete submissions or messy operational language, the pilot should encounter those realities early. Otherwise, the organisation may approve something that works in a demo but struggles in live practice.

That does not mean throwing the tool straight into full production. It means the test should be realistic enough to show whether the system helps with the actual job, not just a simplified version of it.

This is one reason operational users need to be involved. A pilot designed only by technical teams or innovation teams may miss the practical detail that determines whether the workflow is usable.

Involve the people who will actually use it

A safe first pilot should not be designed around abstract enthusiasm for AI. It should involve the people who understand the workflow, the pain points, the acceptable standards and the failure modes. In most organisations, that means the future users, the operational owner, and the people responsible for oversight or assurance.

Their role is not just to approve the idea. It is to help define what good looks like, what weak output looks like, what should trigger escalation, and what would make the pilot credible enough to consider further.

This matters because operational trust is not built by slogans. It is built when people can see that the system is shaped around the real task.

Use approved information, not a loose mix of sources

One of the easiest ways to create risk in a first pilot is to be careless about source material. If the system is expected to support a real workflow, it should work from the documents, criteria and information the organisation is prepared to defend. Otherwise, the output may be hard to trust even when it sounds plausible.

For a first pilot, this usually means limiting the AI to a defined set of approved material. That could include internal procedures, review criteria, accepted templates, specific historical records, or a limited document library. It may also mean excluding sources that are incomplete, outdated or too loosely governed for the task.

A pilot becomes safer when the evidence base is clearer.
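In practice, limiting the evidence base can be as simple as filtering the document library before anything reaches the AI. The sketch below assumes each document carries an approval flag and a next-review date; the `SourceDocument` class and its field names are hypothetical, and a real document register may hold this information differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceDocument:
    """A document in the pilot library (hypothetical structure)."""
    title: str
    approved: bool
    next_review_due: date


def usable_sources(library: list[SourceDocument], today: date) -> list[SourceDocument]:
    """Keep only documents that are approved and not overdue for review."""
    return [d for d in library if d.approved and d.next_review_due >= today]
```

Anything filtered out here is also a useful finding in its own right: an unapproved or overdue document is a gap in the evidence base whether or not AI is involved.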

Make human review visible, not assumed

A common mistake is to say that people will remain “in the loop” without defining what that means. A safer pilot makes the review points explicit.

Who checks the output?
What are they checking for?
At what point does the output become usable?
Can the reviewer see the source basis for what was produced?
What happens if the system is uncertain, incomplete or inconsistent?

Human review should not be treated as a vague reassurance. It should be designed into the workflow.

That is especially important in security, safety and resilience, where the organisation may later need to explain how a tool was used and what controls surrounded it.
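A simple way to design review into the workflow is to give every AI output a status that only a named reviewer can change, and to treat anything not yet approved as unusable. The following is a minimal sketch of that idea. The class names, statuses and the rule that unsourced or uncertain drafts are escalated rather than approved are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    ESCALATED = "escalated"


@dataclass
class DraftOutput:
    """An AI-produced draft awaiting human review (hypothetical structure)."""
    text: str
    sources: list[str] = field(default_factory=list)  # source basis shown to the reviewer
    flagged_uncertain: bool = False
    status: ReviewStatus = ReviewStatus.PENDING
    reviewer: Optional[str] = None


def record_review(draft: DraftOutput, reviewer: str, approve: bool) -> DraftOutput:
    """Record a named reviewer's decision on a draft."""
    if not reviewer.strip():
        raise ValueError("A named reviewer is required")
    draft.reviewer = reviewer
    if not approve:
        draft.status = ReviewStatus.REJECTED
    elif draft.flagged_uncertain or not draft.sources:
        # Assumed rule: drafts with no source basis or flagged uncertainty
        # cannot be approved directly and go to escalation instead.
        draft.status = ReviewStatus.ESCALATED
    else:
        draft.status = ReviewStatus.APPROVED
    return draft


def is_usable(draft: DraftOutput) -> bool:
    """A draft is usable only once a named reviewer has approved it."""
    return draft.status is ReviewStatus.APPROVED and draft.reviewer is not None
```

Because the reviewer and status are stored with the draft, the same record also answers the later question of who checked what and when it became usable.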

Learn from the pilot, not just from the headline result

A first pilot is useful even if it does not lead immediately to wider deployment. Sometimes the main value is in learning where documents are unclear, where standards are inconsistent, where teams use different language for the same issue, or where processes are weaker than expected.

AI pilots often reveal workflow problems, document gaps and decision bottlenecks that existed long before the pilot began. A good organisation learns from that. It does not only ask, “Did the AI work?” It also asks, “What did this show us about the way we work now?”

What a good first pilot often looks like

In practice, a strong first pilot often has the following characteristics:

one clearly defined use case
one defined workflow
one approved evidence base
a clear human review step
realistic test cases
agreed success criteria
a usable record of outputs and findings
a practical decision at the end about next steps

That is a much better starting point than trying to roll out a broad AI capability and hoping governance will catch up later.

Where SIRV AI fits

SIRV AI is designed for teams that need practical, controlled support in real operational work.

That includes use cases such as document evaluation, procedure retrieval, triage, operational picture building and lessons learned. It sits within an operational layer that helps place limits around evidence, workflow, traceability, memory and review.

For a first pilot, that is important because it helps organisations test AI in a limited and operationally useful way. Instead of starting with a general-purpose tool and trying to impose control afterwards, teams can begin with a specific workflow and clearer conditions for safe use.

Conclusion

A safe first AI pilot in security, safety or resilience should not try to prove everything. It should focus on one real task, one clear workflow and one controlled test of whether AI can help people do operational work better.

That usually means starting with a defined use case, using approved information, keeping human review visible, and measuring value in more than one way.

The aim is not to create an impressive demo. It is to learn whether the system can be useful in practice, without losing control of the work that matters. If that foundation is strong, wider adoption becomes far easier to judge.

Run a practical SIRV AI sprint

Start with one defined workflow, clear review points and a realistic test of where AI can help your team operate with more speed, clarity and control.

Frequently asked questions

FAQ 1

What makes a good first AI pilot in security, safety or resilience?
A good first AI pilot focuses on one clear operational task, uses approved information, keeps human review visible, and tests whether AI helps in a practical and controlled way.

FAQ 2

What is the safest way to start using AI in a higher-consequence environment?
The safest way to start is with a limited use case such as document evaluation, procedure retrieval or triage, rather than open-ended automation. The task, source material, review steps and success criteria should all be clearly defined.

FAQ 3

Should a first AI pilot aim for automation?
Usually no. A first pilot is better used to test support tasks such as retrieval, assessment, summarisation or triage while people remain responsible for review and decisions.

FAQ 4

What should be measured in a first AI pilot?
Time saved matters, but it should not be the only measure. Teams should also assess consistency, usability, visibility, reduction in avoidable rework, and whether outputs are easier to review and support.

FAQ 5

What are good first AI use cases for security, safety or resilience teams?
Good first use cases often include RAMS review, SOP retrieval, incident triage, first-draft summaries, handovers, and pattern spotting across repeat incidents or records.

FAQ 6

Why is human review important in an AI pilot?
Human review helps ensure outputs are checked before action is taken, especially where the workflow matters operationally. It also makes the limits of AI use clearer and more defensible.

Author bio: Andrew Tollinton

Andrew Tollinton is CEO and Co-Founder of SIRV, which builds operational AI for safety, security and resilience teams. He focuses on practical, controlled AI use in serious environments, with particular interest in evidence, accountability and human judgement. Andrew chairs the Institute of Strategic Risk Management’s AI in Risk Management Special Interest Group and speaks regularly on AI governance and operational resilience.
