What human factors research already tells us about AI overreliance

When organisations talk about AI risk, they often focus on the technology itself. They ask whether the model is accurate enough, whether it hallucinates, whether it is secure, or whether it can be trusted with a particular task. Those are valid questions, but they are not the whole picture.

There is another issue that matters just as much in real operations: what happens to people when they start relying on a system too much.

Human factors research has been studying versions of this problem for decades. Long before generative AI, researchers were looking at how people behave around automated systems, decision aids and machine support. Again and again, the lesson was similar. Even useful systems can create new risks if people begin to trust them too easily, check them less carefully, or lose confidence in their own judgement.

That matters now because AI is arriving in environments where speed, clarity and judgement all matter. If teams want AI to be useful over time, they need to understand not only what the system can do, but what it may gradually change in the people using it.

A familiar operational scenario

A reviewer has spent months using AI support for RAMS checks. At first, they read the underlying document carefully and compare the AI output against it. They spot when the summary has missed something. They correct weak points. They treat the system as support. Over time, the pattern changes.

The tool is usually helpful. Most outputs look plausible. The reviewer is busy. They begin reading the AI summary first and go back to the document only if something looks wrong. Then they start checking less thoroughly. Eventually, the AI output becomes the main lens through which the document is read.

Nothing dramatic has happened. There has been no obvious system failure. But the reviewer’s relationship with the work has shifted: their vigilance over the AI’s output has dropped. This is the kind of change human factors research has warned about for a long time.

Overreliance is not the same as system failure

One reason overreliance is easy to miss is that it does not always look like a dramatic error. Sometimes the system gives the wrong answer and the harm is obvious. But often the bigger risk is slower and quieter. People become less likely to question an output. They check less deeply. They stop exploring alternatives. They accept the machine’s framing of the task. In time, the system may change not just what people do, but how they think while doing it.

That is why overreliance cannot be ruled out simply because a tool performs well. A tool can be useful overall and still create a pattern of use that weakens judgement in the long run.

This is especially important in safety, security and resilience work, where people are often dealing with uncertainty rather than neat right-or-wrong problems.

Human factors research has seen this pattern before

The exact technologies change, but the underlying behavioural pattern is familiar. When people work with automated aids, several recurring risks appear.

Automation bias. This is the tendency to favour the machine-supported answer and give it too much weight, even when it is wrong or incomplete.

Complacency. When a system appears reliable enough, people may reduce monitoring and checking, especially when they are busy or under pressure.

Skill fade. If a system repeatedly performs part of a task for us, our own ability to do that task sharply and confidently can weaken over time.

Out-of-the-loop performance. When people supervise a system rather than doing the task directly, they can become less ready to step in well when something unusual happens.

These ideas did not begin with generative AI. But AI may intensify them because it produces fluent, plausible and often persuasive outputs. It does not just automate routine actions. It can also shape how a user interprets a problem.

Why fluent AI creates a special risk

Generative AI can feel more trustworthy than many earlier systems because it explains itself in natural language. It does not only give an alert or a score. It produces a summary, a recommendation, a draft argument, or an apparently reasoned answer. That makes it easier to work with, but it can also make it easier to over-trust.

People are often not judging the output only on technical accuracy. They are responding to fluency, coherence and speed. A polished answer can feel more complete than it really is. An uncertain output can sound more settled than it should. A partial summary can become the frame that guides all later review.

This is why AI overreliance is not only about the model being wrong. It is also about the user being drawn too quickly into accepting the model’s version of the problem.

Operational pressure makes overreliance more likely

Most teams do not use AI in calm laboratory conditions. They use it while busy, interrupted and under time pressure. They use it when inboxes are full, decisions are stacking up, incidents are moving, and multiple tasks are competing for attention.

That matters because pressure changes behaviour. When people are overloaded, they are more likely to use the system as a shortcut. They may not do this carelessly. In fact, they may do it because they are trying to keep the operation moving. The more helpful the tool seems, the easier it becomes to lean on it.

This is one reason overreliance is not just an individual weakness. It is often a workflow issue. If the operating environment rewards speed without preserving enough room for checking and judgement, even sensible people can become too dependent on a tool.

The risk is not just bad answers. It is weaker judgement

It is tempting to think the solution is simply to improve model quality. Better performance matters, but it does not solve the whole problem.

Even a strong system can change how users approach a task. If it becomes the default source of framing, sorting or interpretation, people may begin to lose practice in doing those things well themselves. They may still be involved, but in a shallower way.

That is why the real concern is often not just whether the AI occasionally gets something wrong. It is whether the user’s judgement becomes thinner, slower or less questioning over time.

In operational settings, that matters a great deal. The hardest situations are often the ones where the pattern does not fit the norm, the evidence is incomplete, or the machine’s prior framing is exactly what needs to be challenged.

Review is not enough if review becomes passive

Many organisations respond to AI risk by saying a human will review the output. That is necessary, but it is not always sufficient.

A weak review step can become little more than a formal comfort. If the human is mostly confirming outputs that already look plausible, then review may exist in name without adding much real protection. The problem is not only whether a person is present. It is whether they are still engaging actively with the task.

This is where overreliance becomes operationally important. A review process that looks sound on paper may still fail if the human role has quietly drifted from judgement to passive acceptance. The challenge is to design human involvement so that it remains meaningful.

What this means for AI in safety, security and resilience

In these environments, the aim should not be to keep people away from AI. It should be to use AI in ways that support performance without quietly weakening judgement.

That means thinking carefully about where the system helps, where it should not lead, and where people still need to do enough direct engagement with the task to preserve their own skill and discernment.

For example, an AI system may help surface likely issues in a document, but the workflow may still require full manual review of selected cases. It may help retrieve the right procedure, but not become the sole basis for operational interpretation. It may help with triage, but still require active challenge and escalation judgement from experienced users.
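One way to picture "full manual review of selected cases" is a fixed share of cases routed to review without the AI output shown first. The following is a minimal sketch under stated assumptions, not a description of any particular product: the function name `needs_full_manual_review`, the `case_id` string and the 20% default rate are all hypothetical choices made for illustration.

```python
import hashlib


def needs_full_manual_review(case_id: str, sample_rate: float = 0.2) -> bool:
    """Decide whether a case should get a full manual review.

    Hypothetical helper: the selection is derived from a hash of the case ID,
    so the same case always gets the same answer and reviewers cannot
    predict or choose which cases are selected.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

A workflow could call this when a case is opened and, where it returns `True`, hide the AI summary until the reviewer has recorded their own findings. The right rate would depend on risk and workload, and is an organisational decision rather than a technical one.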

The point is not to slow everything down unnecessarily. The point is to make sure that support does not turn into dependency.

Signs that AI use may be weakening judgement

Organisations should watch for practical signals that overreliance may be growing. These may include:

users checking source material less often
reviewers treating AI summaries as the main version of the task
declining confidence in manual judgement without tool support
fewer challenges to plausible-looking outputs
weaker performance when the system is unavailable
routine acceptance of outputs without clear understanding of how they were formed
reduced curiosity about uncertainty, alternatives or missing information

These are not always easy to measure, but they are worth paying attention to. They may reveal a deeper issue than occasional output error.
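Some of these signals can be approximated from ordinary usage logs. The sketch below assumes a hypothetical log in which each review records whether the reviewer opened the source document and whether they changed the AI output; the `ReviewEvent` fields and the `usage_signals` function are illustrative names, not part of any existing system.

```python
from dataclasses import dataclass


@dataclass
class ReviewEvent:
    opened_source: bool      # reviewer opened the underlying document
    changed_ai_output: bool  # reviewer edited or rejected the AI output


def usage_signals(events: list[ReviewEvent]) -> dict[str, float]:
    """Summarise how actively reviewers engaged over a set of reviews."""
    if not events:
        raise ValueError("no review events to summarise")
    total = len(events)
    return {
        # Share of reviews in which the source material was checked.
        "source_check_rate": sum(e.opened_source for e in events) / total,
        # Share of reviews in which the AI output was challenged.
        "challenge_rate": sum(e.changed_ai_output for e in events) / total,
    }
```

Comparing these rates month by month may show drift before it appears as an incident. A falling rate is a prompt for a conversation, not proof of a problem: reviewers may be challenging less because outputs have genuinely improved.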

What good design looks like in response

Human factors research does not point to a simple ban on automation. It points to better design. In practice, that often means:

making source evidence visible
showing uncertainty where it exists
avoiding false impressions of completeness
designing workflows where human review remains active
keeping limits clear on what the system is and is not for
preserving opportunities for people to exercise judgement directly
monitoring not just output quality, but how patterns of use are changing over time

This is one reason the operational layer matters. A prompt can produce an answer. An operational layer helps shape how that answer is produced, what it is based on, when it can be used, how it is reviewed, and how use is monitored over time. That is not just governance in the abstract. It is part of protecting judgement in practice.

A useful analogy: muscle that weakens if never used

One way to think about this is through physical training. Modern life made movement easier to avoid. Cars, lifts, screens and convenience removed effort from daily life. That was useful in many ways, but it also created a problem. If movement keeps disappearing from ordinary routines, people may need deliberate exercise simply to maintain basic capability.

Something similar may happen with judgement.

If AI makes parts of reasoning, checking and interpretation easier to avoid, then organisations may need deliberate ways to keep those abilities active. Otherwise, the system may gradually reduce the very human strengths it was meant to support.

This does not mean rejecting assistance. It means designing for continued capability.

Why this supports the case for operational AI

This is one reason the difference between generic AI and operational AI matters.

A generic tool may be able to answer questions fluently, but it does not necessarily help an organisation manage overreliance. It may not preserve evidence visibility, active review, traceability, memory or ongoing assurance checks around use.

Operational AI should do more than produce outputs. It should help create the conditions for responsible use in real workflows.

That means thinking not only about immediate usefulness, but about what the system encourages over time. Does it keep people engaged with the right material? Does it make review meaningful? Does it preserve useful skill? Does it make drift visible before it becomes a serious problem?

Those are design questions, not just model questions.

Conclusion

Human factors research has been warning for years that useful machine support can still produce overreliance, complacency and skill fade. AI makes that warning more relevant, not less.

For teams in safety, security and resilience, the question is not simply whether AI can help. It is how to use it in ways that preserve judgement rather than quietly replacing parts of it.

That means paying attention not only to model quality, but to the human side of the workflow: how people check, how they decide, how their skills change, and how dependence may grow over time.

If organisations take that seriously, AI can become more useful and more sustainable. If they do not, the real risk may not be a dramatic technical failure. It may be the slow weakening of human judgement in the background of apparently successful use.

