The Myth of the Root Cause: Why Your Accident Investigations Are Just Creative Writing for Lawyers
When an accident happens, we launch an investigation to find "The Root Cause." We look for the one broken domino that knocked down the rest. But in a complex world, there is never just one cause. There is a messy, tangled web of interactions. By chasing the mythical "Root Cause," we simplify reality until it is useless. It is time to stop hunting for a single culprit and start understanding the system.
Introduction: The CSI Fantasy
We all love a good detective story. Whether it’s Sherlock Holmes, Hercule Poirot, or CSI, we are culturally conditioned to love the moment when the brilliant investigator finds the single clue—the fingerprint, the dropped bullet casing, the hidden letter—that solves the puzzle. We bring this "Hollywood Fantasy" into industrial safety.
An accident happens. A forklift tips over. A chemical tank overflows. A scaffold collapses. We deploy the Investigation Team. Their mandate, written in the corporate standard, is clear and absolute: "Find the Root Cause."
They interview witnesses. They draw timelines. They use linear tools like the "5 Whys" or the "Fishbone Diagram." They filter the chaos. Eventually, they produce a sanitized report:
The Root Cause was: Failure to follow procedure SOP-101.
The Root Cause was: Operator inattention due to fatigue.
The Root Cause was: A faulty limit switch.
We feel satisfied. We found the "glitch in the matrix." We replace the switch, we discipline the operator, we rewrite the procedure. We close the file. Six months later, the exact same accident happens again.
Why? Because there is no such thing as a Root Cause. The idea that a catastrophic failure in a complex, adaptive system (like a refinery, a hospital, or a logistics network) can be traced back to a single origin point is a Newtonian delusion. It assumes the world is a simple clockwork machine. It is not. It is a biological ecosystem. And ecosystems don't fail because of one domino; they fail because of resonance, complexity, and tight coupling.
Part 1: The Trap of Linear Thinking (The Domino Fallacy)
In the 1930s, H.W. Heinrich gave safety science its most enduring (and damaging) metaphor: The Domino Theory.
Domino 1 (Social Environment) falls -> Hits Domino 2 (Fault of Person) -> Hits Domino 3 (Unsafe Act) -> Hits Domino 4 (Accident) -> Hits Domino 5 (Injury).
Heinrich told us: "If you remove the middle domino (Unsafe Act), you stop the accident." This logic works perfectly for Simple Systems.
If a chain snaps and the load falls, the root cause is the weak link in the chain. Replace the link, fix the problem.
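The seduction of this model is easy to see if you write it down. Here is a minimal sketch (the chain is Heinrich's, the code and its toy logic are purely illustrative): in a linear chain, removing any single domino provably stops the propagation.

```python
# The Domino Theory as code (illustrative only): failure propagates down
# a single straight line, so pulling out ANY one domino stops it.
DOMINOES = ["social_environment", "fault_of_person",
            "unsafe_act", "accident", "injury"]

def fallen_dominoes(chain, removed=None):
    """Linear propagation: each domino falls only if the one before it fell."""
    fallen = []
    for domino in chain:
        if domino == removed:
            break                  # Heinrich's fix: remove one domino
        fallen.append(domino)
    return fallen

print(fallen_dominoes(DOMINOES))                        # ends in 'injury'
print(fallen_dominoes(DOMINOES, removed="unsafe_act"))  # stops before 'accident'
```

In this world, safety management is trivial: find the weakest domino and pull it. The whole argument of this article is that real accidents do not run on this code.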
But modern industry is a Complex System. In a complex system, accidents are not caused by components breaking. They are caused by components interacting.
The sensor was slightly off calibration (but within tolerance).
The operator was slightly tired (but within legal limits).
The procedure was slightly ambiguous (but usually worked).
The weather was slightly colder than usual.
The production pressure was slightly higher than average.
Individually, none of these are "causes." None of them are failures. They are normal variations. But together, they created a Resonance that led to disaster. If your investigation looks for THE cause, you will arbitrarily pick one of these components (usually the operator) and blame it. You will miss the Interaction. You will fix the part, but leave the dangerous interaction intact.
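A toy calculation makes the point concrete. In the sketch below, the numbers and the 0.25 weighting are invented purely for illustration, not a validated risk model: every factor passes its individual check, yet together they consume almost the entire safety margin.

```python
# Toy model of "resonance". Each factor is expressed as a fraction of its
# own tolerance: all below 1.0, so a component-by-component audit flags nothing.
factors = {
    "sensor_drift":        0.8,  # within calibration tolerance
    "operator_fatigue":    0.7,  # within legal duty limits
    "procedure_ambiguity": 0.6,  # passed its last review
    "cold_weather":        0.9,  # within design envelope
    "production_pressure": 0.9,  # a normal busy day
}

# Linear view: an accident requires at least one broken component.
print(any(v >= 1.0 for v in factors.values()))  # False -> "nothing to fix"

# Interaction view: every normal variation erodes the same shared safety
# margin. No single factor consumes it; the five together nearly do.
margin = 1.0
for v in factors.values():
    margin -= 0.25 * v       # each erosion alone is at most 0.25: survivable
print(f"remaining margin: {margin:.3f}")  # 0.025 -> one gust from disaster
```

An audit of any single row finds nothing. Only a view of the whole dictionary reveals the danger, and "the whole dictionary" is exactly what a single-cause investigation never looks at.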
Part 2: The "5 Whys" is a Tool for Confirmation Bias
The "5 Whys" is the most popular investigation tool in the world. It is simple. It is intuitive. It is also one of the most dangerous tools in the hands of a biased investigator.
Why? Because you stop asking "Why" when you find an answer you like (or can afford). This is called the Stop Rule.
Let’s simulate a "5 Whys" session:
Accident: A worker slipped on oil.
Why did he slip? Because there was oil on the floor.
Why was there oil? Because the pump seal leaked.
Why did the seal leak? Because the gasket was old.
Why wasn't it replaced? Because the maintenance budget was cut this quarter.
Why was the budget cut? Because the CEO wanted a bigger bonus to boost the stock price.
STOP. No investigator in history puts "CEO Bonus" in the Root Cause box. Instead, they stop at Why #2 ("the seal leaked") or Why #3 ("the gasket was old"), or reframe Why #4 as a "maintenance planning error," blaming a planner instead of the budget.
The "5 Whys" is not a scientific tool; it is a political tool. It leads you down a single linear path selected by the investigator's bias, ignoring the 10 other branches of reality, and it stops exactly where the investigator feels safe. It simplifies a web into a line. And lines don't describe reality.
Part 3: Hindsight Bias (The "Crystal Ball" Effect)
The biggest enemy of a fair investigation is Hindsight Bias.
Before the accident, the outcome was uncertain. The operator saw green lights, heard normal noises, and felt normal vibrations.
After the accident, the outcome seems inevitable. The investigator sees the red flags, the warnings, the errors.
We look at the operator's actions and scream: "How could he be so stupid? It was obvious the tank would overflow!" It is obvious to you because you are reading the report. You know the ending of the movie. It was not obvious to him in the moment.
When we judge decisions based on the outcome rather than the information available at the time, we learn nothing. We just engage in "Counterfactual Reasoning":
"If only he had looked left..." "If only he had checked the gauge..." "If only he had followed the rule..."
Counterfactuals are fantasies. They describe a world that doesn't exist. You cannot fix a system by listing what people didn't do. You must understand why what they did do made sense to them at the time.
Part 4: The Principle of "Local Rationality"
This leads us to the most important concept in modern safety science (New View Safety), championed by Sidney Dekker: Local Rationality.
The Axiom: "People do not come to work to do a bad job. People do what makes sense to them at the time, given their goals, their focus, and their knowledge."
If an operator violated a procedure, don't ask "Why did they violate it?" (which implies malice or stupidity). Ask: "Why did it make sense to violate it?"
Maybe the procedure is outdated.
Maybe following the procedure takes 2 hours, and they only had 30 minutes.
Maybe the supervisor winked at the violation yesterday.
If you find "Human Error," you haven't found the cause. You have found the starting point of your investigation. Human Error is a symptom, not a cause.
Part 5: From "Who Failed?" to "What Failed?"
Traditional investigations act like prosecutors. They focus on Components (People, Parts). Modern investigations (Systems Thinking) focus on Constraints and Connections.
Here is the shift in questioning:
Old Way: "Who pressed the wrong button?" -> Result: Fire the operator.
New Way: "How did the system design make it easy to press the wrong button? Why are the 'Eject' and 'Start' buttons identical and placed next to each other?" -> Result: Redesign the panel.
Old Way: "Why didn't they see the alarm?" -> Result: Retrain on attention.
New Way: "How many alarms were ringing at the same time? Was there an alarm flood? Was the auditory environment too noisy?" -> Result: Fix the alarm logic.
Stop hunting for the "Broken Part." Start mapping the "Toxic Environment."
Part 6: The Solution – Learning Teams
How do we move beyond the "Root Cause" myth? We stop doing "Interrogations" and start doing "Learning Teams."
A Learning Team is a facilitated session with the people who actually do the work. It is not an investigation; it is a reconstruction of complexity.
The Learning Team Protocol:
Cool Down: Don't investigate immediately, while emotions are high and fear is rampant. Secure the scene, but wait roughly 24 hours before convening the Learning Team.
Gather the Experts: The experts are not the managers or the engineers. The experts are the welders, the drivers, the operators. The people who touch the tools.
Session 1 (Learning Mode - The "Blue Line"):
Don't talk about the accident yet.
Talk about normal work. "How is this job usually done? What makes it hard? What tools usually break? How do you normally work around this problem?"
Understand the "Normal Mess."
Session 2 (The Event):
Now discuss the accident. "How did the normal messiness drift into failure this time? What was different?"
Look for the gap between Work-as-Imagined (Procedure) and Work-as-Done (Reality).
The Output:
Not a single "Root Cause."
But a list of systemic Weaknesses, Constraints, and Defenses that need to be improved.
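For teams that log investigations in software, the protocol above maps onto a simple record. A minimal sketch follows; the field names are our invention, not any standard schema, and the entries are a hypothetical take on the oil-slip example from Part 2. Note the deliberate absence of a root_cause field.

```python
from dataclasses import dataclass, field

@dataclass
class LearningTeamRecord:
    """One Learning Team engagement. There is deliberately no root_cause field."""
    event: str
    experts: list                                            # the people who touch the tools
    normal_work: list = field(default_factory=list)          # Session 1: the normal mess
    what_was_different: list = field(default_factory=list)   # Session 2: the drift
    weaknesses: list = field(default_factory=list)           # output: systemic weaknesses
    constraints: list = field(default_factory=list)          # output: constraints to ease
    defenses_to_improve: list = field(default_factory=list)  # output: defenses to harden

record = LearningTeamRecord(
    event="Worker slipped on oil near a leaking pump seal",
    experts=["pump operator", "maintenance fitter", "shift lead"],
)
record.normal_work.append("Seal weeps get wiped down informally; no work order raised")
record.what_was_different.append("Night shift was short-staffed; the wipe-down was skipped")
record.weaknesses.append("No low-friction way to report minor leaks before they grow")
print(record.weaknesses)
```

The schema enforces the philosophy: the database cannot even store a single culprit, so the team has to describe the system instead.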
The Bottom Line
The "Root Cause" is a fairy tale we tell ourselves to feel in control. It comforts management to believe that if we just fire the "Bad Apple" or replace the "Bad Part," we are safe. It creates an illusion of fixability.
But complexity fights back. If you want to stop the next accident, stop looking for the one thing that went wrong. Start looking at the thousand things that usually go right, and understand why they failed this time.
Stop simplifying. Embrace the mess. Stop judging. Start learning.
