How Risk Assessment Fails—Again. This time at DuPont.

This entry is part 6 of 8 in the series Risk Assessment

A recent report released by the US Chemical Safety Board (CSB) looks at a series of accidents that occurred over a 33-hour period on January 22 and 23, 2010 at the DuPont Corporation’s Belle, West Virginia, chemical manufacturing plant.

A number of significant failures occurred, but I want to focus on one passage from the press release that is telling, particularly considering that DuPont is seen as a class leader when it comes to worker safety. I would encourage you to read the entire release. You can also have a look at the DuPont investigation details on the CSB site. CSB also produced a video discussing the investigation.

From the press release:

“Internal DuPont documents released with the CSB report indicate that in the 1980’s, company officials considered increasing the safety of the area of the plant where phosgene is handled by enclosing the area and venting the enclosure through a scrubber system to destroy any toxic phosgene gas before it entered the atmosphere. The analysis concluded that an enclosure was the safest option for both workers and the public. However, the documents indicate the company was concerned with containing costs and decided not to make the safety improvements. A DuPont employee wrote in 1988, “It may be that in the present circumstances the business can afford $2 million for an enclosure; however, in the long run can we afford to take such action which has such a small impact on safety and yet sets a precedent for all highly toxic material activities.[sic]”

The need for an enclosure was reiterated in a 2004 process hazard analysis conducted by DuPont, but four extensions were granted by DuPont management between 2004 and 2009, and at the time of the January 2010 release, no safety enclosure or scrubber system had been constructed. CSB investigators concluded that an enclosure, scrubber system, and routine requirement for protective breathing equipment before personnel entered the enclosure would have prevented any personnel exposures or injuries.”

The passage above highlights one of the key failure modes in risk assessment: failure to act on the results. So what is the point of conducting risk assessments if they are going to be ignored? In a 2010 presentation, a colleague of mine made this statement:

“The risk assessment process is intended to be used as a decision making tool that will help to protect workers.” — Tom Doyle, 2010

This is a fundamental truth. The risk assessment paperwork cannot protect a worker from a hazard, only action based on the report can do that.

When decision makers receive the results of a risk assessment and choose to ignore them, or, as the press release stated, “…extensions were granted by DuPont management…”, they are making a fundamentally flawed decision. The risk assessment process deliberately exposes the hazards within the scope of the analysis, and explicitly analyzes both the probable severity of injury and the likelihood of occurrence. Once the analysis is complete, choosing to ignore the results, absent any evidence that the results are incorrect, amounts to negligence in my opinion.

Does this mean that we should not conduct risk assessments? Absolutely not! In the Western world, we are obligated to protect the safety of workers, including our colleagues and employees, as well as anyone else that may intentionally or unintentionally be exposed to the hazards created by our activities. We are morally and ethically, as well as legally, obligated.

Used correctly, risk assessment in any of its many forms provides a powerful tool to protect people. Like any other powerful tool, it also takes significant courage and skill to use correctly. Defaulting to the cost argument alone, as it appears that DuPont did in this case, results in the type of fatal failures seen in this tragic series of events.

Special thanks to my colleague Bryan Hayward, the Safety Engineering Network Group on LinkedIn, and SafTEng.net.

What is your experience with implementing risk assessment? Have you experienced this kind of result in your work? Share your experiences by commenting on this post!

Acknowledgements: US Chemical Safety Board for excerpts.

The purpose of risk assessment

This entry is part 4 of 8 in the series Risk Assessment

I’m often asked what seems like a pretty simple question: “Why do we need to do a risk assessment?” There are a lot of good reasons to do risk assessments, but ultimately, the purpose of risk assessment is best summed up in this quotation:

“Risk assessments, except in the simplest of circumstances, are not designed for making judgements, but to illuminate them.”

— Richard Wilson and E. A. C. Crouch, Science, vol. 236, 1987, p. 267

How Risk Assessment Fails

This entry is part 2 of 8 in the series Risk Assessment

The events unfolding at Japan’s Fukushima Dai Ichi Nuclear Power plant are a case study in the ways that the risk assessment process can fail or be abused. In an article published on Bloomberg.com, Jason Clenfield itemizes decades of fraud and failures in engineering and administration that led to the catastrophic failure of four of the six reactors at the 40-year-old Fukushima plant. Clenfield’s article, ‘Disaster Caps Faked Reports’, goes on to cover similar failures across the Japanese nuclear sector.

Most people believe that the more serious the public danger, the more carefully the risks are considered in the design and execution of projects like the Fukushima plant. Clenfield’s article points to failures by a number of major international businesses involved in the design and manufacture of components for these reactors that may have contributed to the catastrophe playing out in Japan. In some cases, taking the correct actions could have bankrupted the companies involved, so rather than risk financial failure, the problems were covered up and the workers involved were rewarded for their efforts. As you will see, the degree of care that we have a right to expect is sometimes not the level of care that is actually applied.

How does this relate to the failure and abuse of the risk assessment process? Read on!

Risk Assessment Failures

The Fukushima Dai Ichi nuclear plant was constructed in the late 1960s and early 1970s, with Reactor #1 going on-line in 1971. The reactors at this facility use ‘active cooling’: electrically powered cooling pumps must run continuously to keep core temperatures in the normal operating range. As recent news reports have shown, the plant is located on the shore, drawing cooling water directly from the Pacific Ocean.

Learn more about Boiling Water Reactors used at Fukushima.

Read IEEE Spectrum’s “24-Hours at Fukushima”, a blow-by-blow account of the first 24 hours of the disaster.

Japan is located along one of the most active fault lines in the world, with plate subduction rates exceeding 90 mm/year. Earthquakes are so commonplace in this area that the Japanese people consider Japan to be the ‘land of earthquakes’, starting earthquake safety training in kindergarten.

Japan is the country that gave the world the word ‘tsunami’, because the effects of sub-sea earthquakes often include large waves that swamp the shoreline. These waves affect every country bordering the world’s oceans, but are especially prevalent where strong earthquakes are frequent.

In this environment it would be reasonable to expect that earthquake and tsunami effects would receive the highest priority when assessing the related risks. Remembering that risk is a function of severity of consequence and probability of occurrence, the risk assessed for earthquake and tsunami should have been rated critical. Loss of cooling can result in catastrophic overheating of the reactor core, potentially leading to a core meltdown.
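The idea that risk combines severity and probability can be sketched with a simple qualitative risk matrix. This is purely an illustrative example; the category names, scores, and thresholds below are invented for this sketch and are not drawn from any particular standard or from DuPont's or TEPCO's actual methods.

```python
# Hypothetical qualitative risk matrix: risk is a function of severity
# of consequence and probability of occurrence. All category names and
# thresholds here are illustrative only.

SEVERITY = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}
PROBABILITY = {"rare": 1, "unlikely": 2, "likely": 3, "frequent": 4}

def risk_level(severity: str, probability: str) -> str:
    """Map a severity/probability pair onto a coarse risk category."""
    score = SEVERITY[severity] * PROBABILITY[probability]
    if score >= 12:
        return "intolerable"
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

# Loss of reactor cooling at a coastal plant on an active fault line:
# the consequence is catastrophic, and large quakes there are not rare.
print(risk_level("catastrophic", "likely"))  # -> intolerable
```

On any scoring scheme of this general shape, a catastrophic consequence paired with a non-rare event lands at the top of the matrix, which is why the assessed risk at Fukushima should have driven action rather than being discounted.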

The Fukushima Dai Ichi plant was designed to withstand 5.7 m tsunami waves, even though a 6.4 m wave had struck the shore close by 10 years before the plant went on-line. The wave generated by the recent earthquake was 7 m. Although the plant was not washed away by the tsunami, the wave created another problem.

Now consider that the reactors require constant forced cooling from electrically powered pumps. The backup generators, installed to keep the cooling pumps running even if mains power to the plant is lost, were located in a basement subject to flooding. When the tsunami hit the seawall and spilled over the top, the floodwaters poured into the backup generator room, knocking out the diesel generators. The cooling system stopped. With no power to run the pumps, the reactor cores began to overheat. The reactors had survived the earthquake and the tsunami, but without power for the pumps the plant was in trouble.

Learn more about the accident.

Clearly there was a failure of reason when assessing the risks related to the loss of cooling capability in these reactors. For systems as mission-critical as these, multiple levels of redundancy beyond a single backup system are often the minimum required.

In another plant in Japan, a section of piping carrying superheated steam from the reactor to the turbines ruptured, injuring a number of workers. The pipe had been installed when the plant was new and had never been inspected since, because it was left off the safety inspection checklist. This is an example of a failure that results from blindly following a checklist without looking at the larger picture. Someone at the plant must have noticed that other pipe sections were inspected regularly while this particular section was skipped, yet no change in the process resulted.

Here again, the risk was not recognized even though it was clearly understood with respect to other sections of pipe in the same plant.

In another situation at a nuclear plant in Japan, drains inside the containment area of a reactor were not plugged at the end of the installation process. As a result, a small spill of radioactive water was released into the sea instead of being properly contained and cleaned up. The risk was well understood, but the control procedure for this risk was not implemented.

Finally, the Kashiwazaki Kariwa plant was constructed along a major fault line. The designers used maximum seismic acceleration figures only one third of the accelerations that the fault could actually produce. Regulators permitted the plant to be built even though this relative weakness of the design was known.

Failure Modes

I believe that there are a number of reasons why these kinds of failures occur.

People have a difficult time appreciating the meaning of probability. Probability is a key factor in determining the degree of risk from any hazard, yet when figures like ‘1 in 1000’ or ‘1 × 10⁻⁵ occurrences per year’ are discussed, it’s hard for people to truly grasp what these numbers mean. Likewise, when more subjective scales are used, it can be difficult to understand what ‘likely’ or ‘rarely’ actually means.

Consequently, even in cases where the severity may be very high, the risk related to a particular hazard may be neglected because the probability seems low, and the overall risk is therefore deemed low.

When probability is discussed in terms of time, a figure like ‘1 × 10⁻⁵ occurrences per year’ can make the chance of an occurrence seem distant, and therefore less of a concern.

Most risk assessment approaches deal with hazards singly. This simplifies the assessment process, but it can miss the effects of multiple or cascading failures. In a multiple-failure condition, several protective measures fail simultaneously from a single cause (sometimes called a common cause failure). In this case, back-up measures may fail from the same cause, leaving no protection from the hazard.

In a cascading failure, an initial failure is followed by a series of further failures, resulting in the partial or complete loss of protective measures and thus partial or complete exposure to the hazard. Reasonably foreseeable combinations of failure modes in mission-critical systems must be considered and the probability of each estimated.
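The gap between truly independent redundancy and redundancy with a shared failure cause can be sketched numerically. The beta-factor approach below is a common simplification for modelling common cause failure; the failure probabilities and the beta value are invented for illustration, not taken from any real reliability data.

```python
# Illustrative beta-factor sketch of common cause failure (CCF).
# All numbers are invented; real analyses use plant-specific data.

def dual_redundant_failure(p_single: float, beta: float) -> float:
    """Probability that BOTH channels of a redundant pair fail.

    `beta` is the fraction of single-channel failures assumed to stem
    from a shared cause (e.g. both diesel generators sitting in the
    same flood-prone basement).
    """
    independent = (1.0 - beta) * p_single  # failures unique to one channel
    common = beta * p_single               # failures that take out both
    return independent ** 2 + common

p = 1e-3  # assumed failure probability of one backup generator
print(dual_redundant_failure(p, beta=0.0))  # ideal independence: 1e-6
print(dual_redundant_failure(p, beta=0.1))  # shared cause: about 1e-4
```

Even a modest shared-cause fraction makes the "redundant" pair roughly a hundred times more likely to fail together than the independent calculation suggests, which is exactly the trap of co-locating both generators where one wave could drown them.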

Combinations of hazards can be synergistic, producing a higher level of severity than any one of the hazards taken singly. Reasonably foreseeable combinations of hazards and their potential synergies must be identified and the risk estimated.

Oversimplification of the hazard identification and analysis processes can result in overlooking hazards or underestimating the risk.

Thinking about the Fukushima Dai Ichi plant again: the earthquake’s effects on the plant, combined with the impact of the tsunami wave, caused the loss of primary power, then the loss of backup power from the generators, and then the partial meltdowns and explosions at the plant. This combination of earthquake and tsunami was well known, not some ‘unimaginable’ or ‘unforeseeable’ situation. When conducting risk assessments, all reasonably foreseeable combinations of hazards must be considered.

Abuse and neglect

The risk assessment process is subject to abuse and neglect. Risk assessment has been used by some as a means to justify exposing workers and the public to risks that should never have been permitted. Skewing the results of a risk assessment, either by underestimating the risk initially or by overestimating the effectiveness and reliability of control measures, can lead to this situation. Decisions about the ‘tolerability’ or ‘acceptability’ of risks should be approached with great caution when the severity of the potential consequences is high. In my opinion, unless you are personally willing to take the risk you are proposing to accept, it cannot be considered either tolerable or acceptable, regardless of the legal limits that may exist.

In the case of the Japanese nuclear plants, the operators have publicly admitted to falsifying inspection and repair records, and some of these falsifications have resulted in accidents and fatalities.

In 1990, the US Nuclear Regulatory Commission wrote a report on the Fukushima Dai Ichi plant that predicted the exact scenario that resulted in the current crisis. These findings were shared with the Japanese authorities and the operators, but no one in a position of authority took the findings seriously enough to do anything. Relatively simple and low-cost protective measures, like increasing the height of the protective sea wall along the coastline and moving the backup generators to high ground could have prevented a national catastrophe and the complete loss of the plant.

A Useful Tool

Despite these human failings, I believe that risk assessment is an important tool. Increasingly sophisticated technology has rendered ‘common sense’ useless in many cases, because people lack the expertise to form sound intuitions about the hazards these technologies create.

Where hazards are well understood, they should be controlled with the simplest, most direct and effective measures available. In many cases this can be done by the people who first identify the hazard.

Where hazards are not well understood, bringing in experts with the knowledge to assess the risk and implement appropriate protective measures is the right approach.

The common aspect in all of this is the identification of hazards and the application of some sort of control measures. Risk assessment should not be neglected simply because it is sometimes difficult, or it can be done poorly, or the results neglected or ignored. We need to improve what we do with the results of these efforts, rather than neglect to do them at all.

In the meantime, the Japanese, and the world, have some cleanup to do.