The events unfolding at Japan’s Fukushima Dai Ichi Nuclear Power plant are a case study in ways that the risk assessment process can fail or be abused. In an article published on Bloomberg.com, Jason Clenfield itemizes decades of fraud and failures in engineering and administration that have led to the catastrophic failure of four of six reactors at the 40-year-old Fukushima plant. Clenfield’s article, ‘Disaster Caps Faked Reports’, goes on to cover similar failures in the Japanese nuclear sector.
Most people believe that the more serious the public danger, the more carefully the risks are considered in the design and execution of projects like the Fukushima plant. Clenfield’s article points to failures by a number of major international businesses involved in the design and manufacture of components for these reactors that may have contributed to the catastrophe playing out in Japan. In some cases, the correct actions could have bankrupted the companies involved, so rather than risk financial failure, these failures were covered up and the workers involved rewarded for their efforts. As you will see, sometimes the degree of care that we have a right to expect is not the level of care that is used.
How does this relate to the failure and abuse of the risk assessment process? Read on!
Risk Assessment Failures
The Fukushima Dai Ichi nuclear plant was constructed in the late 1960s and early 1970s, with Reactor #1 going on-line in 1971. The reactors at this facility use ‘active cooling’, requiring electrically powered cooling pumps to run continuously to keep the core temperatures in the normal operating range. As you will have seen in recent news reports, the plant is located on the shore, drawing water directly from the Pacific Ocean.
Read IEEE Spectrum’s “24-Hours at Fukushima”, a blow-by-blow account of the first 24 hours of the disaster.
Japan is located along one of the most active fault lines in the world, with plate subduction rates exceeding 90 mm/year. Earthquakes are so commonplace in this area that the Japanese people consider Japan to be the ‘land of earthquakes’, starting earthquake safety training in kindergarten.
Japan is the country that created the word ‘tsunami’ because the effects of sub-sea earthquakes often include large waves that swamp the shoreline. These waves affect all countries bordering the world’s oceans, but are especially prevalent where strong earthquakes are frequent.
In this environment it would be reasonable to expect that consideration of earthquake and tsunami effects would merit the highest consideration when assessing the risks related to these hazards. Remembering that risk is a function of severity of consequence and probability, the risk assessed from earthquake and tsunami should have been critical. Loss of cooling can result in the catastrophic overheating of the reactor core, potentially leading to a core meltdown.
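To make the idea of risk as a function of severity and probability concrete, here is a minimal sketch of one common way to combine the two. The ordinal scales, the multiplication rule, the thresholds, and the ratings given to the tsunami hazard are all hypothetical choices made for illustration. They are not taken from any standard or from the Fukushima assessments.

```python
# Hypothetical ordinal scales, chosen only for illustration.
SEVERITY = {"minor": 1, "serious": 2, "major": 3, "catastrophic": 4}
PROBABILITY = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4}


def risk_score(severity: str, probability: str) -> int:
    """Combine severity and probability into a single ordinal score."""
    return SEVERITY[severity] * PROBABILITY[probability]


def risk_level(score: int) -> str:
    """Map a score to a level using assumed (hypothetical) thresholds."""
    if score >= 12:
        return "critical"
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# Assumed ratings for loss of cooling caused by a tsunami.
score = risk_score("catastrophic", "possible")
print(score, risk_level(score))
```

With these assumed ratings the hazard lands in the top band. Note that a simple product like this can also let a low probability rating pull a catastrophic severity down into a lower band, which is one reason scores like this need to be read with care.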
The Fukushima Dai Ichi plant was designed to withstand 5.7 m tsunami waves, even though a 6.4 m wave had hit the shore close by 10 years before the plant went on-line. The wave generated by the recent earthquake was 7 m. Although the plant was not washed away by the tsunami, the wave created another problem.
Now consider that the reactors require constant forced cooling using electrically powered pumps. The backup generators, which ensure that the cooling pumps remain operational even if the mains power to the plant is lost, were installed in a basement subject to flooding. When the tsunami hit the seawall and spilled over the top, the floodwaters poured into the backup generator room, knocking out the diesel backup generators. The cooling system stopped. With no power to run the pumps, the reactor cores began to overheat. Although the reactors survived the earthquakes and the tsunami, without power to run the pumps the plant was in trouble.
Clearly there was a failure of reason when assessing the risks related to the loss of cooling capability in these reactors. With systems that are mission critical in the way that these systems are, multiple levels of redundancy beyond a single backup system are often the minimum required.
In another plant in Japan, a section of piping carrying superheated steam from the reactor to the turbines ruptured, injuring a number of workers. The pipe was installed when the plant was new and had never been inspected since installation because it was left off the safety inspection checklist. This is an example of a failure that resulted from blindly following a checklist without looking at the larger picture. There can be no doubt that someone at the plant noticed that other pipe sections were inspected regularly, but that this particular section was skipped, yet no changes in the process resulted.
Here again, the risk was not recognized even though it was clearly understood with respect to other sections of pipe in the same plant.
In another situation at a nuclear plant in Japan, drains inside the containment area of a reactor were not plugged at the end of the installation process. As a result, a small spill of radioactive water was released into the sea instead of being properly contained and cleaned up. The risk was well understood, but the control procedure for this risk was not implemented.
Finally, the Kashiwazaki Kariwa plant was constructed along a major fault line. The designers used figures for the maximum seismic acceleration that were three times lower than the accelerations that could be created by the fault. Regulators permitted the plant to be built even though the relative weakness of the design was known.
I believe that there are a number of reasons why these kinds of failures occur.
People have a difficult time appreciating the meaning of probability. Probability is a key factor in determining the degree of risk from any hazard, yet when figures like ‘1 in 1000’ or ‘1 × 10⁻⁵ occurrences per year’ are discussed, it’s hard for people to truly grasp what these numbers mean. Likewise, when more subjective scales are used it can be difficult to really understand what ‘likely’ or ‘rarely’ actually mean.
Consequently, even where the severity may be very high, the risk related to a particular hazard may be neglected because the probability, and therefore the risk, seems to be low.
When probability is discussed in terms of time, a figure like ‘1 × 10⁻⁵ occurrences per year’ can make the chance of an occurrence seem distant, and therefore less of a concern.
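One way to make per-year figures easier to grasp is to convert them into the chance of at least one occurrence over the life of the plant. The sketch below assumes that each year is independent and that the annual probability stays constant, which is a simplification. The function name is hypothetical, and the 40-year period is simply the age of the Fukushima plant mentioned earlier.

```python
def prob_at_least_once(annual_probability: float, years: int) -> float:
    """Chance of one or more occurrences over `years`.

    Assumes independent years with a constant annual probability.
    """
    return 1.0 - (1.0 - annual_probability) ** years


# The two figures quoted above: '1 in 1000' and '1 x 10^-5' per year.
for p in (1e-3, 1e-5):
    chance = prob_at_least_once(p, 40)
    print(f"annual probability {p:g}: 40-year chance {chance:.4%}")
```

Under these assumptions, a ‘1 in 1000’ annual figure works out to roughly a 3.9% chance over 40 years, and the ‘1 × 10⁻⁵’ figure to roughly 0.04%. The lifetime view does not change the underlying risk, but it can make the number feel less distant.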
Most risk assessment approaches deal with hazards singly. This simplifies the assessment process, but it can miss the effects of multiple failures or cascading failures. In a multiple failure condition, several protective measures fail simultaneously from a single cause (sometimes called Common Cause Failure). In this case, back-up measures may fail from the same cause, resulting in no protection from the hazard.
In a cascading failure, an initial failure is followed by a series of failures resulting in the partial or complete loss of the protective measures, resulting in partial or complete exposure to the hazard. Reasonably foreseeable combinations of failure modes in mission critical systems must be considered and the probability of each estimated.
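The effect of a common cause on redundant systems can be sketched numerically. The example below compares two backup units that fail independently with the same two units when a fraction of each unit's failures comes from a shared cause, using the beta-factor approximation from reliability engineering. This is a swapped-in illustrative model, not a method described in this article, and the failure probability and beta value are hypothetical numbers, not data from any real plant.

```python
def both_fail_independent(p: float) -> float:
    """Probability that two units fail when their failures are independent."""
    return p * p


def both_fail_beta_factor(p: float, beta: float) -> float:
    """Probability that two units fail under the beta-factor approximation.

    `beta` is the assumed fraction of each unit's failure probability
    that comes from a cause shared by both units (e.g. flooding).
    """
    independent_part = ((1.0 - beta) * p) ** 2
    common_part = beta * p
    return independent_part + common_part


p = 0.01     # hypothetical failure probability of one backup unit
beta = 0.1   # hypothetical common cause fraction

independent = both_fail_independent(p)
with_common_cause = both_fail_beta_factor(p, beta)
print(f"independent: {independent:.6f}")
print(f"with common cause: {with_common_cause:.6f}")
print(f"ratio: {with_common_cause / independent:.1f}")
```

With these assumed numbers, the chance of losing both units is about ten times higher once the shared cause is included. The common cause term dominates the result, so adding more identical units in the same location does little to reduce it.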
Combinations of hazards can result in synergy between the hazards, producing a higher level of severity from the combination than is present from any one of the hazards taken singly. Reasonably foreseeable combinations of hazards and their potential synergies must be identified and the risk estimated.
Oversimplification of the hazard identification and analysis processes can result in overlooking hazards or underestimating the risk.
Thinking about the Fukushima Dai Ichi plant again, the combination of the effects of the earthquake on the plant, with the added impact of the tsunami wave, resulted in the loss of primary power to the plant followed by the loss of backup power from the backup generators, and the subsequent partial meltdowns and explosions at the plant. This combination of earthquake and tsunami was well known, not some ‘unimaginable’ or ‘unforeseeable’ situation. When conducting risk assessments, all reasonably foreseeable combinations of hazards must be considered.
Abuse and neglect
The risk assessment process is subject to abuse and neglect. Risk assessment has been used by some as a means to justify exposing workers and the public to risks that should not have been permitted. Skewing the results of the risk assessment, either by underestimating the risk initially or by overestimating the effectiveness and reliability of control measures, can lead to this situation. Decisions relating to the ‘tolerability’ or the ‘acceptability’ of risks when the severity of the potential consequences is high should be approached with great caution. In my opinion, unless you are personally willing to take the risk you are proposing to accept, it cannot be considered either tolerable or acceptable, regardless of the legal limits that may exist.
In the case of the Japanese nuclear plants, the operators have publicly admitted to falsifying inspection and repair records, some of which have resulted in accidents and fatalities.
In 1990, the US Nuclear Regulatory Commission wrote a report on the Fukushima Dai Ichi plant that predicted the exact scenario that resulted in the current crisis. These findings were shared with the Japanese authorities and the operators, but no one in a position of authority took the findings seriously enough to do anything. Relatively simple and low-cost protective measures, like increasing the height of the protective sea wall along the coastline and moving the backup generators to high ground could have prevented a national catastrophe and the complete loss of the plant.
A Useful Tool
Despite these human failings, I believe that risk assessment is an important tool. Increasingly sophisticated technology has rendered ‘common sense’ useless in many cases, because people do not have the expertise to have any common sense about the hazards related to these technologies.
Where hazards are well understood, they should be controlled with the simplest, most direct and effective measures available. In many cases this can be done by the people who first identify the hazard.
Where hazards are not well understood, bringing in experts with the knowledge to assess the risk and implement appropriate protective measures is the right approach.
The common aspect in all of this is the identification of hazards and the application of some sort of control measures. Risk assessment should not be neglected simply because it is sometimes difficult, or it can be done poorly, or the results neglected or ignored. We need to improve what we do with the results of these efforts, rather than neglect to do them at all.
In the meantime, the Japanese, and the world, have some cleanup to do.