How Risk Assessment Fails

This entry is part 2 of 8 in the series Risk Assessment

[Image: Fukushima Dai Ichi Power Plant after the explosions]

The events unfolding at Japan’s Fukushima Dai Ichi nuclear power plant are a case study in the ways the risk assessment process can fail or be abused. In an article published on Bloomberg.com, Jason Clenfield itemizes decades of fraud and failures in engineering and administration that have led to the catastrophic failure of four of the six reactors at the 40-year-old Fukushima plant. Clenfield’s article, ‘Disaster Caps Faked Reports’, goes on to cover similar failures in the Japanese nuclear sector.

Most people believe that the more serious the public danger, the more carefully the risks are considered in the design and execution of projects like the Fukushima plant. Clenfield’s article points to failures by a number of major international businesses involved in the design and manufacture of components for these reactors that may have contributed to the catastrophe playing out in Japan. In some cases, the correct actions could have bankrupted the companies involved, so rather than risk financial failure, the companies covered up these failures and rewarded the workers involved for their efforts. As you will see, sometimes the degree of care that we have a right to expect is not the level of care that is used.

How does this relate to the fail­ure and abuse of the risk assess­ment pro­cess? Read on!

Risk Assessment Failures

[Image: Earthquake and Tsunami damage - Fukushima Dai Ichi Power Plant]

The Fukushima Dai Ichi nuclear plant was constructed in the late 1960s and early 1970s, with Reactor #1 going on-line in 1971. The reactors at this facility use ‘active cooling’, requiring electrically powered cooling pumps to run continuously to keep the core temperatures in the normal operating range. As you will have seen in recent news reports, the plant is located on the shore, drawing water directly from the Pacific Ocean.

Learn more about Boiling Water Reactors used at Fukushima.

Read IEEE Spectrum’s “24-​Hours at Fukushima”, a blow-​by-​blow account of the first 24 hours of the disaster.

Japan is loc­ated along one of the most act­ive fault lines in the world, with plate sub­duc­tion rates exceed­ing 90 mm/​year. Earthquakes are so com­mon­place in this area that the Japanese people con­sider Japan to be the ‘land of earth­quakes’, start­ing earth­quake safety train­ing in kindergarten.

Japan is the country that created the word ‘tsunami’ because the effects of sub-sea earthquakes often include large waves that swamp the shoreline. These waves affect all countries bordering the world’s oceans, but are especially prevalent where strong earthquakes are frequent.

In this environment it would be reasonable to expect earthquake and tsunami effects to receive the highest priority when assessing the risks related to these hazards. Remembering that risk is a function of severity of consequence and probability, the risk assessed from earthquake and tsunami should have been critical. Loss of cooling can result in the catastrophic overheating of the reactor core, potentially leading to a core meltdown.
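To make the idea of risk as a function of severity and probability concrete, here is a minimal sketch of a scoring scheme. The four-point ordinal scales, the multiplication, and the band thresholds are all assumptions chosen for illustration; they are not taken from any particular standard or from the assessments done for the plant.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical four-point ordinal scale for severity of consequence.
    MINOR = 1
    SERIOUS = 2
    MAJOR = 3
    CATASTROPHIC = 4


class Probability(IntEnum):
    # Hypothetical four-point ordinal scale for probability of occurrence.
    REMOTE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4


def risk_score(severity: Severity, probability: Probability) -> int:
    """Combine the two ordinal ratings into one illustrative score."""
    return int(severity) * int(probability)


def risk_level(score: int) -> str:
    """Map a score onto bands; the thresholds are assumptions."""
    if score >= 12:
        return "critical"
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


loss_of_cooling = risk_score(Severity.CATASTROPHIC, Probability.POSSIBLE)
print(loss_of_cooling, risk_level(loss_of_cooling))
```

Under these assumed scales, a catastrophic consequence rated as merely ‘possible’ lands in the top band. The same consequence rated ‘remote’ scores only 4 and falls to ‘medium’, which shows how much the result depends on the probability estimate.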

The Fukushima Dai Ichi plant was designed to with­stand 5.7 m tsunami waves, even though a 6.4 m wave had hit the shore close by 10 years before the plant went on-​line. The wave gen­er­ated by the recent earth­quake was 7 m. Although the plant was not washed away by the tsunami, the wave cre­ated anoth­er problem.

Now consider that the reactors require constant forced cooling using electrically powered pumps. The backup generators, installed to ensure that the cooling pumps remain operational even if mains power to the plant is lost, are located in a basement subject to flooding. When the tsunami hit the seawall and spilled over the top, the floodwaters poured into the backup generator room, knocking out the diesel backup generators. The cooling system stopped. With no power to run the pumps, the reactor cores began to overheat. Although the reactors survived the earthquakes and the tsunami, without power to run the pumps the plant was in trouble.

Learn more about the accident.

Clearly there was a failure of reason when assessing the risks related to the loss of cooling capability in these reactors. With systems that are mission critical in the way that these systems are, multiple levels of redundancy beyond a single backup system are often the minimum required.

In another plant in Japan, a section of piping carrying superheated steam from the reactor to the turbines ruptured, injuring a number of workers. The pipe was installed when the plant was new and had never been inspected since installation because it was left off the safety inspection checklist. This is an example of a failure that resulted from blindly following a checklist without looking at the larger picture. There can be no doubt that someone at the plant noticed that other pipe sections were inspected regularly, but that this particular section was skipped, yet no changes in the process resulted.

Here again, the risk was not recog­nized even though it was clearly under­stood with respect to oth­er sec­tions of pipe in the same plant.

In anoth­er situ­ation at a nuc­le­ar plant in Japan, drains inside the con­tain­ment area of a react­or were not plugged at the end of the install­a­tion pro­cess. As a res­ult, a small spill of radio­act­ive water was released into the sea instead of being prop­erly con­tained and cleaned up. The risk was well under­stood, but the con­trol pro­ced­ure for this risk was not implemented.

Finally, the Kashiwazaki Kariwa plant was constructed along a major fault line. The designers used figures for the maximum seismic acceleration that were one third of the accelerations that could be created by the fault. Regulators permitted the plant to be built even though the relative weakness of the design was known.

Failure Modes

I believe that there are a num­ber of reas­ons why these kinds of fail­ures occur.

People have a difficult time appreciating the meaning of probability. Probability is a key factor in determining the degree of risk from any hazard, yet when figures like ‘1 in 1000’ or ‘1 × 10⁻⁵ occurrences per year’ are discussed, it’s hard for people to truly grasp what these numbers mean. Likewise, when more subjective scales are used it can be difficult to really understand what ‘likely’ or ‘rarely’ actually mean.

Consequently, even in cases where the severity may be very high, the risk related to a particular hazard may be neglected because the probability, and therefore the risk, seems to be low.

When probability is discussed in terms of time, a figure like ‘1 × 10⁻⁵ occurrences per year’ can make the chance of an occurrence seem distant, and therefore less of a concern.
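One way to make a per-year figure less abstract is to convert it into the chance of at least one occurrence over the life of a facility. The sketch below assumes a constant annual probability and independence from one year to the next, both of which are simplifications; the 40-year period simply mirrors the age of the Fukushima plant.

```python
def probability_over_period(annual_probability: float, years: int) -> float:
    """Chance of at least one occurrence in `years`, assuming each year
    is independent and the annual probability stays constant."""
    return 1.0 - (1.0 - annual_probability) ** years


# The two example figures quoted above, viewed over a 40-year service life.
for annual in (1e-3, 1e-5):
    lifetime = probability_over_period(annual, 40)
    print(f"{annual:g} per year -> {lifetime:.4%} over 40 years")
```

Under these assumptions, a ‘1 in 1000’ per-year event has roughly a 4% chance of happening at least once in 40 years, which feels far less remote than the annual figure suggests.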

Most risk assessment approaches deal with hazards singly. This simplifies the assessment process, but it can hide the effects of multiple failures or cascading failures. In a multiple failure condition, several protective measures fail simultaneously from a single cause (sometimes called Common Cause Failure). In this case, back-up measures may fail from the same cause, resulting in no protection from the hazard.

In a cas­cad­ing fail­ure, an ini­tial fail­ure is fol­lowed by a series of fail­ures res­ult­ing in the par­tial or com­plete loss of the pro­tect­ive meas­ures, res­ult­ing in par­tial or com­plete expos­ure to the haz­ard. Reasonably fore­see­able com­bin­a­tions of fail­ure modes in mis­sion crit­ic­al sys­tems must be con­sidered and the prob­ab­il­ity of each estimated.
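The effect of a common cause on redundant protective measures can be sketched numerically. The example below uses a widely used simplification, often called the beta-factor model, in which a fraction `beta` of each unit’s failures is assumed to come from a cause shared with its twin. The failure probability `q` and the value of `beta` are hypothetical numbers picked for illustration, not data from any plant.

```python
def both_fail_independent(q: float) -> float:
    """Probability that two redundant units both fail on demand,
    assuming their failures are completely independent."""
    return q * q


def both_fail_with_common_cause(q: float, beta: float) -> float:
    """Beta-factor approximation: a fraction `beta` of each unit's
    failures comes from a shared cause that takes out both units."""
    common = beta * q                       # shared-cause failures
    independent = ((1.0 - beta) * q) ** 2   # both fail for unrelated reasons
    return common + independent


q = 0.01     # hypothetical failure probability of one backup unit
beta = 0.1   # hypothetical share of failures with a common cause

print(f"independent:  {both_fail_independent(q):.6f}")
print(f"common cause: {both_fail_with_common_cause(q, beta):.6f}")
```

With these assumed values, the redundant pair is about ten times more likely to fail together than the independent calculation suggests. An assessment that treats the backups as independent will overstate the protection they provide.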

Combinations of hazards can result in synergy between the hazards, resulting in a higher level of severity from the combination than is present from any one of the hazards taken singly. Reasonably foreseeable combinations of hazards and their potential synergies must be identified and the risk estimated.

Oversimplification of the haz­ard iden­ti­fic­a­tion and ana­lys­is pro­cesses can res­ult in over­look­ing haz­ards or under­es­tim­at­ing the risk.

Thinking about the Fukushima Dai Ichi plant again, the com­bin­a­tion of the effects of the earth­quake on the plant, with the added impact of the tsunami wave, res­ul­ted in the loss of primary power to the plant fol­lowed by the loss of backup power from the backup gen­er­at­ors, and the sub­sequent par­tial melt­downs and explo­sions at the plant. This com­bin­a­tion of earth­quake and tsunami was well known, not some ‘unima­gin­able’ or ‘unfore­see­able’ situ­ation. When con­duct­ing risk assess­ments, all reas­on­ably fore­see­able com­bin­a­tions of haz­ards must be considered.
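A simple aid for this step is to generate every combination of the identified hazards so that none is skipped by accident during review. This is only a sketch: the hazard names are an illustrative list loosely based on the events described above, and the `max_size` limit is an assumption to keep the list manageable. Deciding which combinations are reasonably foreseeable still takes human judgement.

```python
from itertools import combinations

# Illustrative hazard list; a real assessment would use its own register.
hazards = [
    "earthquake",
    "tsunami",
    "loss of mains power",
    "flooded generator room",
]


def hazard_combinations(names, max_size=3):
    """Yield every combination of 2..max_size hazards for reviewers to screen."""
    for size in range(2, max_size + 1):
        yield from combinations(names, size)


for combo in hazard_combinations(hazards):
    print(" + ".join(combo))
```

The number of combinations grows quickly with the length of the hazard list, which is one reason assessments are tempted to treat hazards singly.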

Abuse and Neglect

The risk assessment process is subject to abuse and neglect. Risk assessment has been used by some as a means to justify exposing workers and the public to risks that should not have been permitted. Skewing the results of the risk assessment, either by underestimating the risk initially, or by overestimating the effectiveness and reliability of control measures, can lead to this situation. Decisions relating to the ‘tolerability’ or the ‘acceptability’ of risks when the severity of the potential consequences is high should be approached with great caution. In my opinion, unless you are personally willing to take the risk you are proposing to accept, it cannot be considered either tolerable or acceptable, regardless of the legal limits that may exist.

In the case of the Japanese nuc­le­ar plants, the oper­at­ors have pub­licly admit­ted to falsi­fy­ing inspec­tion and repair records, some of which have res­ul­ted in acci­dents and fatalities.

In 1990, the US Nuclear Regulatory Commission wrote a report on the Fukushima Dai Ichi plant that predicted the exact scenario that resulted in the current crisis. These findings were shared with the Japanese authorities and the operators, but no one in a position of authority took the findings seriously enough to do anything. Relatively simple and low-cost protective measures, like increasing the height of the protective sea wall along the coastline and moving the backup generators to higher ground, could have prevented a national catastrophe and the complete loss of the plant.

A Useful Tool

Despite these human fail­ings, I believe that risk assess­ment is an import­ant tool. Increasingly soph­ist­ic­ated tech­no­logy has rendered ‘com­mon sense’ use­less in many cases, because people do not have the expert­ise to have any com­mon sense about the haz­ards related to these technologies.

Where haz­ards are well under­stood, they should be con­trolled with the simplest, most dir­ect and effect­ive meas­ures avail­able. In many cases this can be done by the people who first identi­fy the hazard.

Where haz­ards are not well under­stood, bring­ing in experts with the know­ledge to assess the risk and imple­ment appro­pri­ate pro­tect­ive meas­ures is the right approach.

The common aspect in all of this is the identification of hazards and the application of some sort of control measures. Risk assessment should not be neglected simply because it is sometimes difficult, because it can be done poorly, or because the results are sometimes neglected or ignored. We need to improve what we do with the results of these efforts, rather than neglect to do them at all.

In the meantime, the Japanese, and the world, have some cleanup to do.