ISO 13849 – 1 Analysis — Part 8: Fault Exclusion

This entry is part 8 of 9 in the series How to do a 13849 – 1 analysis

Fault Consideration & Fault Exclusion

ISO 13849 – 1, Chapter 7 [1, 7] discusses the need for fault consideration and fault exclusion. Fault consideration is the process of examining the components and sub-systems used in the safety-related part of the control system (SRP/CS) and making a list of all the faults that could occur in each one. This is definitely a non-trivial exercise!

Thinking back to some of the earli­er art­icles in this series where I men­tioned the dif­fer­ent types of faults, you may recall that there are detect­able and undetect­able faults, and there are safe and dan­ger­ous faults, lead­ing us to four kinds of fault:

  • Safe undetect­able faults
  • Dangerous undetect­able faults
  • Safe detect­able faults
  • Dangerous detectable faults

For systems where no diagnostics are used, i.e., Categories B and 1, faults need to be eliminated using inherently safe design techniques. Care needs to be taken when classifying components as “well-tried” versus relying on a fault exclusion, since components that would normally be considered “well-tried” might not meet those requirements in every application.

For systems where diagnostics are part of the design, i.e., Categories 2, 3, and 4, the fault lists are used to evaluate the diagnostic coverage (DC) of the test systems. Depending on the architecture, certain levels of DC are required to meet the relevant PL; see [1, Fig. 5]. The fault lists are the starting point for determining DC, and are an input into the hardware and software designs. All of the dangerous detectable faults must be covered by the diagnostics, and the DC must be high enough to meet the PLr for the safety function.
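To make the DC idea concrete, here is a minimal Python sketch that computes DC for one block from a fault list, as the ratio of the dangerous-detected failure rate to the total dangerous failure rate, and maps the result onto the four DC levels used in ISO 13849 – 1 (none below 60 %, low from 60 % to below 90 %, medium from 90 % to below 99 %, high at 99 % or more). The fault descriptions and failure rates are hypothetical, and the sketch does not cover the averaged DC (DCavg) calculation the standard uses when several blocks are combined.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    description: str
    rate_fit: float   # failure rate in FIT (failures per 1e9 h); hypothetical values
    dangerous: bool   # True if the fault can lead to loss of the safety function
    detected: bool    # True if the diagnostics (test system) detect the fault


def diagnostic_coverage(faults: list[Fault]) -> float:
    """DC = (sum of dangerous detected rates) / (sum of all dangerous rates)."""
    lambda_d = sum(f.rate_fit for f in faults if f.dangerous)
    lambda_dd = sum(f.rate_fit for f in faults if f.dangerous and f.detected)
    if lambda_d == 0:
        raise ValueError("fault list contains no dangerous faults")
    return lambda_dd / lambda_d


def dc_level(dc: float) -> str:
    """Map a DC value (0..1) onto the ISO 13849-1 DC levels."""
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


# Hypothetical fault list for one channel of an interlock input
faults = [
    Fault("contacts weld closed", 600.0, dangerous=True, detected=True),
    Fault("short between input channels", 340.0, dangerous=True, detected=True),
    Fault("input transistor stuck ON", 60.0, dangerous=True, detected=False),
    Fault("open-circuit wiring", 500.0, dangerous=False, detected=True),
]

dc = diagnostic_coverage(faults)
print(f"DC = {dc:.1%} ({dc_level(dc)})")  # DC = 94.0% (medium)
```

Safe faults (here the open-circuit wiring) do not enter the calculation at all, which is why the fault list has to classify each fault as safe or dangerous before DC can be evaluated.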

The fault lists and fault exclu­sions are used in the Validation por­tion of this pro­cess as well. At the start of the Validation pro­cess flow chart [2, Fig. 1], you can see how the fault lists and the cri­ter­ia used for fault exclu­sion are used as inputs to the val­id­a­tion plan.

Start of ISO 13849 – 2 Fig. 1: the first few stages of the validation process

Faults that can be excluded do not need to be validated, saving time and effort during the system verification and validation (V & V). How is this done?

Fault Consideration

The first step is to develop a list of potential faults that could occur, based on the components and subsystems included in the SRP/CS. ISO 13849 – 2 [2] includes lists of typical faults for various technologies. For example, [2, Table A.4] is the fault list for mechanical components.

Table A.4 – Faults and fault exclusions – Mechanical devices, components and elements (e.g. cam, follower, chain, clutch, brake, shaft, screw, pin, guide, bearing), from ISO 13849 – 2 [2]

[2] con­tains tables sim­il­ar to Table A.4 for:

  • Pressure-​coil springs
  • Directional con­trol valves
  • Stop (shut-​off) valves/​non-​return (check) valves/​quick-​action vent­ing valves/​shuttle valves, etc.
  • Flow valves
  • Pressure valves
  • Pipework
  • Hose assem­blies
  • Connectors
  • Pressure trans­mit­ters and pres­sure medi­um trans­ducers
  • Compressed-air treatment – Filters
  • Compressed-air treatment – Oilers
  • Compressed-air treatment – Silencers
  • Accumulators and pressure vessels
  • Sensors
  • Fluidic information processing – Logical elements
  • etc.

As you can see, there are many different types of faults that need to be considered. Keep in mind that I did not give you all of the different fault lists – this post would be a mile long if I did that! The point is that you need to develop a fault list for your system, and then consider the impact of each fault on the operation of the system. If you have components or subsystems that are not listed in the tables, then you need to develop your own fault lists for those items. Failure Modes and Effects Analysis (FMEA) techniques are usually the best approach for these components [23], [24].
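For a component that has no fault table in [2], an FMEA worksheet can be kept as simple structured data. The Python sketch below ranks the failure modes of a hypothetical custom door sensor by Risk Priority Number, RPN = S × O × D, which is one of the prioritization approaches described in IEC 60812 [23]. The 1 to 10 rating scales and every row are assumptions for illustration. Because a high-severity failure can still end up with a modest RPN, the sketch also flags any row with a high severity rating, regardless of its RPN.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    item: str
    failure_mode: str
    effect: str
    severity: int    # S, 1 (negligible) .. 10 (fatal); hypothetical scale
    occurrence: int  # O, 1 (remote) .. 10 (frequent)
    detection: int   # D, 1 (certain to be detected) .. 10 (undetectable)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1..10 scales
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be 1..10, got {value}")

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def review_order(rows: list[FmeaRow], severity_flag: int = 9) -> list[tuple[str, int, bool]]:
    """Sort rows by RPN (highest first) and flag high-severity rows."""
    ranked = sorted(rows, key=lambda r: r.rpn, reverse=True)
    return [(f"{r.item}: {r.failure_mode}", r.rpn, r.severity >= severity_flag)
            for r in ranked]


# Hypothetical worksheet for a custom sensor that is not covered by [2]
rows = [
    FmeaRow("door sensor", "output stuck ON", "open guard not detected", 9, 3, 7),
    FmeaRow("sensor cable", "short between channels", "cross-fault masks opening", 9, 2, 4),
    FmeaRow("status LED", "LED fails dark", "misleading indication only", 2, 4, 2),
]

for label, rpn, high_severity in review_order(rows):
    print(f"{rpn:4d}  {'!' if high_severity else ' '}  {label}")
```

The ranked rows then become entries in the fault list for that component, and each one still needs the same consideration as the faults taken from the tables in [2].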

When con­sid­er­ing the faults to be included in the list there are a few things that should be con­sidered [1, 7.2]:

  • if, after the first fault occurs, other faults develop as a result of it, then you can group those faults together as a single fault
  • two or more single faults with a common cause can be considered as a single fault
  • multiple faults with different causes that occur simultaneously are considered improbable and do not need to be considered

Examples

If a voltage regulator fails in a system power supply so that the 24 Vdc output rises to an unregulated 36 Vdc (the internal power supply bus voltage), and some time later two sensors fail as a result of the overvoltage, then all three failures can be grouped and considered as a single fault.

If a light­ning strike occurs on the power line and the res­ult­ing surge voltage on the 400 V mains causes an inter­pos­ing con­tact­or and the motor drive it con­trols to fail to danger, then these fail­ures may be grouped and con­sidered as one.

A pneumatic lubricator runs out of lubricant and is not refilled, depriving downstream pneumatic components of lubrication. Separately, the spool on the system dump valve sticks open because it is not cycled often enough. These two failures do not share a cause, so there is no need to consider them as occurring simultaneously: the probability of both happening concurrently is extremely small. One caution: these two faults MAY have a common cause, poor maintenance. If you decide that this is the case, they become two faults with a common cause, and they can then be grouped as a single fault.
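The three grouping rules from [1, 7.2] can be expressed as a small grouping step over the fault list. In this Python sketch the field names and fault IDs are hypothetical: each fault may record the fault that triggered it (`caused_by`) and a shared root cause (`common_cause`). Consequential faults are collapsed onto their initiating fault, faults with a common cause are collapsed onto that cause, and faults with independent causes stay in separate groups, since their simultaneous occurrence need not be considered. The data reproduces the three examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fault:
    fault_id: str
    description: str
    caused_by: Optional[str] = None     # id of the fault that triggered this one
    common_cause: Optional[str] = None  # shared root cause, if any


def initiating_fault(fault: Fault, by_id: dict[str, Fault]) -> Fault:
    """Follow the caused_by chain back to the first fault."""
    seen: set[str] = set()
    while fault.caused_by is not None:
        if fault.fault_id in seen:
            raise ValueError(f"circular caused_by chain at {fault.fault_id}")
        seen.add(fault.fault_id)
        fault = by_id[fault.caused_by]
    return fault


def group_single_faults(faults: list[Fault]) -> dict[str, list[str]]:
    """Return {group key: [fault ids]}; each group is treated as one single fault."""
    by_id = {f.fault_id: f for f in faults}
    groups: dict[str, list[str]] = {}
    for f in faults:
        first = initiating_fault(f, by_id)
        # Common-cause faults collapse onto the shared cause;
        # consequential faults collapse onto their initiating fault.
        key = first.common_cause or first.fault_id
        groups.setdefault(key, []).append(f.fault_id)
    # Separate groups have independent causes, so combinations of groups
    # are not analysed as simultaneous faults.
    return groups


faults = [
    Fault("PS1", "24 Vdc regulator fails, output rises to 36 Vdc"),
    Fault("S1", "sensor 1 fails from overvoltage", caused_by="PS1"),
    Fault("S2", "sensor 2 fails from overvoltage", caused_by="PS1"),
    Fault("K1", "interposing contactor fails to danger", common_cause="mains surge"),
    Fault("VFD", "motor drive fails to danger", common_cause="mains surge"),
    Fault("LUB", "lubricator runs dry"),
    Fault("V1", "dump valve spool sticks open"),
]

for key, ids in group_single_faults(faults).items():
    print(key, ids)
```

If you judge that poor maintenance is a common cause of the lubricator and dump-valve faults, setting `common_cause="poor maintenance"` on both merges them into one group, which matches the caution in the last example.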

Fault Exclusion

Once you have your well-​considered fault lists togeth­er, the next ques­tion is “Can any of the lis­ted faults be excluded?” This is a tricky ques­tion! There are a few points to con­sider:

  • Does the sys­tem archi­tec­ture allow for fault exclu­sion?
  • Is the fault tech­nic­ally improb­able, even if it is pos­sible?
  • Does exper­i­ence show that the fault is unlikely to occur?*
  • Are there tech­nic­al require­ments related to the applic­a­tion and the haz­ard that might sup­port fault exclu­sion?

BE CAREFUL with this one!

Whenever faults are excluded, a detailed jus­ti­fic­a­tion for the exclu­sion needs to be included in the sys­tem design doc­u­ment­a­tion. Simply decid­ing that the fault can be excluded is NOT ENOUGH! Consider the risk a per­son will be exposed to in the event the fault occurs. If the sever­ity is very high, i.e., severe per­man­ent injury or death, you may not want to exclude the fault even if you think you could. Careful con­sid­er­a­tion of the res­ult­ing injury scen­ario is needed.

Basing a fault exclu­sion on per­son­al exper­i­ence is sel­dom con­sidered adequate, which is why I added the aster­isk (*) above. Look for good stat­ist­ic­al data to sup­port any decision to use a fault exclu­sion.
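Because every exclusion needs a written justification, it can help to keep exclusions as records and review them systematically before they go into the design documentation. The following Python sketch is a hypothetical checklist, not a procedure taken from ISO 13849 – 1 or ISO 13849 – 2. It flags exclusions with no written justification, exclusions supported only by personal experience, and exclusions where the resulting injury would be severe (S2), so that someone takes a second look at each one. All of the entries are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Exclusion:
    fault: str
    justification: str = ""
    evidence: list[str] = field(default_factory=list)  # sources backing the exclusion
    severe_injury: bool = False  # S2: serious, normally irreversible injury or death


def review_exclusions(exclusions: list[Exclusion]) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    for e in exclusions:
        if not e.justification.strip():
            findings.append(f"{e.fault}: no written justification")
        # Empty evidence, or personal experience alone, is not enough
        if not set(e.evidence) - {"personal experience"}:
            findings.append(f"{e.fault}: no supporting data beyond personal experience")
        if e.severe_injury:
            findings.append(f"{e.fault}: severe injury if it occurs; reconsider excluding it")
    return findings


exclusions = [
    Exclusion("guard hinge pin shears", "pin oversized per design calculation",
              ["design calculation", "supplier test report"]),
    Exclusion("cam follower breaks", "", ["personal experience"]),
    Exclusion("interlock actuator detaches", "actuator fixed with one-way screws",
              ["supplier field data"], severe_injury=True),
]

for finding in review_exclusions(exclusions):
    print(finding)
```

A flag does not mean the exclusion is wrong; it means the justification needs a closer look before the exclusion is accepted.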

There is much more inform­a­tion avail­able in IEC 61508 – 2 on the sub­ject of fault exclu­sion, and there is good inform­a­tion in some of the books men­tioned below [0.2], [0.3], and [0.4]. If you know of addi­tion­al resources you would like to share, please post the inform­a­tion in the com­ments!

Definitions

3.1.3 fault
state of an item char­ac­ter­ized by the inab­il­ity to per­form a required func­tion, exclud­ing the inab­il­ity dur­ing pre­vent­ive main­ten­ance or oth­er planned actions, or due to lack of extern­al resources
Note 1 to entry: A fault is often the res­ult of a fail­ure of the item itself, but may exist without pri­or fail­ure.
Note 2 to entry: In this part of ISO 13849, “fault” means random fault. [SOURCE: IEC 60050-191:1990, 05-01.]

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety crit­ic­al sys­tems hand­book. Amsterdam: Elsevier/​Butterworth-​Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may appear in other parts of the series. The complete reference list is included in the last post of the series.

[1]     Safety of machinery — Safety-​related parts of con­trol sys­tems — Part 1: General prin­ciples for design. 3rd Edition. ISO Standard 13849 – 1. 2015.

[2]     Safety of machinery – Safety-​related parts of con­trol sys­tems – Part 2: Validation. 2nd Edition. ISO Standard 13849 – 2. 2012.

[3]      Safety of machinery – General prin­ciples for design – Risk assess­ment and risk reduc­tion. ISO Standard 12100. 2010.

[4]     Safeguarding of Machinery. 2nd Edition. CSA Standard Z432. 2004.

[5]     Risk Assessment and Risk Reduction – A Guideline to Estimate, Evaluate and Reduce Risks Associated with Machine Tools. ANSI Technical Report B11.TR3. 2000.

[6]    Safety of machinery – Emergency stop func­tion – Principles for design. ISO Standard 13850. 2015.

[7]     Functional safety of electrical/​electronic/​programmable elec­tron­ic safety-​related sys­tems. 7 parts. IEC Standard 61508. Edition 2. 2010.

[8]     S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncer­tain­ties in the val­id­a­tion of an exist­ing safety-​related con­trol cir­cuit with the ISO 13849 – 1:2006 design stand­ard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104 – 112, Jan. 2014.

[9]    Guidance on the applic­a­tion of ISO 13849 – 1 and IEC 62061 in the design of safety-​related con­trol sys­tems for machinery. IEC Technical Report TR 62061 – 1. 2010.

[10]     Safety of machinery – Functional safety of safety-​related elec­tric­al, elec­tron­ic and pro­gram­mable elec­tron­ic con­trol sys­tems. IEC Standard 62061. 2005.

[11]    Guidance on the applic­a­tion of ISO 13849 – 1 and IEC 62061 in the design of safety-​related con­trol sys­tems for machinery. IEC Technical Report 62061 – 1. 2010.

[12]    D. S. G. Nix, Y. Chinniah, F. Dosio, M. Fessler, F. Eng, and F. Schrever, “Linking Risk and Reliability — Mapping the out­put of risk assess­ment tools to func­tion­al safety require­ments for safety related con­trol sys­tems,” 2015.

[13]    Safety of machinery. Safety related parts of con­trol sys­tems. General prin­ciples for design. CEN Standard EN 954 – 1. 1996.

[14]   Functional safety of electrical/​electronic/​programmable elec­tron­ic safety-​related sys­tems – Part 2: Requirements for electrical/​electronic/​programmable elec­tron­ic safety-​related sys­tems. IEC Standard 61508 – 2. 2010.

[15]     Reliability Prediction of Electronic Equipment. Military Handbook MIL-​HDBK-​217F. 1991.

[16]     “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].

[17]      “failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18]      M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331 – 338, 2006.

[19]     Out of Control — Why con­trol sys­tems go wrong and how to pre­vent fail­ure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

[20]     Safeguarding of Machinery. 3rd Edition. CSA Standard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Canada, 1990.

[22]     “Field-programmable gate array”, En.wikipedia.org, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Field-programmable_gate_array. [Accessed: 16-Jun-2017].

[23]     Analysis tech­niques for sys­tem reli­ab­il­ity – Procedure for fail­ure mode and effects ana­lys­is (FMEA). 2nd Ed. IEC Standard 60812. 2006.

[24]     “Failure mode and effects analysis”, En.wikipedia.org, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Failure_mode_and_effects_analysis. [Accessed: 16-Jun-2017].

Hockey Teams and Risk Reduction or What Makes Roberto Luongo = PPE

This entry is part 1 of 3 in the series Hierarchy of Controls

Special Co-​Author, Tom Doyle

Last week we saw the Boston Bruins earn the Stanley Cup. I was rooting for the green, blue and white, and the ruin of my voice on Thursday was ample evidence that no amount of cheering helped. While I was watching the game with friends and colleagues, I realized that Roberto Luongo and Tim Thomas were their respective teams’ PPE*. Sound odd? Let me explain.

Risk Assessment and the Hierarchy of Controls

Equipment design­ers need to under­stand  OHS* risk. The only proven meth­od for under­stand­ing risk is risk assess­ment. Once that is done, the next play in the game is the reduc­tion of risks by elim­in­at­ing haz­ards wherever pos­sible and con­trolling those that remain.

Control comes in a couple of fla­vours:

  • Hazard modi­fic­a­tion to reduce the sever­ity of injury, or 
  • prob­ab­il­ity modi­fic­a­tion to reduce the prob­ab­il­ity of a work­er com­ing togeth­er with the haz­ard.

These ideas have been formalized in the Hierarchy of Controls. Briefly, the Hierarchy starts with hazard elimination or substitution, and flows down through engineering controls, information for use, administrative controls and finally PPE. As you move down through the Hierarchy, the effectiveness and the reliability of the measures decline.

It’s import­ant to recog­nize that we haven’t done a risk assess­ment in writ­ing this post. This step was skipped for the pur­pose of this example — to apply the hier­archy cor­rectly, you MUST start with a risk assess­ment!

So how does this relate to Hockey?

Hockey and the Hierarchy of Controls

Hazard Identification and Exposure to Risk

If we consider the goal to be the worker (the thing we don’t want “injured”), the puck to be the hazard, and the act of scoring a goal to be the act of injuring a person, then the rest quickly becomes clear.

Level 1: Hazard Elimination

By defin­i­tion, if we elim­in­ate the puck, we no longer have a game. We just have a bunch of big guys skat­ing around in cool jer­seys with sticks, maybe hav­ing a fight or two, because they’re bored or just don’t know what else to do. Since we want to have a game, either to play or to watch, we have to allow the risk of injury to exist. We could call this the “intrins­ic risk”, as it is the risk that exists before we add any con­trols.

Level 2: Hazard Substitution

The Center and the Wingers (collectively the “Forwards” or the “Offensive Line”) act as hazard “substitution”. We’ve already established that eliminating the hazard results in the loss of the intended function: no puck, no game. If they’re playing well, the forwards only let the other team have the puck on rare occasions. This is a great idea, but still a little too optimistic: both teams are trying to get the puck into the opposing net, and both teams have qualified to play the final game. If the forwards fail to keep the puck beyond the other team’s blue line, or at least beyond the center line, then the next layer of protection, the Defensive Line, kicks in.

Level 3: Engineering Controls

As the puck moves down the ice, the Defensive Line engages the approach­ing puck, attempt­ing to block access to the area closer to the goal. They act as a mov­able bar­ri­er between the net and the puck.  They will do whatever is neces­sary to keep the haz­ard from com­ing in con­tact with the net. As engin­eer­ing con­trols, their coördin­a­tion and pos­i­tion­ing are crit­ic­al in ensur­ing suc­cess.

The sys­tem will fail if the con­trols have poor:

  • pos­i­tion­ing,
  • choice of mater­i­als (play­ers),
  • tim­ing, etc.

These risk controls fail regularly, so they are less desirable than having the Forward Line handle risk control.

Level 4: Information for Use and Awareness Means

In a hockey game, the inform­a­tion for use is the rule book. This inform­a­tion tells play­ers, coaches, and offi­cials how the game is to be played, and what the inten­ded use of the game should be. Activities like spear­ing, trip­ping, and blind-​side checks are not per­mit­ted.

The aware­ness means are provided by the roar of the fans. As the puck heads for the home-team’s goal, the home fans will roar, let­ting the team know, if they don’t know already, that the goal is at risk from the puck. Hopefully the defens­ive line can react in time and get between the puck and the net.

Level 5: Administrative Controls

Information for use from the pre­vi­ous step is the basis for all the fol­low­ing con­trols. The team’s coaches, or “super­visors”, use this inform­a­tion to give train­ing in the form of hockey prac­tice. The Forward Line and Defensive Line could be con­sidered the Suppliers and Users. They all need to know what to do to avoid haz­ard­ous situ­ations, and what to do when one arises, to reduce the num­ber of poten­tial fail­ures.

A “Permit to Work” is giv­en to the play­ers by the coach when they form the lines. The coach ensures that the right people are on the ice for each set of cir­cum­stances, decid­ing when line changes hap­pen as the game pro­gresses, adapt­ing the people per­mit­ted to work to the spe­cif­ic con­di­tions on the ice.

Level 6: Personal Protective Equipment (PPE)

All of this brings me to Roberto Luongo and Tim Thomas. So how is a Goalie like PPE?

Goalies are the “last-​ditch” pro­tec­tion. It’s clear that the first 5 levels of the hier­archy don’t always work, since every type of con­trol, even haz­ard elim­in­a­tion, has fail­ure modes. To give a bit of backup, we should make sure that we add extra pro­tec­tion in the form of PPE.

The puck wasn’t eliminated, since having a hockey game is the point, after all. The Forward Line didn’t keep the puck at a distance. The Defensive Line failed to maintain a safe distance between the goal and the puck, and now all that is left is the goalie (or your protective eyewear, boots, hardhat, or whatever). In the 2011 Stanley Cup Final game, Luongo equaled long pants and long sleeves, while Thomas equaled a suit of armour. The Bruins’ “PPE” afforded superior protection in this case.

As any­one who has used pro­tect­ive eye­wear knows, particles can get by your eye­wear. There are lots of factors, includ­ing how well they fit, if you’re wear­ing them (prop­erly or at all!), etc. If the gear is fit­ted and used prop­erly by a per­son who under­stands WHY and HOW to use the equip­ment, then the PPE is more like Tim Thomas, and you may be able to “shut out” injury. Most of the time. Remember that even Tim Thomas misses stop­ping some shots on goal and the oth­er guys can still score.

When your PPE doesn’t fit prop­erly, isn’t selec­ted prop­erly, is worn out (or psyched out as the case may be), or isn’t used prop­erly, then it’s more like Roberto Luongo. Sometimes it works per­fectly, and life is good. Sometimes it fails com­pletely and you end up injured or worse.

Goalies are also like PPE because they are RIGHT THERE. Right before injury will occur. PPE is RIGHT THERE, pro­tect­ing you — 5 mm from the sur­face of your eye, or in your ear, 2 mm from your ear drum. By this point the harm­ful energy is RIGHT THERE, ready to hurt you, and injury is immin­ent. A simple mis­place­ment or bad fit con­di­tion and you’re blinded or deaf or… well you get the idea!

On Wednesday night, 15-​Jun-​2011, everything failed for the Vancouver Canucks. The team’s spir­it was down, and they went into the game think­ing “We just don’t want to lose!” instead of Boston’s “We’re tak­ing that Cup home!”. Even the touted Home Ice Advantage wasn’t enough to psych out the Bruins, and in the end I think it turned on the Canucks as the fans real­ized that the game was lost. The warn­ings failed, the guards failed, and the PPE failed. Somebody got hurt, and unfor­tu­nately for Canadian fans, it was the Canucks. Luckily it wasn’t a fatal­ity! Even being #2 in the NHL is a long stretch bet­ter than filling a cool­er draw­er in the morgue.

So the next time you’re set­ting up a job, an assembly line, a new machine, or a new work­place, check out your team and make sure that you’ve got the right play­ers on the ice. You only get one chance to get it right. Sure, you can change the lines and upgrade when you need to, but once someone scores a goal, you have an injured per­son and big­ger prob­lems to deal with.

Special thanks to Tom Doyle for his con­tri­bu­tions to this post!

*PPE – Personal Protective Equipment
*OHS – Occupational Health and Safety

Understanding the Hierarchy of Controls

This entry is part 2 of 3 in the series Hierarchy of Controls

Risk assessment is the first step in reducing the risk that your customers and users are exposed to when they use your products. The second step is Risk Reduction, sometimes called Risk Control or Risk Mitigation. This article looks at the ways that risk can be controlled using the Hierarchy of Controls. Figure 2 from ISO 12100:2010 (shown below) illustrates this point.

The sys­tem is called a hier­archy because you must apply each level in the order that they fall in the list. In terms of effect­ive­ness at redu­cing risk, the first level in the hier­archy, elim­in­a­tion, is the most effect­ive, down to the last, PPE*, which has the least effect­ive­ness.

It’s important to understand that, after each step in the hierarchy is implemented, you must ask: “Is the risk reduced as much as possible? Is the residual risk a) in compliance with legal requirements, and b) acceptable to the user or worker?” When you can answer ‘YES’ to all of these questions, the last step is to ensure that you have warned the user of the residual risks, identified the required training, and made recommendations for any needed PPE.

*PPE – Personal Protective Equipment. e.g. Protective eye wear, safety boots, bump caps, hard hats, cloth­ing, gloves, res­pir­at­ors, etc. CSA Z1002 includes ‘…any­thing designed to be worn, held, or car­ried by an indi­vidu­al for pro­tec­tion against one or more haz­ards.’  in this defin­i­tion.
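The order of application and the questions asked after each step can be sketched as a loop. In this hypothetical Python example, the `risk_acceptable` callback stands in for the engineering judgement behind those questions (code cannot make that decision), and the measure names are made up for illustration.

```python
from enum import IntEnum
from typing import Callable


class Level(IntEnum):
    ELIMINATION_OR_SUBSTITUTION = 1
    ENGINEERING_CONTROLS = 2
    INFORMATION_FOR_USE = 3
    ADMINISTRATIVE_CONTROLS = 4
    PPE = 5


def apply_hierarchy(candidates: dict[Level, list[str]],
                    risk_acceptable: Callable[[list[str]], bool]) -> list[str]:
    """Apply measures level by level, most effective first, and stop as soon as
    the residual risk is judged acceptable."""
    applied: list[str] = []
    for level in sorted(Level):  # lowest number = most effective level
        applied.extend(candidates.get(level, []))
        if risk_acceptable(applied):
            # Last step: address the residual risk that remains
            return applied + ["warn users of residual risks",
                              "identify required training",
                              "recommend any needed PPE"]
    raise RuntimeError("risk still not acceptable after all levels; revisit the design")


plan = apply_hierarchy(
    {
        Level.PPE: ["safety glasses"],
        Level.ENGINEERING_CONTROLS: ["interlocked guard"],
        Level.ELIMINATION_OR_SUBSTITUTION: ["water-jet cutting instead of sawing"],
    },
    risk_acceptable=lambda applied: "interlocked guard" in applied,
)
print(plan)
```

The point of the loop is the ordering: candidates are always considered from the top of the hierarchy down, no matter in which order they were written down, and lower levels are only reached when the questions cannot yet be answered ‘YES’.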

Risk Reduction from the Designer's Viewpoint: ISO 12100:2010 – Figure 2

Introducing the Hierarchy of Controls

The Hierarchy of Controls was developed in a num­ber of dif­fer­ent stand­ards over the last 20 years or so. The idea was to provide a com­mon struc­ture that would provide guid­ance to design­ers when con­trolling risk.

Typically, the first three levels of the hier­archy may be con­sidered to be ‘engin­eer­ing con­trols’ because they are part of the design pro­cess for a product. This does not mean that they must be done by engin­eers!

We’ll look at each level in the hier­archy in detail. First, let’s take a look at what is included in the Hierarchy.

The Hierarchy of Controls includes:

1)    Hazard Elimination or Substitution (Design)
2)    Engineering Controls (see [1, 2, 8, 9, 10, and 11])
      a)    Barriers
      b)    Guards (Fixed, Movable w/interlocks)
      c)    Safeguarding Devices
      d)    Complementary Protective Measures
3)    Information for Use (see [1, 2, 4, 7, 8, 12, and 13])
      a)    Hazard Warnings
      b)    Manuals
      c)    HMI* & Awareness Devices (lights, horns)
4)    Administrative Controls (see [1, 2, 4, 5, 7, and 8])
      a)    Training
      b)    SOPs
      c)    Hazardous Energy Control Procedures (see [5, 14])
      d)    Authorization
5)    Personal Protective Equipment
      a)    Specification
      b)    Fitting
      c)    Training in use
      d)    Maintenance

*HMI – Human-​Machine Interface. Also called the ‘con­sole’ or ‘oper­at­or sta­tion’. The loc­a­tion on the machine where the oper­at­or con­trols are loc­ated. Often includes a pro­gram­mable screen or oper­at­or dis­play, but can be a simple array of but­tons, switches and indic­at­or lights.

The man­u­fac­turer, developer or integ­rat­or of the sys­tem should provide the first three levels of the hier­archy. Where they have not been provided, the work­place or user should provide them.

The last two levels must be provided by the work­place or user.

Effectiveness

Each lay­er in the hier­archy has a level of effect­ive­ness that is related to the fail­ure modes asso­ci­ated with the con­trol meas­ures and the rel­at­ive effect­ive­ness in redu­cing risk in that lay­er. As you go down the hier­archy, the reli­ab­il­ity and effect­ive­ness decrease as shown below.

Effectiveness of the Hierarchy of Controls

There is no way to measure or specifically quantify the reliability or effectiveness of each layer of the hierarchy; that must wait until you make some selections from each level, and even then it can be very hard to do. The important thing to understand is that Elimination is more effective than Guarding (engineering controls), which is more effective than Awareness Means, etc.

1. Hazard Elimination or Substitution

Hazard Elimination

Hazard elimination is the most effective means of reducing risk from a particular hazard, for the simple reason that once the hazard has been eliminated there is no remaining risk. Remember that risk is a function of severity and probability. Since both severity and probability are affected by the existence of the hazard, eliminating the hazard reduces the risk from that particular hazard to zero. Some practitioners take this to mean that elimination is 100% effective; however, it’s my opinion that this is not the case, because even elimination has failure modes that can re-introduce the hazard.

Failure Modes:

Hazard elim­in­a­tion can fail if the haz­ard is rein­tro­duced into the design. With machinery this isn’t that likely to occur, but in pro­cesses, ser­vices and work­places it can occur.

Substitution

Substitution requires the designer to substitute a less hazardous material or process for the original material or process. For example, beryllium is a highly toxic metal that is used in some high-tech applications. Inhalation of, or skin contact with, beryllium dust can do serious harm to a person very quickly, causing acute beryllium disease. Long-term exposure can cause chronic beryllium disease. Substituting a less toxic material with similar properties in place of the beryllium in the process could reduce or eliminate the possibility of beryllium disease, depending on the exact content of the substitute material. If the substitute material includes any amount of beryllium, then the risk is only reduced. If it contains no beryllium, the risk is eliminated. Note that the risk can also be reduced by ensuring that the process does not create beryllium dust, since beryllium is not a significant hazard unless it is inhaled or contacts the skin as dust or fine particles.

Alternatively, using processes that handle the beryllium without creating dust or particles could reduce exposure to the material in the forms that are most likely to cause beryllium disease. An example of this could be substituting water-jet cutting for mechanical sawing of the material.

Failure Modes:

Reintroduction of the substituted material into a process is the primary failure mode; however, there may be others that are specific to the hazard and the circumstances. In the above example, pre- and post-cutting handling of the material could still create dust or small particles, resulting in exposure to beryllium. A substituted material might introduce other, new hazards, or might create failure modes in the final product that would result in risks to the end user. Careful consideration is required!

If neither elimination nor substitution is possible, we move to the next level in the hierarchy.

2. Engineering Controls

Engineering con­trols typ­ic­ally include vari­ous types of mech­an­ic­al guards [16, 17, & 18], inter­lock­ing sys­tems [9, 10, 11, & 15], and safe­guard­ing devices like light cur­tains or fences, area scan­ners, safety mats and two-​hand con­trols [19]. These sys­tems are pro­act­ive in nature, act­ing auto­mat­ic­ally to pre­vent access to a haz­ard and there­fore pre­vent­ing injury. These sys­tems are designed to act before a per­son can reach the danger zone and be exposed to the haz­ard.

Control reliability

Barrier guards and fixed guards are not eval­u­ated for reli­ab­il­ity because they do not rely on a con­trol sys­tem for their effect­ive­ness. As long as they are placed cor­rectly in the first place, and are oth­er­wise prop­erly designed to con­tain the haz­ards they are pro­tect­ing, then noth­ing more is required. On the oth­er hand, safe­guard­ing devices, like inter­locked guards, light fences, light cur­tains, area scan­ners, safety mats, two-​hand con­trols and safety edges, all rely on a con­trol sys­tem for their effect­ive­ness. Correct applic­a­tion of these devices requires cor­rect place­ment based on the stop­ping per­form­ance of the haz­ard and cor­rect integ­ra­tion of the safety device into the safety related parts of the con­trol sys­tem [19]. The degree of reli­ab­il­ity is based on the amount of risk reduc­tion that is being required of the safe­guard­ing device and the degree of risk present in the unguarded state [9, 10].
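In ISO 13849 – 1 the reliability needed from a safety function is expressed as a required Performance Level (PLr). As one example of linking the risk in the unguarded state to the reliability required, the informative risk graph in Annex A of ISO 13849 – 1:2015 selects PLr from three parameters: severity of injury (S1 slight, S2 serious), frequency and/or duration of exposure (F1 seldom or short, F2 frequent or long), and possibility of avoiding the hazard (P1 possible under specific conditions, P2 scarcely possible). The Python sketch below encodes that graph as a simple lookup. It is an illustration only: other editions and other standards, such as IEC 62061, use different methods.

```python
def required_pl(s: int, f: int, p: int) -> str:
    """Return PLr ('a'..'e') for the S, F and P parameters, each given as 1 or 2."""
    for name, value in (("S", s), ("F", f), ("P", p)):
        if value not in (1, 2):
            raise ValueError(f"{name} must be 1 or 2, got {value}")
    # Leaves of the risk graph in order S1F1P1, S1F1P2, S1F2P1, S1F2P2,
    # S2F1P1, S2F1P2, S2F2P1, S2F2P2
    leaves = "abbccdde"
    return leaves[(s - 1) * 4 + (f - 1) * 2 + (p - 1)]


# Serious injury, frequent exposure, avoidance scarcely possible
print(required_pl(2, 2, 2))  # e
# Slight injury, seldom exposure, avoidance possible
print(required_pl(1, 1, 1))  # a
```

The higher the PLr, the more the design of the safeguarding device and its integration into the SRP/CS must achieve in terms of architecture, component reliability and diagnostic coverage.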

There are many detailed tech­nic­al require­ments for engin­eer­ing con­trols that I can’t get into in this art­icle, but you can learn more by check­ing out the ref­er­ences at the end of this art­icle and oth­er art­icles on this blog.

Failure Modes

Failure modes for engin­eer­ing con­trols are as many and as var­ied as the devices used and the meth­ods of integ­ra­tion chosen. This dis­cus­sion will have to wait for anoth­er art­icle!

Awareness Devices

Of spe­cial note are ‘aware­ness devices’. This group includes warn­ing lights, horns, buzzers, bells, etc. These devices have some aspects that are sim­il­ar to engin­eer­ing con­trols, in that they are usu­ally part of the machine con­trol sys­tem, but they are also some­times classed as ‘inform­a­tion for use’, par­tic­u­larly when you con­sider indic­at­or or warn­ing lights and HMI screens. In addi­tion to these ‘act­ive’ types of devices, aware­ness devices may also include lines painted or taped on the floor or on the edge of a step or elev­a­tion change, warn­ing chains, sig­nage, etc. Signage may also be included in the class of ‘inform­a­tion for use’, along with HMI screens.

Failure Modes

Failure modes for Awareness Devices include:

  • Ignoring the warn­ings (Complacency or Failure to com­pre­hend the mean­ing of the warn­ing);
  • Failure to main­tain the device (warn­ing lights burned out or removed);
  • Defeat of the device (silen­cing an aud­ible warn­ing device);
  • Inappropriate selec­tion of the device (invis­ible or inaud­ible in the pre­dom­in­at­ing con­di­tions).

Complementary Protective Measures

Complementary Protective meas­ures are a class of con­trols that are sep­ar­ate from the vari­ous types of safe­guard­ing because they gen­er­ally can­not pre­vent injury, but may reduce the sever­ity of injury or the prob­ab­il­ity of the injury occur­ring. Complementary pro­tect­ive meas­ures are react­ive in nature, mean­ing that they are not auto­mat­ic. They must be manu­ally activ­ated by a user before any­thing will occur, e.g. press­ing an emer­gency stop but­ton. They can only com­ple­ment the pro­tec­tion provided by the auto­mat­ic sys­tems.

A good example of this is the Emergency Stop sys­tem that is designed into many machines. On its own, the emer­gency stop sys­tem will do noth­ing to pre­vent an injury. The sys­tem must be activ­ated manu­ally by press­ing a but­ton or pulling a cable. This relies on someone detect­ing a prob­lem and real­iz­ing that the machine needs to be stopped to avoid or reduce the sever­ity of an injury that is about to occur or is occur­ring. Emergency stop can only ever be a back-​up meas­ure to the auto­mat­ic inter­locks and safe­guard­ing devices used on the machine. In many cases, the next step in emer­gency response after press­ing the emer­gency stop is to call 911.

Failure Modes

The failure modes for these kinds of controls are too numerous to list here; however, they range from a simple failure to replace a fixed guard or barrier fence to the failure of electrical, pneumatic or hydraulic controls. These failure modes are enough of a concern that a new field of safety engineering, called ‘Functional Safety Engineering’, has grown up around the need to analyze the probability of failure of these systems and to use additional design elements to reduce that probability to a level we can tolerate. For more on this, see [9, 10, 11].

Once you have exhausted all the pos­sib­il­it­ies in Engineering Controls, you can move to the next level down in the hier­archy.

3. Information for Use

This is a very broad top­ic, includ­ing manu­als, instruc­tion sheets, inform­a­tion labels on the product, haz­ard warn­ing signs and labels, HMI screens, indic­at­or and warn­ing lights, train­ing mater­i­als, video, pho­to­graphs, draw­ings, bills of mater­i­als, etc. There are some excel­lent stand­ards now avail­able that can guide you in devel­op­ing these mater­i­als [1, 12 and 13].

Failure Modes

The major fail­ure modes in this level include:

  • Poorly writ­ten or incom­plete mater­i­als;
  • Provision of the mater­i­als in a lan­guage that is not under­stood by the user;
  • Failure by the user to read and under­stand the mater­i­als;
  • Inability to access the mater­i­als when needed;
  • Etcetera.

When all possibilities for informing the user have been covered, you can move to the next level down in the hierarchy. Note that this is the usual separation point between the manufacturer and the user of a product. This is nicely illustrated in Fig. 2 from ISO 12100 above. It is important to understand at this point that the residual risk posed by the product to the user may not yet be tolerable. In most cases, the user is responsible for implementing the next two levels in the hierarchy. The manufacturer can make recommendations that the user may want to follow, but that is typically the full extent of the manufacturer’s influence over the user.

4. Administrative Controls

This level in the hier­archy includes:

  • Training;
  • Standard Operating Procedures (SOPs);
  • Safe working procedures, e.g., hazardous energy control, lockout, tagout (where permitted by law), etc.;
  • Authorization; and
  • Supervision.

Training is the meth­od used to get the inform­a­tion provided by the man­u­fac­turer to the work­er or end user. This can be provided by the man­u­fac­turer, by a third party, or self-​taught by the user or work­er.
SOPs can include any kind of procedure instituted by the workplace to reduce risk. For example, requiring workers who drive vehicles to do a walk-around inspection before each use, and to log any problems found during the inspection, is an SOP that reduces risk while driving.
Safe work­ing pro­ced­ures can be strongly influ­enced by the man­u­fac­turer through the inform­a­tion for use provided. Maintenance pro­ced­ures for haz­ard­ous tasks provided in the main­ten­ance manu­al are an example of this.
Authorization is the procedure that an employer uses to authorize a worker to carry out a particular task. For example, an employer might put a policy in place that permits only licensed electricians to access electrical enclosures and to work on them while they are live. The employer might also require that workers who need to use ladders in their work take ladder safety and fall protection training courses. Once the prerequisites for authorization are completed, the worker is ‘authorized’ by the employer to carry out the task.
Supervision is one of the most crit­ic­al of the Administrative Controls. Sound super­vi­sion can make all of the above work. Failure to prop­erly super­vise work can cause all of these meas­ures to fail.

Failure Modes

Administrative con­trols have many fail­ure modes. Here are some of the most com­mon:

  • Failure to train;
  • Failure to inform work­ers regard­ing the haz­ards present and the related risks;
  • Failure to create and implement SOPs;
  • Failure to provide and maintain special equipment needed to implement SOPs;
  • No formal means of authorization, e.g., how do you KNOW that Joe has his lift truck license?
  • Failure to super­vise adequately.

I’m sure you can think of MANY oth­er ways that Administrative Controls can go wrong!

5. Personal Protective Equipment (PPE)

PPE includes everything from safety glasses, to hard­hats and bump caps, to fire-​retardant cloth­ing, hear­ing defend­ers, and work boots. Some stand­ards even include warn­ing devices that are worn by the user, such as gas detect­ors and person-​down detect­ors, in this group.
PPE is prob­ably the single most over-​used and least under­stood risk con­trol meas­ure. It falls at the bot­tom of the hier­archy for a num­ber of reas­ons:

  1. It is a meas­ure of last resort;
  2. It per­mits the haz­ard to come as close to the per­son as their cloth­ing;
  3. It is often incor­rectly spe­cified;
  4. It is often poorly fit­ted;
  5. It is often poorly main­tained; and
  6. It is often improp­erly used.

The problems with PPE are hard to deal with. You cannot glue or screw a set of safety glasses to a person’s face, so ensuring that the protective equipment is actually used is a big problem that comes back to supervision.

Many small and medi­um sized enter­prises do not have the expert­ise in the organ­iz­a­tion to prop­erly spe­cify, fit and main­tain the equip­ment.

User com­fort is extremely import­ant. Uncomfortable equip­ment won’t be used for long.

Finally, by the time that properly specified, fitted and used equipment can do its job, the hazard is as close to the person as it can get. The probability of failure at this point is very high, which is what makes PPE a measure of last resort, complementary to the more effective measures that can be provided in the first three levels of the hierarchy.

If work­ers are not prop­erly trained and adequately informed about the haz­ards they face and the reas­ons behind the use of PPE, they are deprived of the oppor­tun­ity to make safe choices, even if that choice is to refuse the work.

Failure Modes

Failure modes for PPE include:

  • Incorrect spe­cific­a­tion (not suit­able for the haz­ard);
  • Incorrect fit (allows haz­ard to bypass PPE);
  • Poor main­ten­ance (pre­vents or restricts vis­ion or move­ment, increas­ing the risk; causes PPE fail­ure under stress or allows haz­ard to bypass PPE);
  • Incorrect usage (fail­ure to train and inform users, incor­rect selec­tion or spe­cific­a­tion of PPE).

Time to Apply the Hierarchy

So now you know some­thing about the ‘hier­archy of con­trols’. Each lay­er has its own intric­a­cies and nuances that can only be learned by train­ing and exper­i­ence. With a doc­u­mented risk assess­ment in hand, you can begin to apply the hier­archy to con­trol the risks. Don’t for­get to iter­ate the assess­ment post-​control to doc­u­ment the degree of risk reduc­tion achieved. You may cre­ate new haz­ards when con­trol meas­ures are applied, and you may need to add addi­tion­al con­trol meas­ures to achieve effect­ive risk reduc­tion.
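The order of work described above (start at the top of the hierarchy, re-assess after each control, keep going until the residual risk is tolerable) can be sketched in a few lines of Python. This is only an illustration of that loop under stated assumptions, not a risk assessment method from ISO 12100 or any other standard: the numeric risk scores, the tolerable-risk threshold and each measure’s `reduction` fraction are hypothetical placeholders, the multiplicative risk model is a deliberate simplification, and the name of level 1 (elimination or substitution) is assumed because that level falls outside this excerpt.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hierarchy of controls; a lower value is preferred.
    ASSUMPTION: level 1 is named here as elimination or substitution."""
    ELIMINATION_OR_SUBSTITUTION = 1
    ENGINEERING_CONTROLS = 2
    INFORMATION_FOR_USE = 3
    ADMINISTRATIVE_CONTROLS = 4
    PPE = 5


@dataclass
class Measure:
    name: str
    level: Level
    # HYPOTHETICAL: fraction of the current risk this measure removes (0.0-1.0)
    reduction: float


def apply_hierarchy(initial_risk, tolerable_risk, candidates):
    """Apply measures from the top of the hierarchy down, re-assessing
    after each one, until the residual risk is tolerable.

    Returns (residual_risk, names_of_applied_measures, is_tolerable).
    If is_tolerable is False, additional control measures are needed.
    """
    risk = initial_risk
    applied = []
    # sorted() is stable, so measures within one level keep their given order
    for measure in sorted(candidates, key=lambda m: m.level):
        if risk <= tolerable_risk:
            break  # no need to go further down the hierarchy
        risk = risk * (1.0 - measure.reduction)  # simplified re-assessment
        applied.append(measure.name)
    return risk, applied, risk <= tolerable_risk


if __name__ == "__main__":
    measures = [
        Measure("safety glasses", Level.PPE, 0.5),
        Measure("interlocked guard", Level.ENGINEERING_CONTROLS, 0.9),
        Measure("warning label", Level.INFORMATION_FOR_USE, 0.2),
    ]
    residual, used, ok = apply_hierarchy(100.0, 15.0, measures)
    print(used, round(residual, 1), ok)  # prints: ['interlocked guard'] 10.0 True
```

Even though the PPE measure is listed first, the engineering control is tried first, and in this made-up example it is enough on its own, so the lower levels are never reached. In a real assessment, each re-assessment step is also where any new hazards introduced by the control measure would be identified.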

The doc­u­ments ref­er­enced below should give you a good start in under­stand­ing some of these chal­lenges.

References


NOTE: [1], [2], and [3] were combined by ISO and republished as ISO 12100:2010. This standard makes no technical changes to the preceding standards, but combines them into a single document. ISO/TR 14121-2 remains current and should be used with the current edition of ISO 12100.

[1] Safety of machinery – Basic concepts, general principles for design – Part 1: Basic terminology and methodology, ISO 12100-1, 2003.
[2] Safety of machinery – Basic concepts, general principles for design – Part 2: Technical principles, ISO 12100-2, 2003.
[3] Safety of machinery – Risk assessment – Part 1: Principles, ISO 14121-1, 2007.
[4] Safety of machinery – Prevention of unexpected start-up, ISO 14118, 2000.
[5] Control of hazardous energy – Lockout and other methods, CSA Z460, 2005.
[6] Fluid power systems and components – Graphic symbols and circuit diagrams – Part 1: Graphic symbols for conventional use and data-processing applications, ISO 1219-1, 2006.
[7] Pneumatic fluid power – General rules and safety requirements for systems and their components, ISO 4414, 1998.
[8] American National Standard for Industrial Robots and Robot Systems – Safety Requirements, ANSI/RIA R15.06, 1999.
[9] Safety of machinery – Safety-related parts of control systems – Part 1: General principles for design, ISO 13849-1, 2006.
[10] Safety of machinery – Functional safety of safety-related electrical, electronic and programmable electronic control systems, IEC 62061, 2005.
[11] Functional safety of electrical/electronic/programmable electronic safety-related systems, IEC 61508-X (seven parts).
[12] Preparation of instructions – Structuring, content and presentation, IEC 62079, 2001.
[13] American National Standard for Product Safety Information in Product Manuals, Instructions, and Other Collateral Materials, ANSI Z535.6, 2010.
[14] Control of Hazardous Energy – Lockout/Tagout and Alternative Methods, ANSI Z244.1, 2003.
[15] Safety of machinery – Interlocking devices associated with guards – Principles for design and selection, EN 1088+A1:2008.
[16] Safety of machinery – Guards – General requirements for the design and construction of fixed and movable guards, EN 953+A1:2009.
[17] Safety of machinery – Guards – General requirements for the design and construction of fixed and movable guards, ISO 14120.
[18] Safety of machinery – Safety distances to prevent hazard zones being reached by upper and lower limbs, ISO 13857:2008.
[19] Safety of machinery – Positioning of safeguards with respect to the approach speeds of parts of the human body, ISO 13855:2010.
