ISO 13849 – 1 Analysis — Part 6: CCF — Common Cause Failures

This entry is part 6 of 9 in the series How to do a 13849 – 1 ana­lys­is

What is a Common Cause Failure?

There are two similar-sounding terms that people often confuse: Common Cause Failure (CCF) and Common Mode Failure. While these two types of failures sound similar, they are different. A Common Cause Failure is a failure in a system where two or more portions of the system fail at the same time from a single common cause. An example could be a lightning strike that causes a contactor to weld and simultaneously takes out the safety relay processor that controls the contactor. In this example, two different components fail in two different manners, but from a single cause.

Common Mode Failure is where two com­pon­ents or por­tions of a sys­tem fail in the same way, at the same time. For example, two inter­pos­ing relays both fail with wel­ded con­tacts at the same time. The fail­ures could be caused by the same cause or from dif­fer­ent causes, but the way the com­pon­ents fail is the same.

Common-​cause fail­ure includes com­mon mode fail­ure, since a com­mon cause can res­ult in a com­mon man­ner of fail­ure in identic­al devices used in a sys­tem.

Here are the form­al defin­i­tions of these terms:

3.1.6 com­mon cause fail­ure CCF

fail­ures of dif­fer­ent items, res­ult­ing from a single event, where these fail­ures are not con­sequences of each oth­er

Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050-191-am1:1999, 04-23.] [1]

 

3.36 com­mon mode fail­ures

fail­ures of items char­ac­ter­ized by the same fault mode

NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [IEV 191-04-24] [3]

The “com­mon mode” fail­ure defin­i­tion uses the phrase “fault mode”, so let’s look at that as well:

fail­ure mode
DEPRECATED: fault mode
man­ner in which fail­ure occurs

Note 1 to entry: A failure mode may be defined by the function lost or other state transition that occurred. [IEV 192-03-17] [17]

As you can see, “fault mode” is no longer used, in favour of the more com­mon “fail­ure mode”, so it is pos­sible to re-​write the common-​mode fail­ure defin­i­tion to read, “fail­ures of items char­ac­ter­ised by the same man­ner of fail­ure.”
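To make the distinction concrete, here is a minimal Python sketch of the two definitions above. The `Failure` record, its field names and the component names are my own illustrative assumptions, not anything defined in the standards. The sketch also ignores timing, and simply assumes the two failures being compared happened together.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Failure:
    """Hypothetical failure record; the field names are illustrative only."""
    item: str                             # the component that failed
    mode: str                             # manner in which it failed
    cause: str                            # the initiating event
    consequence_of: Optional[str] = None  # item whose failure led to this one


def is_common_cause(a: Failure, b: Failure) -> bool:
    """Different items, one initiating event, neither a consequence of the other."""
    independent = a.consequence_of != b.item and b.consequence_of != a.item
    return a.item != b.item and a.cause == b.cause and independent


def is_common_mode(a: Failure, b: Failure) -> bool:
    """Different items that failed in the same manner, whatever the cause."""
    return a.item != b.item and a.mode == b.mode


# The lightning-strike example: one cause, two different failure modes.
contactor = Failure("contactor K1", "welded contacts", "lightning strike")
relay_cpu = Failure("safety relay processor", "processor destroyed", "lightning strike")

# The interposing-relay example: same failure mode, here from different causes.
relay_a = Failure("interposing relay KA", "welded contacts", "overcurrent")
relay_b = Failure("interposing relay KB", "welded contacts", "contact wear")

print(is_common_cause(contactor, relay_cpu), is_common_mode(contactor, relay_cpu))  # True False
print(is_common_cause(relay_a, relay_b), is_common_mode(relay_a, relay_b))          # False True
```

If both relays had welded because of the same overcurrent event, both functions would return True, which is the case where a common cause produces a common mode failure.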

Random, Systematic and Common Cause Failures

Why do we need to care about this? There are three man­ners in which fail­ures occur: ran­dom fail­ures, sys­tem­at­ic fail­ures, and com­mon cause fail­ures. When devel­op­ing safety related con­trols, we need to con­sider all three and mit­ig­ate them as much as pos­sible.

Random fail­ures do not fol­low any pat­tern, occur­ring ran­domly over time, and are often brought on by over-​stressing the com­pon­ent, or from man­u­fac­tur­ing flaws. Random fail­ures can increase due to envir­on­ment­al or process-​related stresses, like cor­ro­sion, EMI, nor­mal wear-​and-​tear, or oth­er over-​stressing of the com­pon­ent or sub­sys­tem. Random fail­ures are often mit­ig­ated through selec­tion of high-​reliability com­pon­ents [18].

Systematic failures include common-cause failures, and occur because of a human error that was not caught by procedural means. These failures are due to design, specification, operating, maintenance, and installation errors. When we look for systematic errors, we look at things like the training of the system designers, or the quality assurance procedures used to validate the way the system operates. Systematic failures are non-random and complex, making them difficult to analyse statistically. Systematic errors are a significant source of common-cause failures because they can affect redundant devices, and because they are often deterministic, occurring whenever a particular set of circumstances exists.

Systematic fail­ures include many types of errors, such as:

  • Manufacturing defects, e.g., soft­ware and hard­ware errors built into the device by the man­u­fac­turer.
  • Specification mis­takes, e.g. incor­rect design basis and inac­cur­ate soft­ware spe­cific­a­tion.
  • Implementation errors, e.g., improp­er install­a­tion, incor­rect pro­gram­ming, inter­face prob­lems, and not fol­low­ing the safety manu­al for the devices used to real­ise the safety func­tion.
  • Operation and main­ten­ance, e.g., poor inspec­tion, incom­plete test­ing and improp­er bypassing [18].

Diverse redundancy is commonly used to mitigate systematic failures, since differences in component or subsystem design tend to create non-overlapping systematic failures, reducing the likelihood of a common error creating a common-mode failure. Diversity does not, however, protect against errors in specification, implementation, operation and maintenance.

Fig. 1 below shows the results of a small study done by the UK’s Health and Safety Executive in 1994 [19] that supports the idea that systematic failures are a significant contributor to safety system failures. The study included only 34 systems (n=34), so the results cannot be considered conclusive. However, there were some startling results. As you can see, errors in the specification of the safety functions (Safety Requirement Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significant source of failures.

Pie chart illustrating the proportion of failures in each phase of the life cycle of a machine, based on data taken from HSE Report HSG238.
Figure 1 – HSG 238 Primary Causes of Failure by Life Cycle Stage

Handling CCF in ISO 13849 – 1

Now that we under­stand WHAT Common-​Cause Failure is, and WHY it’s import­ant, we can talk about HOW it is handled in ISO 13849 – 1. Since ISO 13849 – 1 is inten­ded to be a sim­pli­fied func­tion­al safety stand­ard, CCF ana­lys­is is lim­ited to a check­list in Annex F, Table F.1. Note that Annex F is inform­at­ive, mean­ing that it is guid­ance mater­i­al to help you apply the stand­ard. Since this is the case, you could use any oth­er means suit­able for assess­ing CCF mit­ig­a­tion, like those in IEC 61508, or in oth­er stand­ards.

Table F.1 is set up with a series of mit­ig­a­tion meas­ures which are grouped togeth­er in related cat­egor­ies. Each group is provided with a score that can be claimed if you have imple­men­ted the mit­ig­a­tions in that group. ALL OF THE MEASURES in each group must be ful­filled in order to claim the points for that cat­egory. Here’s an example:

A portion of ISO 13849-1 Table F.1.
ISO 13849 – 1:2015, Table F.1 Excerpt

In order to claim the 15 points available for the use of separation or segregation in the system design, there must be a separation between the signal paths. Table F.1 gives several examples of this for clarity.

Table F.1 lists six groups of mitigation measures. In order to claim adequate CCF mitigation, a minimum score of 65 points must be achieved. The CCF requirement applies only to Category 2, 3 and 4 architectures. For those architectures, you cannot claim the PL without meeting the CCF requirement, regardless of whether the design meets the other criteria.
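The scoring rule can be sketched in a few lines of Python. This is only an illustration of the all-or-nothing logic and the 65-point threshold described above. The group labels `G1` to `G6` and their point values are placeholders I made up, NOT the values from Table F.1, so take the real groups and scores from your copy of the standard.

```python
REQUIRED_SCORE = 65               # minimum score stated above
CCF_CATEGORIES = {"2", "3", "4"}  # architectures that must meet the CCF requirement


def ccf_score(groups):
    """Sum the points of every group whose measures are ALL fulfilled.

    groups maps a group label to (points, [measure fulfilled?, ...]).
    A partly fulfilled group earns nothing.
    """
    return sum(points for points, measures in groups.values()
               if measures and all(measures))


def ccf_requirement_met(category, groups):
    """True if the design may proceed to claim its PL as far as CCF is concerned."""
    if category not in CCF_CATEGORIES:
        return True  # Categories B and 1 have no CCF requirement
    return ccf_score(groups) >= REQUIRED_SCORE


# Placeholder checklist: labels and points are hypothetical, not Table F.1 values.
checklist = {
    "G1": (10, [True, True]),
    "G2": (20, [True, False]),  # one measure missing, so no points for this group
    "G3": (20, [True]),
    "G4": (10, [True]),
    "G5": (10, [True]),
    "G6": (30, [True, True]),
}

print(ccf_score(checklist))                 # 80
print(ccf_requirement_met("3", checklist))  # True
```

Keeping each measure as a separate true/false entry, rather than recording only a group total, also leaves a record of which measures were actually verified.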

One final note on CCF: If you are trying to review an existing control system, say in an existing machine, or in a machine designed by a third party where you have no way to determine the experience and training of the designers or the capability of the company’s change management process, then you cannot adequately assess CCF [8]. This fact is recognised in CSA Z432-16 [20], chapter 8. That standard allows the reviewer to simply verify that the architectural requirements, exclusive of any probabilistic requirements, have been met. This is particularly useful for engineers reviewing machinery under Ontario’s Pre-Start Health and Safety requirements [21], who are frequently working with less-than-complete design documentation.

In case you missed the first part of the series, you can read it here. In the next art­icle in this series, I’m going to review the pro­cess flow for sys­tem ana­lys­is as cur­rently out­lined in ISO 13849 – 1. Watch for it!

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety crit­ic­al sys­tems hand­book. Amsterdam: Elsevier/​Butterworth-​Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. The com­plete ref­er­ence list is included in the last post of the series.

[1]     Safety of machinery — Safety-​related parts of con­trol sys­tems — Part 1: General prin­ciples for design. 3rd Edition. ISO Standard 13849 – 1. 2015.

[2]     Safety of machinery – Safety-​related parts of con­trol sys­tems – Part 2: Validation. 2nd Edition. ISO Standard 13849 – 2. 2012.

[3]      Safety of machinery – General prin­ciples for design – Risk assess­ment and risk reduc­tion. ISO Standard 12100. 2010.

[8]     S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncer­tain­ties in the val­id­a­tion of an exist­ing safety-​related con­trol cir­cuit with the ISO 13849 – 1:2006 design stand­ard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104 – 112, Jan. 2014.

[17]      “fail­ure mode”, 192−03−17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18]      M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331 – 338, 2006.

[19]     Out of Control — Why con­trol sys­tems go wrong and how to pre­vent fail­ure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

[20]     Safeguarding of Machinery. 3rd Edition. CSA Standard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Canada, 1990.

31-​Dec-​2011 – Are YOU ready?

This entry is part 8 of 8 in the series Circuit Architectures Explored

31-​December-​2011 marks a key mile­stone for machine build­ers mar­ket­ing their products in the European Union, the EEA and many of the Candidate States. Functional Safety takes a pos­it­ive step for­ward with the man­dat­ory applic­a­tion of EN ISO 13849 – 1 and -2. As of 1-​January-​2012, the safety-​related parts of the con­trol sys­tems on all machinery bear­ing a CE Mark will be required to meet these stand­ards.

This change started six years ago, when these standards were first harmonized under the Machinery Directive. After much opposition to the original mandatory implementation date of 31-Dec-08, the EC Machinery Committee gave machine builders an additional three years to make the transition to these standards.

If you aren’t aware of these stand­ards, or if you aren’t famil­i­ar with the concept of func­tion­al safety, you need to get up to speed, and fast.

Under EN 954-1:1995 and the 1st Edition of ISO 13849-1, published in 1999, a designer needed to select a design Category, or architecture, that would provide the degree of fault tolerance and reliability needed based on the outcome of the risk assessment for the machinery. The Categories, B and 1 to 4, remain unchanged in the 2nd Edition. I’ve talked about the Categories in detail in other posts, so I won’t spend any time on them here.

The 2nd Edition brings Mean Time to Failure into the pic­ture, along with Diagnostic Coverage and Common Cause Failures. These new con­cepts require design­ers to use more ana­lyt­ic­al tech­niques in devel­op­ing their designs, and also require addi­tion­al doc­u­ment­a­tion (as usu­al!).

One of the main fail­ings with EN 954 – 1 was Validation. This top­ic was sup­posed to have been covered by EN 954 – 2, but this stand­ard was nev­er pub­lished. This has led machine build­ers to make design decisions without keep­ing the neces­sary design doc­u­ment­a­tion trail, and fur­ther­more, to skip the Validation step entirely in many cases.

The miss­ing Validation stand­ard was finally pub­lished in 2003 as ISO 13849 – 2:2003, and sub­sequently adop­ted and har­mon­ized in 2009 as EN ISO 13849 – 2:2003. While no man­dat­ory imple­ment­a­tion date for this stand­ard is giv­en in the cur­rent list of stand­ards har­mon­ized under 2006/​42/​EC-​Machinery, use of Part 1 of the stand­ard man­dates use of Part 2, so this stand­ard is effect­ively man­dat­ory at the same time.

Part 2 brings a number of key annexes that are necessary for the implementation of Part 1, and also outlines the complete documentation trail needed for validation, and coincidentally, audit. Notified bodies will be looking for this information when evaluating the content of Technical Files used in CE Marking.

From a North American per­spect­ive, these two stand­ards gain access through ANSI’s adop­tion of ISO 10218 for Industrial Robots. Part 1 of this stand­ard, cov­er­ing the robot itself, was adop­ted last year. Part 2 of the stand­ard will be adop­ted in 2012, and RIA R15.06 will be with­drawn. At the same time, CSA will be adopt­ing the ISO stand­ards and with­draw­ing CSA Z434.

These changes will finally bring North America, the International Community and the EU onto the same foot­ing when it comes to Functional Safety in indus­tri­al machinery applic­a­tions. The days of “SIMPLE, SINGLE CHANNEL, SINGLE CHANNEL-​MONITORED and CONTROL RELIABLE” are numbered.

Are you ready?

Compliance InSight Consulting will be offer­ing a series of train­ing events in 2012 on this top­ic. For more inform­a­tion, con­tact Doug Nix.

Inconsistencies in ISO 13849 – 1:2006

This entry is part 7 of 8 in the series Circuit Architectures Explored

I’ve writ­ten quite a bit recently on the top­ic of cir­cuit archi­tec­tures under ISO 13849 – 1, and one of my read­ers noticed an incon­sist­ency between the text of the stand­ard and Figure 5, the dia­gram that shows how the cat­egor­ies can span one or more Performance Levels.

ISO 13849-1 Figure 5
ISO 13849 – 1, Figure 5: Relationship between Categories, DC, MTTFd and PL

If you look at Category 2 in Figure 5, you will notice that there are TWO bands, one for DCavg LOW and one for DCavg MED. However, read­ing the text of the defin­i­tion for Category 2 gives (§6.2.5):

The dia­gnost­ic cov­er­age (DCavg) of the total SRP/​CS includ­ing fault-​detection shall be low.

This leaves some con­fu­sion, because it appears from the dia­gram that there are two options for this archi­tec­ture. This is backed up by the data in Annex K that under­lies the dia­gram.

The same con­fu­sion exists in the text describ­ing Category 3, with Figure 5 show­ing two bands, one for DCavg LOW and one for DCavg MED.

I con­tac­ted the ISO TC199 Secretariat, the people respons­ible for the con­tent of ISO 13849 – 1, and poin­ted out this appar­ent con­flict. They respon­ded that they would pass the com­ment on to the TC for res­ol­u­tion, and would con­tact me if they needed addi­tion­al inform­a­tion. As of this writ­ing, I have not heard more.

So what should you do if you are try­ing to design to this stand­ard? My advice is to fol­low Figure 5. If you can achieve a DCavg MED in your design, it is com­pletely reas­on­able to claim a high­er PL. Refer to the data in Annex K to see where your design falls once you have com­pleted the MTTFd cal­cu­la­tions.
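As a rough illustration of why the DCavg band matters, here is a small Python lookup for Categories 2 and 3 only. The PL values are my own transcription of the simplified relationship shown in Figure 5, so treat them as an assumption and verify them against your copy of the standard. The function name and the "low", "medium" and "high" labels are mine. Figure 5 is only a coarse banding, and the numerical data in Annex K remains the reference.

```python
# (category, DCavg) -> {MTTFd band: PL}. Transcribed from Figure 5 as an
# illustration only; verify against the standard before relying on it.
SIMPLIFIED_PL = {
    ("2", "low"):    {"low": "a", "medium": "b", "high": "c"},
    ("2", "medium"): {"low": "b", "medium": "c", "high": "d"},
    ("3", "low"):    {"low": "b", "medium": "c", "high": "d"},
    ("3", "medium"): {"low": "c", "medium": "d", "high": "d"},
}


def achievable_pl(category, dc_avg, mttfd):
    """Return the PL band for the given combination.

    Raises KeyError for combinations this sketch does not cover.
    """
    return SIMPLIFIED_PL[(category, dc_avg)][mttfd]


# Category 2 with high MTTFd: moving from DCavg LOW to DCavg MED raises the PL.
print(achievable_pl("2", "low", "high"))     # c
print(achievable_pl("2", "medium", "high"))  # d
```

A dictionary keyed on the category and DCavg pair keeps the two Category 2 bands visibly separate, which is exactly the point on which the text of the standard and Figure 5 disagree.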

Thanks to Richard Harris and Douglas Florence, both mem­bers of the ISO 13849 and IEC 62061 Group on LinkedIn for bring­ing this to my atten­tion!

If you are interested in contacting the TC199 Secretariat, you can email the Secretary, Mr. Stephen Kennedy. More details on ISO TC199 can be found on the Technical Committee page on the ISO website.