ISO 13849–1 Analysis — Part 6: CCF — Common Cause Failures

This entry is part 6 of 9 in the series How to do a 13849–1 analy­sis

What is a Common Cause Failure?

There are two similar-sounding terms that people often confuse: Common Cause Failure (CCF) and Common Mode Failure. Although the names sound alike, they describe different things. A Common Cause Failure is a failure in a system where two or more portions of the system fail at the same time from a single common cause. An example could be a lightning strike that causes a contactor to weld and simultaneously takes out the safety relay processor that controls the contactor. In this example, two different components fail in two different ways, but from a single cause.

Common Mode Failure is where two components or portions of a system fail in the same way, at the same time. For example, two interposing relays both fail with welded contacts at the same time. The failures could have the same cause or different causes, but the way the components fail is the same.

Com­mon-cause fail­ure includes com­mon mode fail­ure, since a com­mon cause can result in a com­mon man­ner of fail­ure in iden­ti­cal devices used in a sys­tem.

Here are the for­mal def­i­n­i­tions of these terms:

3.1.6 com­mon cause fail­ure CCF

fail­ures of dif­fer­ent items, result­ing from a sin­gle event, where these fail­ures are not con­se­quences of each oth­er

Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050-191-am1:1999, 04-23.] [1]

3.36 com­mon mode fail­ures

fail­ures of items char­ac­ter­ized by the same fault mode

NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [IEV 191-04-24] [3]

The “com­mon mode” fail­ure def­i­n­i­tion uses the phrase “fault mode”, so let’s look at that as well:

fail­ure mode
DEPRECATED: fault mode
man­ner in which fail­ure occurs

Note 1 to entry: A fail­ure mode may be defined by the func­tion lost or oth­er state tran­si­tion that occurred. [IEV 192–03-17] [17]

As you can see, “fault mode” is no longer used, in favour of the more com­mon “fail­ure mode”, so it is pos­si­ble to re-write the com­mon-mode fail­ure def­i­n­i­tion to read, “fail­ures of items char­ac­terised by the same man­ner of fail­ure.”
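To make the distinction concrete, here is a minimal Python sketch that applies the two definitions to pairs of recorded failures. It is only an illustration: the `Failure` record, the helper functions and the device tags (K1, CR1, CR2) are hypothetical, and the sketch assumes that the recorded cause is the root event and that neither failure is a consequence of the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One recorded failure (hypothetical record layout for illustration)."""
    item: str          # component or portion of the system that failed
    failure_mode: str  # manner in which the failure occurred
    cause: str         # initiating event, assumed here to be the root cause


def is_common_cause(a: Failure, b: Failure) -> bool:
    """Different items failing as the result of a single event."""
    return a.item != b.item and a.cause == b.cause


def is_common_mode(a: Failure, b: Failure) -> bool:
    """Different items failing in the same manner, whatever the cause."""
    return a.item != b.item and a.failure_mode == b.failure_mode


# The lightning strike example: one cause, two different manners of failure.
contactor = Failure("contactor K1", "welded contacts", "lightning strike")
processor = Failure("safety relay processor", "loss of function", "lightning strike")

# The interposing relay example: same manner of failure, different causes.
relay_a = Failure("interposing relay CR1", "welded contacts", "overcurrent")
relay_b = Failure("interposing relay CR2", "welded contacts", "contact wear")

for first, second in [(contactor, processor), (relay_a, relay_b)]:
    print(
        f"{first.item} + {second.item}: "
        f"common cause={is_common_cause(first, second)}, "
        f"common mode={is_common_mode(first, second)}"
    )
```

If both relays had welded because of the same overcurrent event, both functions would return True, which is the case where a common cause produces a common mode of failure.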

Random, Systematic and Common Cause Failures

Why do we need to care about this? There are three manners in which failures occur: random failures, systematic failures, and common cause failures. When developing safety-related controls, we need to consider all three and mitigate them as much as possible.

Ran­dom fail­ures do not fol­low any pat­tern, occur­ring ran­dom­ly over time, and are often brought on by over-stress­ing the com­po­nent, or from man­u­fac­tur­ing flaws. Ran­dom fail­ures can increase due to envi­ron­men­tal or process-relat­ed stress­es, like cor­ro­sion, EMI, nor­mal wear-and-tear, or oth­er over-stress­ing of the com­po­nent or sub­sys­tem. Ran­dom fail­ures are often mit­i­gat­ed through selec­tion of high-reli­a­bil­i­ty com­po­nents [18].

Systematic failures include common-cause failures, and occur because of a human error that was not caught by procedural means. These failures are due to design, specification, operating, maintenance, and installation errors. When we look at systematic errors, we are looking for things like training of the system designers, or quality assurance procedures used to validate the way the system operates. Systematic failures are non-random and complex, making them difficult to analyse statistically. Systematic errors are a significant source of common-cause failures because they can affect redundant devices, and because they are often deterministic, occurring whenever a particular set of circumstances exists.

Sys­tem­at­ic fail­ures include many types of errors, such as:

  • Man­u­fac­tur­ing defects, e.g., soft­ware and hard­ware errors built into the device by the man­u­fac­tur­er.
  • Spec­i­fi­ca­tion mis­takes, e.g. incor­rect design basis and inac­cu­rate soft­ware spec­i­fi­ca­tion.
  • Imple­men­ta­tion errors, e.g., improp­er instal­la­tion, incor­rect pro­gram­ming, inter­face prob­lems, and not fol­low­ing the safe­ty man­u­al for the devices used to realise the safe­ty func­tion.
  • Oper­a­tion and main­te­nance, e.g., poor inspec­tion, incom­plete test­ing and improp­er bypass­ing [18].

Diverse redun­dan­cy is com­mon­ly used to mit­i­gate sys­tem­at­ic fail­ures, since dif­fer­ences in com­po­nent or sub­sys­tem design tend to cre­ate non-over­lap­ping sys­tem­at­ic fail­ures, reduc­ing the like­li­hood of a com­mon error cre­at­ing a com­mon-mode fail­ure. Errors in spec­i­fi­ca­tion, imple­men­ta­tion, oper­a­tion and main­te­nance are not affect­ed by diver­si­ty.

Figure 1 below shows the results of a small study done by the UK’s Health and Safety Executive in 1994 [19] that supports the idea that systematic failures are a significant contributor to safety system failures. The study included only 34 systems (n=34), so the results cannot be considered conclusive. However, there were some startling results. As you can see, errors in the specification of the safety functions (Safety Requirement Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significant source of failures.

Pie chart illustrating the proportion of failures in each phase of the life cycle of a machine, based on data taken from HSE Report HSG238.
Fig­ure 1 — HSG 238 Pri­ma­ry Caus­es of Fail­ure by Life Cycle Stage
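To see why a sample of 34 systems is suggestive rather than conclusive, here is a rough Python sketch that puts a 95% Wilson score interval around the specification figure. The count of 15 systems is my assumption, back-calculated from "about 44%" of 34, and the choice of the Wilson interval is mine, not something taken from the HSE report.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


SYSTEMS_IN_STUDY = 34      # n, from the text
SPECIFICATION_CAUSED = 15  # assumed count: about 44% of 34

low, high = wilson_interval(SPECIFICATION_CAUSED, SYSTEMS_IN_STUDY)
print(f"observed: {SPECIFICATION_CAUSED / SYSTEMS_IN_STUDY:.1%}")
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval is wide, spanning roughly 30 percentage points, so the true share of specification-related failures could be well below or well above 44%. The point stands that specification errors matter, but the exact percentage should not be leaned on.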

Handling CCF in ISO 13849–1

Now that we under­stand WHAT Com­mon-Cause Fail­ure is, and WHY it’s impor­tant, we can talk about HOW it is han­dled in ISO 13849–1. Since ISO 13849–1 is intend­ed to be a sim­pli­fied func­tion­al safe­ty stan­dard, CCF analy­sis is lim­it­ed to a check­list in Annex F, Table F.1. Note that Annex F is infor­ma­tive, mean­ing that it is guid­ance mate­r­i­al to help you apply the stan­dard. Since this is the case, you could use any oth­er means suit­able for assess­ing CCF mit­i­ga­tion, like those in IEC 61508, or in oth­er stan­dards.

Table F.1 is set up with a series of mit­i­ga­tion mea­sures which are grouped togeth­er in relat­ed cat­e­gories. Each group is pro­vid­ed with a score that can be claimed if you have imple­ment­ed the mit­i­ga­tions in that group. ALL OF THE MEASURES in each group must be ful­filled in order to claim the points for that cat­e­go­ry. Here’s an exam­ple:

A portion of ISO 13849-1 Table F.1.
ISO 13849–1:2015, Table F.1 Excerpt

In order to claim the 15 points available for the use of separation or segregation in the system design, there must be a separation between the signal paths. Several examples of this are given in the table for clarity.

Table F.1 lists six groups of mitigation measures. In order to claim adequate CCF mitigation, a minimum score of 65 points out of a possible 100 must be achieved. The CCF requirement applies only to Category 2, 3 and 4 architectures. For those architectures, you cannot claim the PL without meeting the CCF requirement, regardless of whether the design meets the other criteria.
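The scoring rules above can be sketched in a few lines of Python. Treat this as an illustration, not a substitute for the standard: the function names are hypothetical, and the point values are as I recall them from Table F.1, with groups 3 and 6 each split into two separately scored entries, so verify them against your own copy before relying on them.

```python
CCF_MIN_SCORE = 65                # minimum total score stated in the text
CCF_CATEGORIES = {"2", "3", "4"}  # architectures that must meet the CCF requirement

# Point values as recalled from ISO 13849-1:2015, Table F.1 (assumption: verify
# against the standard). Each entry is all-or-nothing.
TABLE_F1_POINTS = {
    "separation/segregation": 15,
    "diversity": 20,
    "protection against over-voltage, over-pressure, over-current": 15,
    "well-tried components": 5,
    "assessment/analysis": 5,
    "competence/training": 5,
    "contamination and EMC": 25,
    "other environmental influences": 10,
}


def ccf_score(fulfilled: set[str]) -> int:
    """Sum the points for every entry that is completely fulfilled."""
    unknown = fulfilled - set(TABLE_F1_POINTS)
    if unknown:
        raise ValueError(f"not in the table: {sorted(unknown)}")
    return sum(TABLE_F1_POINTS[name] for name in fulfilled)


def ccf_requirement_met(category: str, fulfilled: set[str]) -> bool:
    """Category B and 1 have no CCF requirement; 2, 3 and 4 need 65 points."""
    if category not in CCF_CATEGORIES:
        return True
    return ccf_score(fulfilled) >= CCF_MIN_SCORE


# Hypothetical Category 3 design with no diversity and no designer training.
design = {
    "separation/segregation",
    "protection against over-voltage, over-pressure, over-current",
    "well-tried components",
    "assessment/analysis",
    "contamination and EMC",
    "other environmental influences",
}
print(ccf_score(design), ccf_requirement_met("3", design))
```

Because each entry is all-or-nothing, a measure goes into the `fulfilled` set only when every part of it has been implemented. Partial credit is deliberately not modelled.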

One final note on CCF: If you are trying to review an existing control system, say in an existing machine, or in a machine designed by a third party where you have no way to determine the experience and training of the designers or the capability of the company’s change management process, then you cannot adequately assess CCF [8]. This fact is recognised in CSA Z432-16 [20], chapter 8, which allows the reviewer to simply verify that the architectural requirements, exclusive of any probabilistic requirements, have been met. This is particularly useful for engineers reviewing machinery under Ontario’s Pre-Start Health and Safety requirements [21], who are frequently working with less-than-complete design documentation.

In case you missed the first part of the series, you can read it here. In the next arti­cle in this series, I’m going to review the process flow for sys­tem analy­sis as cur­rent­ly out­lined in ISO 13849–1. Watch for it!

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assess­ment: Basics and Bench­marks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simp­son, Safe­ty crit­i­cal sys­tems hand­book. Ams­ter­dam: Else­vier/But­ter­worth-Heine­mann, 2011.

[0.2]  Elec­tro­mag­net­ic Com­pat­i­bil­i­ty for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: The Insti­tu­tion of Engi­neer­ing and Tech­nol­o­gy, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. The com­plete ref­er­ence list is includ­ed in the last post of the series.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[2]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 2: Val­i­da­tion. 2nd Edi­tion. ISO Stan­dard 13849–2. 2012.

[3]      Safe­ty of machin­ery — Gen­er­al prin­ci­ples for design — Risk assess­ment and risk reduc­tion. ISO Stan­dard 12100. 2010.

[8]     S. Joce­lyn, J. Bau­doin, Y. Chin­ni­ah, and P. Char­p­en­tier, “Fea­si­bil­i­ty study and uncer­tain­ties in the val­i­da­tion of an exist­ing safe­ty-relat­ed con­trol cir­cuit with the ISO 13849–1:2006 design stan­dard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.

[17]      “fail­ure mode”, 192–03-17, Inter­na­tion­al Elec­trotech­ni­cal Vocab­u­lary. IEC Inter­na­tion­al Elec­trotech­ni­cal Com­mis­sion, Gene­va, 2015.

[18]      M. Gen­tile and A. E. Sum­mers, “Com­mon Cause Fail­ure: How Do You Man­age Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331–338, 2006.

[19]     Out of Control—Why con­trol sys­tems go wrong and how to pre­vent fail­ure, 2nd ed. Rich­mond, Sur­rey, UK: HSE Health and Safe­ty Exec­u­tive, 2003.

[20]     Safe­guard­ing of Machin­ery. 3rd Edi­tion. CSA Stan­dard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Cana­da, 1990.


Author: Doug Nix

Doug Nix is Managing Director and Principal Consultant at Compliance InSight Consulting, Inc. (http://www.complianceinsight.ca) in Kitchener, Ontario, and is Lead Author and Managing Editor of the Machinery Safety 101 blog. Doug's work includes teaching machinery risk assessment techniques privately and through Conestoga College Institute of Technology and Advanced Learning in Kitchener, Ontario, as well as providing technical services and training programs to clients related to risk assessment, industrial machinery safety, safety-related control system integration and reliability, laser safety and regulatory conformity.