ISO 13849–1 Analysis — Part 8: Fault Exclusion

This entry is part 9 of 9 in the series How to do a 13849–1 analy­sis

Fault Consideration & Fault Exclusion

ISO 13849–1, Chapter 7 [1, 7] discusses the need for fault consideration and fault exclusion. Fault consideration is the process of examining the components and sub-systems used in the safety-related part of the control system (SRP/CS) and making a list of all the faults that could occur in each one. This is definitely a non-trivial exercise!

Think­ing back to some of the ear­li­er arti­cles in this series where I men­tioned the dif­fer­ent types of faults, you may recall that there are detectable and unde­tectable faults, and there are safe and dan­ger­ous faults, lead­ing us to four kinds of fault:

  • Safe unde­tectable faults
  • Dan­ger­ous unde­tectable faults
  • Safe detectable faults
  • Dan­ger­ous detectable faults

For sys­tems where no diag­nos­tics are used, Cat­e­go­ry B and 1, faults need to be elim­i­nat­ed using inher­ent­ly safe design tech­niques. Care needs to be tak­en when clas­si­fy­ing com­po­nents as “well-tried” ver­sus using a fault exclu­sion, as com­po­nents that might nor­mal­ly be con­sid­ered “well-tried” might not meet those require­ments in every appli­ca­tion. [2, Annex A], Val­i­da­tion tools for mechan­i­cal sys­tems, dis­cuss­es the con­cepts of “Basic Safe­ty Prin­ci­ples”, “Well-Tried Safe­ty Prin­ci­ples”, and “Well-tried com­po­nents”.  [2, Annex A] also pro­vides exam­ples of faults and rel­e­vant fault exclu­sion cri­te­ria. There are sim­i­lar Annex­es that cov­er pneu­mat­ic sys­tems [2, Annex B], hydraulic sys­tems [2, Annex C], and elec­tri­cal sys­tems [2, Annex D].

For systems where diagnostics are part of the design, i.e., Category 2, 3, and 4, the fault lists are used to evaluate the diagnostic coverage (DC) of the test systems. Depending on the architecture, certain levels of DC are required to meet the relevant PL, see [1, Fig. 5]. The fault lists are the starting point for the determination of DC, and are an input into the hardware and software designs. All of the dangerous detectable faults must be covered by the diagnostics, and the DC must be high enough to meet the PLr for the safety function.

The fault lists and fault exclu­sions are used in the Val­i­da­tion por­tion of this process as well. At the start of the Val­i­da­tion process flow­chart [2, Fig. 1], you can see how the fault lists and the cri­te­ria used for fault exclu­sion are used as inputs to the val­i­da­tion plan.

The diagram shows the first few stages in the ISO 13849-2 Validation process. See ISO 13849-2, Figure 1.
Start of ISO 13849–2 Fig. 1

Faults that can be excluded do not need to be validated, saving time and effort during the system verification and validation (V & V). How is this done?

Fault Consideration

The first step is to devel­op a list of poten­tial faults that could occur, based on the com­po­nents and sub­sys­tems includ­ed in SRP/CS. ISO 13849–2 [2] includes lists of typ­i­cal faults for var­i­ous tech­nolo­gies. For exam­ple, [2, Table A.4] is the fault list for mechan­i­cal com­po­nents.

Mechanical fault list from ISO 13849-2
Table A.4 — Faults and fault exclu­sions — Mechan­i­cal devices, com­po­nents and ele­ments
(e.g. cam, fol­low­er, chain, clutch, brake, shaft, screw, pin, guide, bear­ing)

[2] con­tains tables sim­i­lar to Table A.4 for:

  • Pres­sure-coil springs
  • Direc­tion­al con­trol valves
  • Stop (shut-off) valves/non-return (check) valves/quick-action vent­ing valves/shuttle valves, etc.
  • Flow valves
  • Pres­sure valves
  • Pipework
  • Hose assem­blies
  • Con­nec­tors
  • Pres­sure trans­mit­ters and pres­sure medi­um trans­duc­ers
  • Com­pressed air treat­ment — Fil­ters
  • Com­pressed-air treat­ment — Oil­ers
  • Com­pressed air treat­ment — Silencers
  • Accu­mu­la­tors and pres­sure ves­sels
  • Sen­sors
  • Flu­idic Infor­ma­tion pro­cess­ing — Log­i­cal ele­ments
  • etc.

As you can see, there are many dif­fer­ent types of faults that need to be con­sid­ered. Keep in mind that I did not give you all of the dif­fer­ent fault lists — this post would be a mile long if I did that! The point is that you need to devel­op a fault list for your sys­tem, and then con­sid­er the impact of each fault on the oper­a­tion of the sys­tem. If you have com­po­nents or sub­sys­tems that are not list­ed in the tables, then you need to devel­op your own fault lists for those items. Fail­ure Modes and Effects Analy­sis (FMEA) is usu­al­ly the best approach for devel­op­ing fault lists for these com­po­nents [23], [24].

When con­sid­er­ing the faults to be includ­ed in the list there are a few things that should be con­sid­ered [1, 7.2]:

  • if further faults develop as a consequence of the first fault, the first fault and all of the resulting faults can be grouped together as a single fault
  • two or more single faults with a common cause can be considered as a single fault
  • the simultaneous occurrence of multiple faults with different causes is considered improbable and does not need to be considered

Examples

#1 — Voltage Regulator

A volt­age reg­u­la­tor fails in a sys­tem pow­er sup­ply so that the 24 Vdc out­put ris­es to an unreg­u­lat­ed 36 Vdc (the inter­nal pow­er sup­ply bus volt­age), and after some time has passed, two sen­sors fail. All three fail­ures can be grouped and con­sid­ered as a sin­gle fault because they orig­i­nate in a sin­gle fail­ure in the volt­age reg­u­la­tor.

#2 — Lightning Strike

If a light­ning strike occurs on the pow­er line and the result­ing surge volt­age on the 400 V mains caus­es an inter­pos­ing con­tac­tor and the motor dri­ve it con­trols to fail to dan­ger, then these fail­ures may be grouped and con­sid­ered as one. Again, a sin­gle event caus­es all of the sub­se­quent fail­ures.

#3 — Pneumatic System Lubrication

3a — A pneu­mat­ic lubri­ca­tor runs out of lubri­cant and is not refilled, depriv­ing down­stream pneu­mat­ic com­po­nents of lubri­ca­tion.

3b — The spool on the sys­tem dump valve sticks open because it is not cycled often enough.

These two failures do not share a cause, so there is no need to consider them as occurring simultaneously because the probability of both happening concurrently is extremely small. One caution: These two faults MAY have a common cause: poor maintenance. If this is true and you decide to consider them to be two faults with a common cause, they could then be grouped as a single fault.
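The grouping rules and the examples above can be sketched in a few lines of Python. This is only an illustration: the `Fault` class, the `root_cause` labels, and the sensor names S1 and S2 are hypothetical and do not come from ISO 13849–1 or ISO 13849–2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    component: str    # hypothetical component tag
    description: str  # what goes wrong
    root_cause: str   # hypothetical label for the initiating event


def group_by_root_cause(faults):
    """Group faults that share an initiating cause.

    Each resulting group is then analysed as a single fault.
    """
    groups = defaultdict(list)
    for fault in faults:
        groups[fault.root_cause].append(fault)
    return dict(groups)


faults = [
    Fault("PSU regulator", "24 Vdc output rises to 36 Vdc", "regulator failure"),
    Fault("Sensor S1", "fails after sustained overvoltage", "regulator failure"),
    Fault("Sensor S2", "fails after sustained overvoltage", "regulator failure"),
    Fault("Lubricator", "runs dry and is not refilled", "lubricator not refilled"),
    Fault("Dump valve", "spool sticks open", "valve not cycled often enough"),
]

grouped = group_by_root_cause(faults)
for cause, members in grouped.items():
    print(f"{cause}: {len(members)} fault(s) treated as one")
```

If you decided that the lubricator and dump valve faults share "poor maintenance" as a common cause, giving both the same `root_cause` label would collapse them into one group.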

Fault Exclusion

Once you have your well-con­sid­ered fault lists togeth­er, the next ques­tion is “Can any of the list­ed faults be exclud­ed?” This is a tricky ques­tion! There are a few points to con­sid­er:

  • Does the sys­tem archi­tec­ture allow for fault exclu­sion?
  • Is the fault tech­ni­cal­ly improb­a­ble, even if it is pos­si­ble?
  • Does expe­ri­ence show that the fault is unlike­ly to occur?*
  • Are there tech­ni­cal require­ments relat­ed to the appli­ca­tion and the haz­ard that might sup­port fault exclu­sion?

* BE CAREFUL with this one!

When­ev­er faults are exclud­ed, a detailed jus­ti­fi­ca­tion for the exclu­sion needs to be includ­ed in the sys­tem design doc­u­men­ta­tion. Sim­ply decid­ing that the fault can be exclud­ed is NOT ENOUGH! Con­sid­er the risk a per­son will be exposed to in the event the fault occurs. If the sever­i­ty is very high, i.e., severe per­ma­nent injury or death, you may not want to exclude the fault even if you think you could. Care­ful con­sid­er­a­tion of the result­ing injury sce­nario is need­ed.

Bas­ing a fault exclu­sion on per­son­al expe­ri­ence is sel­dom con­sid­ered ade­quate, which is why I added the aster­isk (*) above. Look for good sta­tis­ti­cal data to sup­port any deci­sion to use a fault exclu­sion.
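One way to keep yourself honest about these points is to record every exclusion in a structured form and flag the weak ones for review. The sketch below is an assumption about how such a record might look; the field names and the review rules are hypothetical and are not prescribed by ISO 13849–1 or ISO 13849–2.

```python
from dataclasses import dataclass


@dataclass
class FaultExclusion:
    fault: str
    justification: str    # documented technical rationale for the exclusion
    supporting_data: str  # reference to statistical data; empty if none
    severity_high: bool   # True if severe permanent injury or death is possible

    def review_flags(self):
        """Return a list of reasons this exclusion needs a second look."""
        flags = []
        if not self.justification.strip():
            flags.append("missing justification")
        if not self.supporting_data.strip():
            flags.append("no supporting data")
        if self.severity_high:
            flags.append("high severity, reconsider the exclusion")
        return flags


exclusion = FaultExclusion(
    fault="Shaft fracture",
    justification="Shaft over-dimensioned for the maximum load",
    supporting_data="",
    severity_high=True,
)
print(exclusion.review_flags())
```

An empty list from `review_flags()` does not make an exclusion valid; it only means the record is complete enough to be reviewed.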

There is much more infor­ma­tion avail­able in IEC 61508–2 on the sub­ject of fault exclu­sion, and there is good infor­ma­tion in some of the books men­tioned below [0.1], [0.2], and [0.3]. If you know of addi­tion­al resources you would like to share, please post the infor­ma­tion in the com­ments!

Definitions

3.1.3 fault
state of an item char­ac­ter­ized by the inabil­i­ty to per­form a required func­tion, exclud­ing the inabil­i­ty dur­ing pre­ven­tive main­te­nance or oth­er planned actions, or due to lack of exter­nal resources
Note 1 to entry: A fault is often the result of a fail­ure of the item itself, but may exist with­out pri­or fail­ure.
Note 2 to entry: In this part of ISO 13849, “fault” means random fault. [SOURCE: IEC 60050-191:1990, 05–01.]

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assess­ment: Basics and Bench­marks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook, 3rd ed. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Elec­tro­mag­net­ic Com­pat­i­bil­i­ty for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: The Insti­tu­tion of Engi­neer­ing and Tech­nol­o­gy, 2008.

[0.3]  Overview of tech­niques and mea­sures relat­ed to EMC for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: Overview of tech­niques and mea­sures relat­ed to EMC for Func­tion­al Safe­ty, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[2]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 2: Val­i­da­tion. 2nd Edi­tion. ISO Stan­dard 13849–2. 2012.

[3]      Safe­ty of machin­ery — Gen­er­al prin­ci­ples for design — Risk assess­ment and risk reduc­tion. ISO Stan­dard 12100. 2010.

[4]     Safe­guard­ing of Machin­ery. 2nd Edi­tion. CSA Stan­dard Z432. 2004.

[5]     Risk Assess­ment and Risk Reduc­tion- A Guide­line to Esti­mate, Eval­u­ate and Reduce Risks Asso­ci­at­ed with Machine Tools. ANSI Tech­ni­cal Report B11.TR3. 2000.

[6]    Safe­ty of machin­ery — Emer­gency stop func­tion — Prin­ci­ples for design. ISO Stan­dard 13850. 2015.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. 7 parts. IEC Stan­dard 61508. Edi­tion 2. 2010.

[8]     S. Joce­lyn, J. Bau­doin, Y. Chin­ni­ah, and P. Char­p­en­tier, “Fea­si­bil­i­ty study and uncer­tain­ties in the val­i­da­tion of an exist­ing safe­ty-relat­ed con­trol cir­cuit with the ISO 13849–1:2006 design stan­dard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.

[9]    Guid­ance on the appli­ca­tion of ISO 13849–1 and IEC 62061 in the design of safe­ty-relat­ed con­trol sys­tems for machin­ery. IEC Tech­ni­cal Report TR 62061–1. 2010.

[10]     Safe­ty of machin­ery — Func­tion­al safe­ty of safe­ty-relat­ed elec­tri­cal, elec­tron­ic and pro­gram­ma­ble elec­tron­ic con­trol sys­tems. IEC Stan­dard 62061. 2005.

[11]    Guid­ance on the appli­ca­tion of ISO 13849–1 and IEC 62061 in the design of safe­ty-relat­ed con­trol sys­tems for machin­ery. IEC Tech­ni­cal Report 62061–1. 2010.

[12]    D. S. G. Nix, Y. Chin­ni­ah, F. Dosio, M. Fessler, F. Eng, and F. Schr­ev­er, “Link­ing Risk and Reliability—Mapping the out­put of risk assess­ment tools to func­tion­al safe­ty require­ments for safe­ty relat­ed con­trol sys­tems,” 2015.

[13]    Safe­ty of machin­ery. Safe­ty relat­ed parts of con­trol sys­tems. Gen­er­al prin­ci­ples for design. CEN Stan­dard EN 954–1. 1996.

[14]   Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems — Part 2: Require­ments for electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. IEC Stan­dard 61508–2. 2010.

[15]     Reli­a­bil­i­ty Pre­dic­tion of Elec­tron­ic Equip­ment. Mil­i­tary Hand­book MIL-HDBK-217F. 1991.

[16]     “IFA — Prac­ti­cal aids: Soft­ware-Assis­tent SISTEMA: Safe­ty Integri­ty — Soft­ware Tool for the Eval­u­a­tion of Machine Appli­ca­tions”, Dguv.de, 2017. [Online]. Avail­able: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30- Jan- 2017].

[17]      “fail­ure mode”, 192–03-17, Inter­na­tion­al Elec­trotech­ni­cal Vocab­u­lary. IEC Inter­na­tion­al Elec­trotech­ni­cal Com­mis­sion, Gene­va, 2015.

[18]      M. Gen­tile and A. E. Sum­mers, “Com­mon Cause Fail­ure: How Do You Man­age Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331–338, 2006.

[19]     Out of Control—Why con­trol sys­tems go wrong and how to pre­vent fail­ure, 2nd ed. Rich­mond, Sur­rey, UK: HSE Health and Safe­ty Exec­u­tive, 2003.

[20]     Safe­guard­ing of Machin­ery. 3rd Edi­tion. CSA Stan­dard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Cana­da, 1990.

[22]     “Field-pro­gram­ma­ble gate array”, En.wikipedia.org, 2017. [Online]. Avail­able: https://en.wikipedia.org/wiki/Field-programmable_gate_array. [Accessed: 16-Jun-2017].

[23]     Analy­sis tech­niques for sys­tem reli­a­bil­i­ty – Pro­ce­dure for fail­ure mode and effects analy­sis (FMEA). 2nd Ed. IEC Stan­dard 60812. 2006.

[24]     “Fail­ure mode and effects analy­sis”, En.wikipedia.org, 2017. [Online]. Avail­able: https://en.wikipedia.org/wiki/Failure_mode_and_effects_analysis. [Accessed: 16-Jun-2017].

ISO 13849–1 Analysis — Part 5: Diagnostic Coverage (DC)

This entry is part 5 of 9 in the series How to do a 13849–1 analy­sis

What is Diagnostic Coverage?

Under­stand­ing Diag­nos­tic Cov­er­age (DC) as it is used in ISO 13849–1 [1] is crit­i­cal to analysing the design of any safe­ty func­tion assessed using this stan­dard. In case you missed a pre­vi­ous part of the series, you can read it here.

In the last instalment of this series discussing MTTFD, I brought up the fact that everything fails eventually, and so everything has a natural failure rate. The bathtub curve shown at the top of this post shows a typical failure rate curve for most products. Failure rates tell you how often components or systems can be expected to fail over time. Failure rates are expressed in many ways, MTTFD and PFHd being the ways relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHd is given in failures per hour (1/h). As a reminder, PFHd stands for “Probability of dangerous Failure per Hour”.

Three of the standard architectures include automatic diagnostic functions: Categories 2, 3 and 4. As soon as we add diagnostics to the system, we need to know what faults the diagnostics can detect and what share of the total dangerous failures those detectable failures represent. Diagnostic Coverage (DC) represents the ratio of dangerous failures that can be detected to the total dangerous failures that could occur, expressed as a percentage. There will be some failures that do not result in a dangerous state, and those failures are excluded from DC because we don’t need to worry about them: if they occur, the system will not fail into a dangerous state.

Here’s the for­mal def­i­n­i­tion from [1]:

3.1.26 diag­nos­tic cov­er­age (DC)

mea­sure of the effec­tive­ness of diag­nos­tics, which may be deter­mined as the ratio between the fail­ure rate of detect­ed dan­ger­ous fail­ures and the fail­ure rate of total dan­ger­ous fail­ures

Note 1 to entry: Diag­nos­tic cov­er­age can exist for the whole or parts of a safe­ty-relat­ed sys­tem. For exam­ple, diag­nos­tic cov­er­age could exist for sen­sors and/or log­ic sys­tem and/or final ele­ments. [SOURCE: IEC 61508–4:1998, 3.8.6, mod­i­fied.]

That brings up two oth­er relat­ed def­i­n­i­tions that need to be kept in mind [1]:

3.1.4 fail­ure

ter­mi­na­tion of the abil­i­ty of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Fail­ure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The con­cept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Fail­ures which only affect the avail­abil­i­ty of the process under con­trol are out­side of the scope of this part of ISO 13849. [SOURCE: IEC 60050–191:1990, 04–01.]

and the most impor­tant one [1]:

3.1.5 dan­ger­ous fail­ure

fail­ure which has the poten­tial to put the SRP/CS in a haz­ardous or fail-to-func­tion state

Note 1 to entry: Whether or not the poten­tial is real­ized can depend on the chan­nel archi­tec­ture of the sys­tem; in redun­dant sys­tems a dan­ger­ous hard­ware fail­ure is less like­ly to lead to the over­all dan­ger­ous or fail-to- func­tion state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, mod­i­fied.]

Just as a reminder, SRP/CS stands for “safe­ty-relat­ed parts of con­trol sys­tems”.

Failure Math

Failure Rate Data Sources

To do any cal­cu­la­tions, we need data, and this is true for fail­ure rates as well. ISO 13849–1 pro­vides some tables in the annex­es that list some com­mon types of com­po­nents and their asso­ci­at­ed fail­ure rates, and there are more fail­ure rate tables in ISO 13849–2. A word of cau­tion here: Do not mix sources of fail­ure rate data, as the con­di­tions under which that data is true won’t match the data in ISO 13849. There are a few good sources of fail­ure rate data out there, for exam­ple, MIL-HDBK-217, Reli­a­bil­i­ty Pre­dic­tion of Elec­tron­ic Equip­ment [15], as well as the data­base main­tained by Exi­da. In any case, use a sin­gle source for your fail­ure rate data.

Failure Rate Variables

IEC 61508 [7] defines a num­ber of vari­ables relat­ed to fail­ure rates. The low­er­case Greek let­ter lamb­da, \lambda, is used to denote fail­ures.

The com­mon vari­able des­ig­na­tions used are:

\lambda = fail­ures
\lambda_{(t)} = fail­ure rate
\lambda_s = “safe” fail­ures
\lambda_d = “dan­ger­ous” fail­ures
\lambda_{dd} = detectable “dan­ger­ous” fail­ures
\lambda_{du} = unde­tectable “dan­ger­ous” fail­ures

Calculating DC

Of these vari­ables, we only need to con­cern our­selves with \lambda_d, \lambda_{dd} and \lambda_{du}. To under­stand how these vari­ables are used, we can express their rela­tion­ship as

\lambda_d=\lambda_{dd}+\lambda_{du}

Fol­low­ing on that idea, the Diag­nos­tic Cov­er­age can be expressed as a per­cent­age like this:

DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100
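As a quick illustration of the two formulas above, here is a minimal Python sketch. The function name and the example failure rates are made up for illustration; they are not taken from ISO 13849–1 or IEC 61508.

```python
def diagnostic_coverage_percent(lambda_dd, lambda_du):
    """Return DC% from the detected and undetected dangerous failure rates.

    Both rates must use the same unit (for example, failures per hour).
    """
    lambda_d = lambda_dd + lambda_du  # total dangerous failure rate
    if lambda_d <= 0:
        raise ValueError("total dangerous failure rate must be positive")
    return lambda_dd / lambda_d * 100.0


# Hypothetical rates: 9e-7 /h detected, 1e-7 /h undetected
dc = diagnostic_coverage_percent(lambda_dd=9.0e-7, lambda_du=1.0e-7)
print(f"DC = {dc:.1f} %")
```

With these example rates, nine out of every ten dangerous failures are detected, so the result is about 90 %.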

Determining DC%

If you want to actu­al­ly cal­cu­late DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hard­core types to IEC 61508–2, Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems — Part 2: Require­ments for electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. This stan­dard goes into some depth on how to deter­mine fail­ure rates and how to cal­cu­late the “Safe Fail­ure Frac­tion,” a num­ber which is relat­ed to DC but is not the same.

For every­one else, the good news is that you can use the table in Annex E to esti­mate the DC%. It’s worth not­ing here that Annex E is “Infor­ma­tive.” In stan­dards-speak, this means that the infor­ma­tion in the annex is not part of the “nor­ma­tive” text, which means that it is sim­ply infor­ma­tion to help you use the nor­ma­tive part of the stan­dard. The design must con­form to the require­ments in the nor­ma­tive text if you want to claim con­for­mi­ty to the stan­dard. The fact that [1, Annex E] is infor­ma­tive gives you the option to cal­cu­late the DC% val­ue rather than select­ing it from Table E.1. Using the cal­cu­lat­ed val­ue would not vio­late the require­ments in the nor­ma­tive text.

If you are using IFA SISTEMA [16] to do the cal­cu­la­tions for you, you will find that the soft­ware lim­its you to select­ing a sin­gle DC mea­sure from Table E.1, and this prin­ci­ple applies if you are doing the cal­cu­la­tions by hand too. Only one item from Table E.1 can be select­ed for a giv­en safe­ty func­tion.

Ranking DC

Once you have determined the DC for a safety function, you need to compare the DC value against [1, Table 5] to see if the DC is sufficient for the PLr you are trying to achieve. Table 5 bins the DC results into four ranges. Just like binning the PFHd values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.

ISO 13849–1, Table 5 Diag­nos­tic cov­er­age (DC)

If the DC val­ue was high enough for the PLr, then you are done with this part of the work. If not, you will need to go back to your design and add addi­tion­al diag­nos­tic fea­tures so that you can either select a high­er cov­er­age from [1, Table E.1] or cal­cu­late a high­er val­ue using [14].
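The binning step can be sketched as a small lookup. The sketch assumes the four Table 5 ranges are: none (DC < 60 %), low (60 % ≤ DC < 90 %), medium (90 % ≤ DC < 99 %) and high (99 % ≤ DC). Check these limits against your own copy of the standard; the function name is hypothetical.

```python
def dc_range(dc_percent):
    """Map a DC% value to its assumed Table 5 range name."""
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 and 100 percent")
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


for value in (45.0, 60.0, 95.0, 99.0):
    print(f"DC = {value:.0f} % -> {dc_range(value)}")
```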

Multiple safety functions

When you have mul­ti­ple safe­ty func­tions that make up a com­plete safe­ty sys­tem, for exam­ple, an emer­gency stop func­tion and a guard inter­lock­ing func­tion, the DC val­ues need to be aver­aged to deter­mine the over­all DC for the com­plete sys­tem. [1, Annex E] pro­vides you with a method to do this in Equa­tion E.1.

Equation for averaging the DC values of multiple safety functions
ISO 13849–1-2015 Equa­tion E.1

Plug in the val­ues for MTTFD and DC for each safe­ty func­tion, and cal­cu­late the result­ing DCavg val­ue for the com­plete sys­tem.
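Here is a minimal sketch of that calculation. It assumes Equation E.1 has the form DCavg = (DC1/MTTFD1 + ... + DCN/MTTFDN) / (1/MTTFD1 + ... + 1/MTTFDN), which weights each DC value by the reciprocal of its MTTFD. The function name and the example numbers are hypothetical.

```python
def dc_avg(entries):
    """Return DCavg for a list of (dc_percent, mttfd_years) pairs.

    Assumed form of Equation E.1: each DC value is weighted by 1/MTTFD,
    so entries that fail more often count for more in the average.
    """
    entries = list(entries)
    if not entries:
        raise ValueError("at least one (DC, MTTFD) pair is required")
    numerator = sum(dc / mttfd for dc, mttfd in entries)
    denominator = sum(1.0 / mttfd for _, mttfd in entries)
    return numerator / denominator


# Hypothetical values: (DC in %, MTTFD in years)
example = [(99.0, 50.0), (90.0, 20.0), (60.0, 100.0)]
result = dc_avg(example)
print(f"DCavg = {result:.1f} %")
```

Note how the entry with the shortest MTTFD (20 years) pulls the average toward its own DC of 90 %, while the 60 % entry with a 100-year MTTFD has much less influence.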

That’s it for this arti­cle. The next part will cov­er Com­mon Cause Fail­ures (CCF). Look for it on 20-Mar-17!

In case you missed the first part of the series, you can read it here.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. 7 parts. IEC Stan­dard 61508. Edi­tion 2. 2010.

[14]   Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems — Part 2: Require­ments for electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. IEC Stan­dard 61508–2. 2010.

[15]     Reli­a­bil­i­ty Pre­dic­tion of Elec­tron­ic Equip­ment. Mil­i­tary Hand­book MIL-HDBK-217F. 1991.

[16]     “IFA — Prac­ti­cal aids: Soft­ware-Assis­tent SISTEMA: Safe­ty Integri­ty — Soft­ware Tool for the Eval­u­a­tion of Machine Appli­ca­tions”, Dguv.de, 2017. [Online]. Avail­able: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30- Jan- 2017].

ISO 13849–1 Analysis — Part 4: MTTFD — Mean Time to Dangerous Failure

This entry is part 4 of 9 in the series How to do a 13849–1 analy­sis

Func­tion­al safe­ty is all about the like­li­hood of a safe­ty sys­tem fail­ing to oper­ate when you need it. Under­stand­ing Mean Time to Dan­ger­ous Fail­ure, or MTTFD, is crit­i­cal. If you have been read­ing about this top­ic at all, you may notice that I am abbre­vi­at­ing Mean Time to Dan­ger­ous Fail­ure with all cap­i­tal let­ters. Using MTTFD is a recent change that occurred in the third edi­tion of ISO 13849–1, pub­lished in 2015. In the first and sec­ond edi­tions, the cor­rect abbre­vi­a­tion was MTTFd. Onward!

If you missed the third instal­ment in this series, you can read it here.

Defining MTTFD

Let’s start by hav­ing a look at some key def­i­n­i­tions. Look­ing at [1, Cl. 3], you will find:

3.1.1 safety–related part of a control system (SRP/CS)—part of a control system that responds to safety-related input signals and generates safety-related output signals

Note 1 to entry: The com­bined safe­ty-relat­ed parts of a con­trol sys­tem start at the point where the safe­ty-relat­ed input sig­nals are ini­ti­at­ed (includ­ing, for exam­ple, the actu­at­ing cam and the roller of the posi­tion switch) and end at the out­put of the pow­er con­trol ele­ments (includ­ing, for exam­ple, the main con­tacts of a con­tac­tor)

Note 2 to entry: If mon­i­tor­ing sys­tems are used for diag­nos­tics, they are also con­sid­ered as SRP/CS.

3.1.5 dan­ger­ous fail­ure—fail­ure which has the poten­tial to put the SRP/CS in a haz­ardous or fail-to-func­tion state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, mod­i­fied.]

3.1.25 mean time to dan­ger­ous fail­ure (MTTFD)—expec­ta­tion of the mean time to dan­ger­ous fail­ure

Def­i­n­i­tion 3.1.5 is pret­ty help­ful, but def­i­n­i­tion 3.1.25 is, well, not much of a def­i­n­i­tion. Let’s look at this anoth­er way.

Failures and Faults

Since everything can and will eventually fail to perform the way we expect it to, we know that everything has a failure rate because everything takes some time to fail. That time may be very short, like the first time the unit is turned on, or it may be very long, sometimes hundreds of years. Remember that because this is a rate, it is something that occurs over time. It is also important to be clear that we are talking about failures and not faults. Reading from [1]:

3.1.3 fault—state of an item char­ac­ter­ized by the inabil­i­ty to per­form a required func­tion, exclud­ing the inabil­i­ty dur­ing pre­ven­tive main­te­nance or oth­er planned actions, or due to lack of exter­nal resources

Note 1 to entry: A fault is often the result of a fail­ure of the item itself, but may exist with­out pri­or fail­ure.

Note 2 to entry: In this part of ISO 13849, “fault” means random fault. [SOURCE: IEC 60050-191:1990, 05–01.]

3.1.4 fail­ure— ter­mi­na­tion of the abil­i­ty of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Fail­ure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The con­cept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Fail­ures which only affect the avail­abil­i­ty of the process under con­trol are out­side of the scope of this part of ISO 13849.
[SOURCE: IEC 60050–191:1990, 04–01.]

3.1.4 Note 2 is the impor­tant one at this point in the dis­cus­sion.

Now, where we have mul­ti­ples of some­thing, like relays, valves, or safe­ty sys­tems, we now have a pop­u­la­tion of iden­ti­cal items, each of which will even­tu­al­ly fail at some point. We can count those fail­ures as they occur and tal­ly them up, and we can graph how many fail­ures we get in the pop­u­la­tion over time. If this is start­ing to sound sus­pi­cious­ly like sta­tis­tics to you, that is because it is.

OK, so let’s look at the kinds of fail­ures that occur in that pop­u­la­tion. Some fail­ures will result in a “safe” state, e.g., a relay fail­ing with all poles open, and some will fail in a poten­tial­ly “dan­ger­ous” state, like a nor­mal­ly closed valve devel­op­ing a sig­nif­i­cant leak. If we tal­ly up all the fail­ures that occur, and then tal­ly the num­ber of “safe” fail­ures and the num­ber of “dan­ger­ous” fail­ures in that pop­u­la­tion, we now have some very use­ful infor­ma­tion.

The dif­fer­ent kinds of fail­ures are sig­ni­fied using the low­er­case Greek let­ter \lambda (lamb­da). We can add some sub­scripts to help iden­ti­fy what kinds of fail­ures we are talk­ing about. The com­mon vari­able des­ig­na­tions used are [14]:

\lambda = fail­ures
\lambda_{(t)} = fail­ure rate
\lambda_s = “safe” fail­ures
\lambda_d = “dan­ger­ous” fail­ures
\lambda_{dd} = detectable “dan­ger­ous” fail­ures
\lambda_{du} = unde­tectable “dan­ger­ous” fail­ures

I will be dis­cussing some of these vari­ables in more detail in a lat­er part of the series when I delve into Diag­nos­tic Cov­er­age, so don’t wor­ry about them too much just yet.

Getting to MTTFD

Since we can now start to deal with the fail­ure rate data math­e­mat­i­cal­ly, we can start to do some cal­cu­la­tions about expect­ed life­time of a com­po­nent or a sys­tem. That expect­ed, or prob­a­ble, life­time is what def­i­n­i­tion 3.1.25 was on about, and is what we call MTTFD.

MTTFD is a time, given in years, that characterizes the period over which the probability of failure is relatively constant. If you look at a typical failure rate curve, called a “bathtub curve” due to its resemblance to the profile of a nice soaker tub, the MTTFD applies to the flatter portion of the curve between the end of the infant mortality period and the start of the wear-out period at the end of life. This part of the curve is the portion assumed to be included in the “mission time” for the product. ISO 13849–1 assumes the mission time for all machinery is 20 years [1, 4.5.4] and [1, Cl. 10].

Diagram of a standardized bathtub-shaped failure rate curve.
Figure 1 — Typical Bathtub Curve [15]

ISO 13849–1 provides us with guidance on how MTTFD relates to the determination of the PL in [1, Cl. 4.5.2]. MTTFD is further grouped into three bands as shown in [1, Table 4].

Table showing the bands of Mean time to dangerous failure of each channel (MTTFD)

The notes for this table are impor­tant as well. Since you can’t read the notes par­tic­u­lar­ly well in the table above, I’ve repro­duced them here:

NOTE 1 The choice of the MTTFD ranges of each chan­nel is based on fail­ure rates found in the field as state-of-the-art, form­ing a kind of log­a­rith­mic scale fit­ting to the log­a­rith­mic PL scale. An MTTFD val­ue of each chan­nel less than three years is not expect­ed to be found for real SRP/CS since this would mean that after one year about 30 % of all sys­tems on the mar­ket will fail and will need to be replaced. An MTTFD val­ue of each chan­nel greater than 100 years is not accept­able because SRP/CS for high risks should not depend on the reli­a­bil­i­ty of com­po­nents alone. To rein­force the SRP/CS against sys­tem­at­ic and ran­dom fail­ure, addi­tion­al means such as redun­dan­cy and test­ing should be required. To be prac­ti­ca­ble, the num­ber of ranges was restrict­ed to three. The lim­i­ta­tion of MTTFD of each chan­nel val­ues to a max­i­mum of 100 years refers to the sin­gle chan­nel of the SRP/CS which car­ries out the safe­ty func­tion. High­er MTTFD val­ues can be used for sin­gle com­po­nents (see Table D.1).

NOTE 2 The indi­cat­ed bor­ders of this table are assumed with­in an accu­ra­cy of 5%.
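The banding can be sketched as a simple lookup. The sketch assumes the three Table 4 bands are low (3 ≤ MTTFD < 10 years), medium (10 ≤ MTTFD < 30 years) and high (30 ≤ MTTFD ≤ 100 years), and it caps the channel value at 100 years as described in NOTE 1. Verify the limits against your copy of the standard; the function name and the default cap parameter are hypothetical.

```python
def mttfd_band(mttfd_years, cap_years=100.0):
    """Map a channel MTTFD (in years) to its assumed Table 4 band."""
    if mttfd_years <= 0:
        raise ValueError("MTTFD must be positive")
    capped = min(mttfd_years, cap_years)  # channel value limited per NOTE 1
    if capped < 3.0:
        return "below the low band"
    if capped < 10.0:
        return "low"
    if capped < 30.0:
        return "medium"
    return "high"


for years in (2.0, 8.0, 25.0, 250.0):
    print(f"MTTFD = {years:g} years -> {mttfd_band(years)}")
```

NOTE 2 says the borders are assumed within an accuracy of 5%, so a value sitting just under a border deserves engineering judgment rather than a hard cut-off like the one used here.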

The stan­dard then tells us to select the MTTFD using a sim­ple hier­ar­chy [1, 4.5.2]:

For the estimation of MTTFD of a component, the hierarchical procedure for finding data shall be, in the order given:

a) use manufacturer’s data;
b) use meth­ods in Annex C and Annex D;
c) choose 10 years.

Why ten years? Ten years is half of the assumed mis­sion life­time of 20 years. More on mis­sion life­time in a lat­er post.
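
The hierarchy from [1, 4.5.2] can be sketched as a simple fallback chain. This is only an illustration: the function `select_mttfd` and its parameters are hypothetical names, and it assumes you have already looked up the manufacturer's value, or worked out an Annex C or Annex D estimate, before calling it.

```python
from typing import Optional, Tuple

# Fallback value from step c) of the hierarchy in [1, 4.5.2].
DEFAULT_MTTFD_YEARS = 10.0


def select_mttfd(
    manufacturer_years: Optional[float] = None,
    annex_estimate_years: Optional[float] = None,
) -> Tuple[float, str]:
    """Pick an MTTFD value in the order a), b), c) and report its source."""
    if manufacturer_years is not None:
        return manufacturer_years, "a) manufacturer's data"
    if annex_estimate_years is not None:
        return annex_estimate_years, "b) Annex C / Annex D methods"
    return DEFAULT_MTTFD_YEARS, "c) default of 10 years"


if __name__ == "__main__":
    print(select_mttfd(manufacturer_years=150.0))
    print(select_mttfd(annex_estimate_years=45.0))
    print(select_mttfd())
```

Returning the source alongside the value makes it easier to document in your analysis where each number came from.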

Look­ing at [1, Annex C.2], you will find the “Good Engi­neer­ing Prac­tices” method for esti­mat­ing MTTFD, pre­sum­ing the man­u­fac­tur­er has not pro­vid­ed you with that infor­ma­tion. ISO 13849–2 [2] has some ref­er­ence tables that pro­vide some gen­er­al MTTFD val­ues for some kinds of com­po­nents, but not every part that exists can be list­ed. How can we deal with parts not list­ed? [1, Annex C.4] pro­vides us with a cal­cu­la­tion method for esti­mat­ing MTTFD for pneu­mat­ic, mechan­i­cal and electro­mechan­i­cal com­po­nents.

Calculating MTTFD for pneumatic, mechanical and electromechanical components

I need to intro­duce you to a few more vari­ables before we look at how to cal­cu­late MTTFD for a com­po­nent.

Variables
Variable Description
B10 Number of cycles until 10% of the components fail (for pneumatic and electromechanical components)
B10D Number of cycles until 10% of the components fail dangerously (for pneumatic and electromechanical components)
T Lifetime of the component
T10D Mean time until 10% of the components fail dangerously
nop Mean number of annual operations, in cycles per year
hop Mean operation time, in hours per day
dop Mean operation time, in days per year
tcycle Mean time between the beginning of two successive cycles of the component (e.g., switching of a valve), in seconds per cycle
s Seconds
h Hours
a Years

Know­ing a few details we can cal­cu­late the MTTFD using [1, Eqn C.1]. We need to know the fol­low­ing para­me­ters for the appli­ca­tion:

  • B10D
  • hop
  • dop
  • tcycle

$$MTTF_D = \frac{B_{10D}}{0.1 \times n_{op}}$$
Cal­cu­lat­ing MTTFD — [1, Eqn. C.1]
In order to use [1, Eqn. C.1], we need to first cal­cu­late nop, using [1, Eqn. C.2]:

$$n_{op} = \frac{d_{op} \times h_{op} \times 3600\ \mathrm{s/h}}{t_{cycle}}$$
Cal­cu­lat­ing nop — [1, Eqn. C.2]
We may also need one more cal­cu­la­tion, [1, Eqn. C.4]:
$$T_{10D} = \frac{B_{10D}}{n_{op}}$$
Cal­cu­lat­ing T10D — [1, Eqn. C.4]
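
If you are wondering where the 0.1 in the MTTFD formula comes from, one way to see it is to assume a constant dangerous failure rate, i.e., an exponential failure model over the flat part of the bathtub curve. Under that assumption, 10% of components have failed dangerously at T10D, which gives a rate of -ln(0.9)/T10D, or roughly 0.1/T10D. The short Python sketch below checks the numbers; the function name and the 10-year T10D are illustrative only.

```python
import math


def rate_times_t10d(fraction_failed: float = 0.10) -> float:
    """Return lambda_D * T10D under an exponential model.

    From F(t) = 1 - exp(-lambda_D * t) with F(T10D) = fraction_failed.
    """
    return -math.log(1.0 - fraction_failed)


factor = rate_times_t10d()          # about 0.105, rounded to 0.1 in the formula
t10d_years = 10.0                   # illustrative value, not from the standard
mttfd_exact = t10d_years / factor   # exponential model without rounding
mttfd_rounded = t10d_years / 0.1    # using the rounded factor of 0.1

if __name__ == "__main__":
    print(f"lambda_D * T10D = {factor:.4f}")
    print(f"MTTFD without rounding: {mttfd_exact:.1f} years")
    print(f"MTTFD with factor 0.1:  {mttfd_rounded:.1f} years")
```

Rounding the factor down to 0.1 makes the resulting MTTFD about 5% larger than the unrounded exponential value.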

Example Calculation [1, C.4.3]

For a pneu­mat­ic valve, a man­u­fac­tur­er deter­mines a mean val­ue of 60 mil­lion cycles as B10D. The valve is used for two shifts each day on 220 oper­a­tion days a year. The mean time between the begin­ning of two suc­ces­sive switch­ing of the valve is esti­mat­ed as 5 s. This yields the fol­low­ing val­ues:

  • dop of 220 days per year;
  • hop of 16 h per day;
  • tcycle of 5 s per cycle;
  • B10D of 60 mil­lion cycles.

Doing the math, we get:

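$$n_{op} = \frac{220\ \mathrm{d/a} \times 16\ \mathrm{h/d} \times 3600\ \mathrm{s/h}}{5\ \mathrm{s/cycle}} = 2.53 \times 10^{6}\ \mathrm{cycles/a}$$
$$T_{10D} = \frac{60 \times 10^{6}\ \mathrm{cycles}}{2.53 \times 10^{6}\ \mathrm{cycles/a}} = 23.7\ \mathrm{a}$$
$$MTTF_D = \frac{60 \times 10^{6}\ \mathrm{cycles}}{0.1 \times 2.53 \times 10^{6}\ \mathrm{cycles/a}} = 237\ \mathrm{a}$$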
Exam­ple C.4.3
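
If you would rather check the arithmetic yourself, the pneumatic valve example can be reproduced in a few lines of Python. This is a minimal sketch: the function names are mine, and the inputs are the values stated for the example (B10D of 60 million cycles, 220 days per year, 16 h per day, 5 s per cycle).

```python
SECONDS_PER_HOUR = 3600


def n_op(d_op: float, h_op: float, t_cycle_s: float) -> float:
    """Mean number of annual operations, in cycles per year."""
    return d_op * h_op * SECONDS_PER_HOUR / t_cycle_s


def t10d_years(b10d: float, n_op_per_year: float) -> float:
    """Mean time until 10% of the components fail dangerously, in years."""
    return b10d / n_op_per_year


def mttfd_years(b10d: float, n_op_per_year: float) -> float:
    """Mean time to dangerous failure of the component, in years."""
    return b10d / (0.1 * n_op_per_year)


# Input values from the pneumatic valve example.
B10D = 60e6     # cycles
D_OP = 220      # days per year
H_OP = 16       # hours per day
T_CYCLE = 5     # seconds per cycle

cycles_per_year = n_op(D_OP, H_OP, T_CYCLE)   # 2,534,400 cycles per year
t10d = t10d_years(B10D, cycles_per_year)      # about 23.7 years
mttfd = mttfd_years(B10D, cycles_per_year)    # about 237 years

if __name__ == "__main__":
    print(f"n_op  = {cycles_per_year:,.0f} cycles per year")
    print(f"T10D  = {t10d:.1f} years")
    print(f"MTTFD = {mttfd:.0f} years")
```

The component value of about 237 years is above the 100-year limit that NOTE 1 applies to a channel, which NOTE 1 allows for single components.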

So there you have it, at least for a fair­ly sim­ple case. There are more exam­ples in ISO 13849–1, and I would encour­age you to work through them. You can also find a wealth of exam­ples in a report pro­duced by the BGIA in Ger­many, called the Func­tion­al safe­ty of machine con­trols (BGIA Report 2/2008e) [16]. The down­load for the report is linked from the ref­er­ence list at the end of this arti­cle. If you are a SISTEMA user, there are lots of exam­ples in the SISTEMA Cook­books, and there are exam­ple files avail­able so that you can see how to assem­ble the sys­tems in the soft­ware.

The next part of this series cov­ers Diag­nos­tic Cov­er­age (DC), and the aver­age DC for mul­ti­ple safe­ty func­tions in a sys­tem, DCavg.

In case you missed the first part of the series, you can read it here.

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assess­ment: Basics and Bench­marks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simp­son, Safe­ty crit­i­cal sys­tems hand­book. Ams­ter­dam: Else­vier/But­ter­worth-Heine­mann, 2011.

[0.2]  Elec­tro­mag­net­ic Com­pat­i­bil­i­ty for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: The Insti­tu­tion of Engi­neer­ing and Tech­nol­o­gy, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[2]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 2: Val­i­da­tion. 2nd Edi­tion. ISO Stan­dard 13849–2. 2012.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. 7 parts. IEC Stan­dard 61508. Sec­ond Edi­tion. 2010.

[14]    Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems – Part 4: Def­i­n­i­tions and abbre­vi­a­tions. IEC Stan­dard 61508–4. Sec­ond Edi­tion. 2010.

[15]    “The bath­tub curve and prod­uct fail­ure behav­ior part 1 of 2”, Findchart.co, 2017. [Online]. Avail­able: http://findchart.co/download.php?aHR0cDovL3d3dy53ZWlidWxsLmNvbS9ob3R3aXJlL2lzc3VlMjEvaHQyMV8xLmdpZg. [Accessed: 03- Jan- 2017].

[16]   “Func­tion­al safe­ty of machine con­trols — Appli­ca­tion of EN ISO 13849 (BGIA Report 2/2008e)”, dguv.de, 2017. [Online]. Avail­able: http://www.dguv.de/ifa/publikationen/reports-download/bgia-reports-2007-bis-2008/bgia-report-2–2008/index-2.jsp. [Accessed: 2017-01-04].

Copyright secured by Digiprove © 2017
Acknowl­edge­ments: IEC, ISO and oth­ers as cit­ed
Some Rights Reserved