ISO 13849–1 Analysis — Part 4: MTTFD — Mean Time to Dangerous Failure

This entry is part 4 of 9 in the series How to do a 13849–1 analy­sis

Func­tion­al safe­ty is all about the like­li­hood of a safe­ty sys­tem fail­ing to oper­ate when you need it. Under­stand­ing Mean Time to Dan­ger­ous Fail­ure, or MTTFD, is crit­i­cal. If you have been read­ing about this top­ic at all, you may notice that I am abbre­vi­at­ing Mean Time to Dan­ger­ous Fail­ure with all cap­i­tal let­ters. Using MTTFD is a recent change that occurred in the third edi­tion of ISO 13849–1, pub­lished in 2015. In the first and sec­ond edi­tions, the cor­rect abbre­vi­a­tion was MTTFd. Onward!

If you missed the third instal­ment in this series, you can read it here.

Defining MTTFD

Let’s start by hav­ing a look at some key def­i­n­i­tions. Look­ing at [1, Cl. 3], you will find:

3.1.1 safety-related part of a control system (SRP/CS)—part of a control system that responds to safety-related input signals and generates safety-related output signals

Note 1 to entry: The com­bined safe­ty-relat­ed parts of a con­trol sys­tem start at the point where the safe­ty-relat­ed input sig­nals are ini­ti­at­ed (includ­ing, for exam­ple, the actu­at­ing cam and the roller of the posi­tion switch) and end at the out­put of the pow­er con­trol ele­ments (includ­ing, for exam­ple, the main con­tacts of a con­tac­tor)

Note 2 to entry: If mon­i­tor­ing sys­tems are used for diag­nos­tics, they are also con­sid­ered as SRP/CS.

3.1.5 dan­ger­ous fail­ure—fail­ure which has the poten­tial to put the SRP/CS in a haz­ardous or fail-to-func­tion state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, mod­i­fied.]

3.1.25 mean time to dan­ger­ous fail­ure (MTTFD)—expec­ta­tion of the mean time to dan­ger­ous fail­ure

Def­i­n­i­tion 3.1.5 is pret­ty help­ful, but def­i­n­i­tion 3.1.25 is, well, not much of a def­i­n­i­tion. Let’s look at this anoth­er way.

Failures and Faults

Since everything can and will eventually fail to perform the way we expect it to, we know that everything has a failure rate because everything takes some time to fail. Granted, this time may be very short, like the first time the unit is turned on, or it may be very long, sometimes hundreds of years. Remember that because this is a rate, it is something that occurs over time. It is also important to be clear that we are talking about failures and not faults. Reading from [1]:

3.1.3 fault—state of an item char­ac­ter­ized by the inabil­i­ty to per­form a required func­tion, exclud­ing the inabil­i­ty dur­ing pre­ven­tive main­te­nance or oth­er planned actions, or due to lack of exter­nal resources

Note 1 to entry: A fault is often the result of a fail­ure of the item itself, but may exist with­out pri­or fail­ure.

Note 2 to entry: In this part of ISO 13849, “fault” means ran­dom fault.
[SOURCE: IEC 60050–191:1990, 05–01.]

3.1.4 fail­ure— ter­mi­na­tion of the abil­i­ty of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Fail­ure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The con­cept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Fail­ures which only affect the avail­abil­i­ty of the process under con­trol are out­side of the scope of this part of ISO 13849.
[SOURCE: IEC 60050–191:1990, 04–01.]

3.1.4 Note 2 is the impor­tant one at this point in the dis­cus­sion.

Where we have multiples of something, like relays, valves, or safety systems, we have a population of identical items, each of which will eventually fail at some point. We can count those failures as they occur and tally them up, and we can graph how many failures we get in the population over time. If this is starting to sound suspiciously like statistics to you, that is because it is.

OK, so let’s look at the kinds of fail­ures that occur in that pop­u­la­tion. Some fail­ures will result in a “safe” state, e.g., a relay fail­ing with all poles open, and some will fail in a poten­tial­ly “dan­ger­ous” state, like a nor­mal­ly closed valve devel­op­ing a sig­nif­i­cant leak. If we tal­ly up all the fail­ures that occur, and then tal­ly the num­ber of “safe” fail­ures and the num­ber of “dan­ger­ous” fail­ures in that pop­u­la­tion, we now have some very use­ful infor­ma­tion.

The dif­fer­ent kinds of fail­ures are sig­ni­fied using the low­er­case Greek let­ter \lambda (lamb­da). We can add some sub­scripts to help iden­ti­fy what kinds of fail­ures we are talk­ing about. The com­mon vari­able des­ig­na­tions used are [14]:

\lambda = fail­ures
\lambda_{(t)} = fail­ure rate
\lambda_s = “safe” fail­ures
\lambda_d = “dan­ger­ous” fail­ures
\lambda_{dd} = detectable “dan­ger­ous” fail­ures
\lambda_{du} = unde­tectable “dan­ger­ous” fail­ures

I will be dis­cussing some of these vari­ables in more detail in a lat­er part of the series when I delve into Diag­nos­tic Cov­er­age, so don’t wor­ry about them too much just yet.
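To make the bookkeeping concrete, here is a minimal sketch of how the tallies map onto the variables above. The counts, the population size and the observation period are invented for illustration, and the class and function names are my own; none of them come from ISO 13849–1 or IEC 61508. The sketch also assumes a constant failure rate, so each rate is simply a count divided by the accumulated operating hours.

```python
from dataclasses import dataclass


@dataclass
class FailureTally:
    """Failure counts observed in a population (hypothetical structure)."""
    safe: int                   # failures that left the item in a safe state
    dangerous_detected: int     # dangerous failures caught by diagnostics
    dangerous_undetected: int   # dangerous failures not caught by diagnostics

    @property
    def dangerous(self) -> int:
        return self.dangerous_detected + self.dangerous_undetected

    @property
    def total(self) -> int:
        return self.safe + self.dangerous


def rate_per_hour(failures: int, units: int, hours_per_unit: float) -> float:
    """Constant failure rate estimate: failures per accumulated unit-hour."""
    return failures / (units * hours_per_unit)


# Invented example: 1000 relays, each observed for 8760 h (one year).
tally = FailureTally(safe=12, dangerous_detected=6, dangerous_undetected=2)
units, hours = 1000, 8760.0

lambda_total = rate_per_hour(tally.total, units, hours)
lambda_s = rate_per_hour(tally.safe, units, hours)
lambda_d = rate_per_hour(tally.dangerous, units, hours)
lambda_dd = rate_per_hour(tally.dangerous_detected, units, hours)
lambda_du = rate_per_hour(tally.dangerous_undetected, units, hours)

for name, value in [("lambda", lambda_total), ("lambda_s", lambda_s),
                    ("lambda_d", lambda_d), ("lambda_dd", lambda_dd),
                    ("lambda_du", lambda_du)]:
    print(f"{name:10s} = {value:.3e} per hour")
```

The point of the sketch is the structure: the safe and dangerous rates add up to the overall rate, and the detectable and undetectable dangerous rates add up to the dangerous rate.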

Getting to MTTFD

Since we can now start to deal with the fail­ure rate data math­e­mat­i­cal­ly, we can start to do some cal­cu­la­tions about expect­ed life­time of a com­po­nent or a sys­tem. That expect­ed, or prob­a­ble, life­time is what def­i­n­i­tion 3.1.25 was on about, and is what we call MTTFD.

MTTFD is expressed in years, and it is meaningful for the period over which the probability of failure is relatively constant. If you look at a typical failure rate curve, called a “bathtub curve” due to its resemblance to the profile of a nice soaker tub, that period is the flatter portion of the curve between the end of the infant mortality period and the start of the wear-out period at the end of life. This part of the curve is the portion assumed to be included in the “mission time” for the product. ISO 13849–1 assumes the mission time for all machinery is 20 years [1, 4.5.4] and [1, Cl. 10].
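Assuming the failure rate really is constant over that flat portion of the curve, the usual textbook reliability relations connect the dangerous failure rate to MTTFD. The block below is a sketch of those relations, not a quotation from the standard; F(t) is my own notation for the probability that a dangerous failure has occurred by time t.

```latex
MTTF_D = \frac{1}{\lambda_d}
\qquad\text{and}\qquad
F(t) = 1 - e^{-\lambda_d t} = 1 - e^{-t / MTTF_D}
```

Two consequences are worth keeping in mind. With MTTFD = 100 years, the probability of a dangerous failure within the 20-year mission time is 1 - e^{-0.2}, or roughly 18%. And at t = MTTFD, F(t) is about 63%, so MTTFD is a statistical expectation and not a guaranteed lifetime.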

Fig­ure 1 — Typ­i­cal Bath­tub Curve [15]

ISO 13849–1 provides us with guidance on how MTTFD relates to the determination of the PL in [1, Cl. 4.5.2]. MTTFD is further grouped into three bands as shown in [1, Table 4].

Mean time to dangerous failure of each channel (MTTFD), after [1, Table 4]:

  • Low: 3 years ≤ MTTFD < 10 years
  • Medium: 10 years ≤ MTTFD < 30 years
  • High: 30 years ≤ MTTFD ≤ 100 years

The notes for this table are important as well, so I’ve reproduced them here:

NOTE 1 The choice of the MTTFD ranges of each chan­nel is based on fail­ure rates found in the field as state-of-the-art, form­ing a kind of log­a­rith­mic scale fit­ting to the log­a­rith­mic PL scale. An MTTFD val­ue of each chan­nel less than three years is not expect­ed to be found for real SRP/CS since this would mean that after one year about 30 % of all sys­tems on the mar­ket will fail and will need to be replaced. An MTTFD val­ue of each chan­nel greater than 100 years is not accept­able because SRP/CS for high risks should not depend on the reli­a­bil­i­ty of com­po­nents alone. To rein­force the SRP/CS against sys­tem­at­ic and ran­dom fail­ure, addi­tion­al means such as redun­dan­cy and test­ing should be required. To be prac­ti­ca­ble, the num­ber of ranges was restrict­ed to three. The lim­i­ta­tion of MTTFD of each chan­nel val­ues to a max­i­mum of 100 years refers to the sin­gle chan­nel of the SRP/CS which car­ries out the safe­ty func­tion. High­er MTTFD val­ues can be used for sin­gle com­po­nents (see Table D.1).

NOTE 2 The indi­cat­ed bor­ders of this table are assumed with­in an accu­ra­cy of 5%.
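The bands and the 30 % figure in NOTE 1 can be checked with a few lines of code. This is a sketch under two assumptions: the band limits are the ones from [1, Table 4] (3, 10, 30 and 100 years), and failures follow the constant failure rate (exponential) model. The function names are mine, not the standard's.

```python
import math


def mttfd_band(mttfd_years: float) -> str:
    """Classify a channel MTTFD into the bands of ISO 13849-1, Table 4."""
    if mttfd_years < 3:
        return "below range"   # not expected for real SRP/CS (NOTE 1)
    if mttfd_years < 10:
        return "low"
    if mttfd_years < 30:
        return "medium"
    if mttfd_years <= 100:
        return "high"
    return "above range"       # channel values are limited to 100 years


def fraction_failed(mttfd_years: float, t_years: float) -> float:
    """Fraction expected to have failed dangerously after t_years,
    assuming a constant failure rate of 1 / MTTFD."""
    return 1.0 - math.exp(-t_years / mttfd_years)


for value in (3, 10, 30, 100):
    one_year = fraction_failed(value, 1)
    print(f"MTTFD {value:>3} a: band {mttfd_band(value):6s} "
          f"failed after 1 a: {one_year:.1%}")
```

For an MTTFD of 3 years, the one-year fraction comes out at roughly 28%, which lines up with the “about 30 %” quoted in NOTE 1.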

The stan­dard then tells us to select the MTTFD using a sim­ple hier­ar­chy [1, 4.5.2]:

For the estimation of MTTFD of a component, the hierarchical procedure for finding data shall be, in the order given:

a) use manufacturer’s data;
b) use meth­ods in Annex C and Annex D;
c) choose 10 years.

Why ten years? Ten years is half of the assumed mis­sion life­time of 20 years. More on mis­sion life­time in a lat­er post.
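If you keep component data in a spreadsheet or script, the hierarchy is easy to encode. The following is a minimal sketch, assuming a missing value is represented by None; the function name, parameter names and source labels are hypothetical and only illustrate the a), b), c) order from [1, 4.5.2].

```python
from typing import Optional, Tuple

DEFAULT_MTTFD_YEARS = 10.0  # step c) of the hierarchy


def select_mttfd(manufacturer_years: Optional[float] = None,
                 annex_years: Optional[float] = None) -> Tuple[float, str]:
    """Return (MTTFD in years, data source), trying a), then b), then c)."""
    if manufacturer_years is not None:
        return manufacturer_years, "manufacturer's data"
    if annex_years is not None:
        return annex_years, "Annex C / Annex D estimate"
    return DEFAULT_MTTFD_YEARS, "default of 10 years"


# Illustrative values only; they are not taken from any datasheet.
print(select_mttfd(manufacturer_years=150.0, annex_years=45.0))
print(select_mttfd(annex_years=45.0))
print(select_mttfd())
```

Returning the data source alongside the value makes it easier to show in your documentation which step of the hierarchy each number came from.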

Look­ing at [1, Annex C.2], you will find the “Good Engi­neer­ing Prac­tices” method for esti­mat­ing MTTFD, pre­sum­ing the man­u­fac­tur­er has not pro­vid­ed you with that infor­ma­tion. ISO 13849–2 [2] has some ref­er­ence tables that pro­vide some gen­er­al MTTFD val­ues for some kinds of com­po­nents, but not every part that exists can be list­ed. How can we deal with parts not list­ed? [1, Annex C.4] pro­vides us with a cal­cu­la­tion method for esti­mat­ing MTTFD for pneu­mat­ic, mechan­i­cal and electro­mechan­i­cal com­po­nents.

Calculating MTTFD for pneumatic, mechanical and electromechanical components

I need to intro­duce you to a few more vari­ables before we look at how to cal­cu­late MTTFD for a com­po­nent.

Variables

  • B10: number of cycles until 10% of the components fail (for pneumatic and electromechanical components)
  • B10D: number of cycles until 10% of the components fail dangerously (for pneumatic and electromechanical components)
  • T: lifetime of the component
  • T10D: the mean time until 10% of the components fail dangerously
  • hop: the mean operation time, in hours per day
  • dop: the mean operation time, in days per year
  • tcycle: the mean operation time between the beginning of two successive cycles of the component (e.g., switching of a valve), in seconds per cycle
  • s: seconds
  • h: hours
  • a: years

Know­ing a few details we can cal­cu­late the MTTFD using [1, Eqn C.1]. We need to know the fol­low­ing para­me­ters for the appli­ca­tion:

  • B10D
  • hop
  • dop
  • tcycle

Calculating MTTFD, from [1, Eqn. C.1]:

MTTF_D = \frac{B_{10D}}{0.1 \times n_{op}}

In order to use [1, Eqn. C.1], we first need to calculate nop, the mean number of annual operations, using [1, Eqn. C.2]:

n_{op} = \frac{d_{op} \times h_{op} \times 3600\ \mathrm{s/h}}{t_{cycle}}

We may also need one more calculation, T10D, from [1, Eqn. C.4]:

T_{10D} = \frac{B_{10D}}{n_{op}}

Example Calculation [1, C.4.3]

For a pneu­mat­ic valve, a man­u­fac­tur­er deter­mines a mean val­ue of 60 mil­lion cycles as B10D. The valve is used for two shifts each day on 220 oper­a­tion days a year. The mean time between the begin­ning of two suc­ces­sive switch­ing of the valve is esti­mat­ed as 5 s. This yields the fol­low­ing val­ues:

  • dop of 220 days per year;
  • hop of 16 h per day;
  • tcycle of 5 s per cycle;
  • B10D of 60 mil­lion cycles.

Doing the math, we get:

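n_{op} = \frac{220\ \mathrm{d/a} \times 16\ \mathrm{h/d} \times 3600\ \mathrm{s/h}}{5\ \mathrm{s/cycle}} = 2.53 \times 10^{6}\ \mathrm{cycles/a}

T_{10D} = \frac{60 \times 10^{6}\ \mathrm{cycles}}{2.53 \times 10^{6}\ \mathrm{cycles/a}} = 23.7\ \mathrm{a}

MTTF_D = \frac{60 \times 10^{6}\ \mathrm{cycles}}{0.1 \times 2.53 \times 10^{6}\ \mathrm{cycles/a}} = 237\ \mathrm{a}

Example C.4.3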
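If you would rather let a script do the arithmetic, the same example can be reproduced in a few lines. This is a sketch based on my reading of [1, Eqn. C.1] and [1, Eqn. C.2], namely nop = dop × hop × 3600 / tcycle, T10D = B10D / nop and MTTFD = B10D / (0.1 × nop). The function names are my own.

```python
SECONDS_PER_HOUR = 3600


def n_op(d_op: float, h_op: float, t_cycle: float) -> float:
    """Mean number of operations per year.

    d_op: operating days per year, h_op: operating hours per day,
    t_cycle: seconds between the starts of two successive cycles.
    """
    return d_op * h_op * SECONDS_PER_HOUR / t_cycle


def t_10d(b10d: float, nop: float) -> float:
    """Mean time, in years, until 10% of the components fail dangerously."""
    return b10d / nop


def mttfd(b10d: float, nop: float) -> float:
    """Mean time to dangerous failure of the component, in years."""
    return b10d / (0.1 * nop)


# Values from the pneumatic valve example above.
B10D = 60e6  # cycles
nop = n_op(d_op=220, h_op=16, t_cycle=5)

print(f"nop   = {nop:,.0f} cycles per year")
print(f"T10D  = {t_10d(B10D, nop):.1f} years")
print(f"MTTFD = {mttfd(B10D, nop):.0f} years")
```

Note that the roughly 237 years is a component value. As NOTE 1 to Table 4 points out, values above 100 years can be used for single components, but the MTTFD of each channel is limited to a maximum of 100 years.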

So there you have it, at least for a fair­ly sim­ple case. There are more exam­ples in ISO 13849–1, and I would encour­age you to work through them. You can also find a wealth of exam­ples in a report pro­duced by the BGIA in Ger­many, called the Func­tion­al safe­ty of machine con­trols (BGIA Report 2/2008e) [16]. The down­load for the report is linked from the ref­er­ence list at the end of this arti­cle. If you are a SISTEMA user, there are lots of exam­ples in the SISTEMA Cook­books, and there are exam­ple files avail­able so that you can see how to assem­ble the sys­tems in the soft­ware.

The next part of this series cov­ers Diag­nos­tic Cov­er­age (DC), and the aver­age DC for mul­ti­ple safe­ty func­tions in a sys­tem, DCavg.

In case you missed the first part of the series, you can read it here.

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assess­ment: Basics and Bench­marks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simp­son, Safe­ty crit­i­cal sys­tems hand­book. Ams­ter­dam: Else­vier/But­ter­worth-Heine­mann, 2011.

[0.2]  Elec­tro­mag­net­ic Com­pat­i­bil­i­ty for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: The Insti­tu­tion of Engi­neer­ing and Tech­nol­o­gy, 2008.

[0.3]  Overview of tech­niques and mea­sures relat­ed to EMC for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: Overview of tech­niques and mea­sures relat­ed to EMC for Func­tion­al Safe­ty, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[2]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 2: Val­i­da­tion. 2nd Edi­tion. ISO Stan­dard 13849–2. 2012.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. 7 parts. IEC Stan­dard 61508. Sec­ond Edi­tion. 2010.

[14]    Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems – Part 4: Def­i­n­i­tions and abbre­vi­a­tions. IEC Stan­dard 61508–4. Sec­ond Edi­tion. 2010.

[15]    “The bath­tub curve and prod­uct fail­ure behav­ior part 1 of 2”, Findchart.co, 2017. [Online]. Avail­able: http://findchart.co/download.php?aHR0cDovL3d3dy53ZWlidWxsLmNvbS9ob3R3aXJlL2lzc3VlMjEvaHQyMV8xLmdpZg. [Accessed: 03- Jan- 2017].

[16]   “Func­tion­al safe­ty of machine con­trols — Appli­ca­tion of EN ISO 13849 (BGIA Report 2/2008e)”, dguv.de, 2017. [Online]. Avail­able: http://www.dguv.de/ifa/publikationen/reports-download/bgia-reports-2007-bis-2008/bgia-report-2–2008/index-2.jsp. [Accessed: 2017-01-04].

Copyright secured by Digiprove © 2017
Acknowl­edge­ments: IEC, ISO and oth­ers as cit­ed
Some Rights Reserved

Author: Doug Nix

Doug Nix is Managing Director and Principal Consultant at Compliance InSight Consulting, Inc. (http://www.complianceinsight.ca) in Kitchener, Ontario, and is Lead Author and Managing Editor of the Machinery Safety 101 blog. Doug's work includes teaching machinery risk assessment techniques privately and through Conestoga College Institute of Technology and Advanced Learning in Kitchener, Ontario, as well as providing technical services and training programs to clients related to risk assessment, industrial machinery safety, safety-related control system integration and reliability, laser safety and regulatory conformity.