ISO 13849–1 Analysis — Part 4: MTTFD — Mean Time to Dangerous Failure

This entry is part 4 of 9 in the series “How to do a 13849-1 analysis”

Functional safety is all about the likelihood of a safety system failing to operate when you need it, so understanding Mean Time to Dangerous Failure, or MTTFD, is critical. If you have been reading about this topic at all, you may notice that I am abbreviating Mean Time to Dangerous Failure with all capital letters. Using MTTFD is a change introduced in the third edition of ISO 13849-1, published in 2015. In the first and second editions, the correct abbreviation was MTTFd. Onward!

Defining MTTFD

Let’s start by having a look at some key definitions. Looking at [1, Cl. 3], you will find:

3.1.1 safety-related part of a control system (SRP/CS): part of a control system that responds to safety-related input signals and generates safety-related output signals

Note 1 to entry: The combined safety-related parts of a control system start at the point where the safety-related input signals are initiated (including, for example, the actuating cam and the roller of the position switch) and end at the output of the power control elements (including, for example, the main contacts of a contactor).

Note 2 to entry: If monitoring systems are used for diagnostics, they are also considered as SRP/CS.

3.1.5 dangerous failure: failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508-4, 3.6.7, modified.]

3.1.25 mean time to dangerous failure (MTTFD): expectation of the mean time to dangerous failure

Definition 3.1.5 is pretty helpful, but definition 3.1.25 is, well, not much of a definition. Let’s look at this another way.

Failures and Faults

Since everything can and will eventually fail to perform the way we expect it to, everything has a failure rate: every item takes some time to fail. Granted, this time may be very short, like the first time the unit is turned on, or it may be very long, sometimes hundreds of years. Remember that a rate describes how often something happens per unit of time. It is also important to be clear that we are talking about failures and not faults. Reading from [1]:

3.1.3 fault: state of an item characterized by the inability to perform a required function, excluding the inability during preventive maintenance or other planned actions, or due to lack of external resources

Note 1 to entry: A fault is often the result of a failure of the item itself, but may exist without prior failure.

Note 2 to entry: In this part of ISO 13849, “fault” means random fault.
[SOURCE: IEC 60050-191:1990, 05-01.]

3.1.4 failure: termination of the ability of an item to perform a required function

Note 1 to entry: After a failure, the item has a fault.

Note 2 to entry: “Failure” is an event, as distinguished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items consisting of software only.

Note 4 to entry: Failures which only affect the availability of the process under control are outside of the scope of this part of ISO 13849.
[SOURCE: IEC 60050-191:1990, 04-01.]

Note 2 to entry in 3.1.4 is the important one at this point in the discussion.

Now, where we have multiples of something, like relays, valves, or safety systems, we have a population of identical items, each of which will eventually fail at some point. We can count those failures as they occur, tally them up, and graph how many failures occur in the population over time. If this is starting to sound suspiciously like statistics to you, that is because it is.

OK, so let’s look at the kinds of failures that occur in that population. Some failures will result in a “safe” state, e.g., a relay failing with all poles open, and some will result in a potentially “dangerous” state, like a normally closed valve developing a significant leak. If we tally up all the failures that occur, and then tally the number of “safe” failures and the number of “dangerous” failures in that population, we now have some very useful information.

The different kinds of failures are denoted using the lowercase Greek letter $\lambda$ (lambda). We can add some subscripts to help identify what kinds of failures we are talking about. The common variable designations used are [14]:

$\lambda$ = failures
$\lambda_{(t)}$ = failure rate
$\lambda_s$ = “safe” failures
$\lambda_d$ = “dangerous” failures
$\lambda_{dd}$ = detectable “dangerous” failures
$\lambda_{du}$ = undetectable “dangerous” failures

I will be discussing some of these variables in more detail in a later part of the series when I delve into Diagnostic Coverage, so don’t worry about them too much just yet.
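For the curious, here is a minimal sketch of the tallying described above, under two stated assumptions: the failure counts and accumulated operating hours are hypothetical, and each failure rate is constant, so it can be estimated as the number of failures of that kind divided by the total unit-hours. The class and function names are illustrative only. The sketch also shows how the symbols fit together: $\lambda_d = \lambda_{dd} + \lambda_{du}$ and $\lambda = \lambda_s + \lambda_d$.

```python
from dataclasses import dataclass


@dataclass
class FailureTally:
    """Hypothetical failure counts observed in a population of identical items."""
    safe: int                    # failures that left the item in a "safe" state
    dangerous_detectable: int    # dangerous failures that diagnostics would detect
    dangerous_undetectable: int  # dangerous failures that diagnostics would miss
    unit_hours: float            # operating hours accumulated by the whole population


def estimate_rates(t: FailureTally) -> dict[str, float]:
    """Point estimates of each failure rate, in failures per hour.

    Assumes constant failure rates, so each rate is simply count / unit-hours.
    """
    lam_s = t.safe / t.unit_hours
    lam_dd = t.dangerous_detectable / t.unit_hours
    lam_du = t.dangerous_undetectable / t.unit_hours
    lam_d = lam_dd + lam_du  # dangerous = detectable + undetectable
    return {
        "lambda": lam_s + lam_d,  # all failures = safe + dangerous
        "lambda_s": lam_s,
        "lambda_d": lam_d,
        "lambda_dd": lam_dd,
        "lambda_du": lam_du,
    }


# Illustrative numbers only: 1,000 relays, each credited with 10,000 operating hours
# (failed units are assumed to be replaced so the exposure stays constant).
tally = FailureTally(safe=6, dangerous_detectable=3, dangerous_undetectable=1,
                     unit_hours=1_000 * 10_000)
for name, value in estimate_rates(tally).items():
    print(f"{name:<10} {value:.1e} per hour")
```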

Getting to MTTFD

Since we can now deal with the failure rate data mathematically, we can calculate the expected lifetime of a component or a system. That expected, or probable, lifetime is what definition 3.1.25 was on about, and it is what we call MTTFD.

MTTFD is expressed in years. ISO 13849-1 assumes that the rate of dangerous failures is constant, which corresponds to the flatter portion of a typical failure rate curve, called a “bathtub curve” due to its resemblance to the profile of a nice soaker tub. That flat portion lies between the end of the infant mortality period and the wear-out period at the end of life, and while a component operates there, its MTTFD is simply the reciprocal of its dangerous failure rate, $\lambda_d$. This part of the curve is the portion assumed to include the “mission time” for the product. ISO 13849-1 assumes a mission time of 20 years [1, 4.5.4], [1, Cl. 10].
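Under the constant-rate assumption, converting between a dangerous failure rate and MTTFD is a one-line calculation, $\mathrm{MTTF_D} = 1/\lambda_d$. This is a minimal sketch, assuming $\lambda_d$ is given in failures per hour and a year of 8,760 hours; the function names are illustrative and not taken from the standard.

```python
HOURS_PER_YEAR = 365 * 24  # assumed conversion factor: 8,760 h per year


def mttfd_years(rate_per_hour: float) -> float:
    """MTTFD in years for a constant dangerous failure rate given in failures per hour."""
    if rate_per_hour <= 0:
        raise ValueError("dangerous failure rate must be positive")
    return 1.0 / (rate_per_hour * HOURS_PER_YEAR)


def dangerous_rate_per_hour(mttfd_a: float) -> float:
    """Inverse conversion: constant dangerous failure rate per hour from MTTFD in years."""
    if mttfd_a <= 0:
        raise ValueError("MTTFD must be positive")
    return 1.0 / (mttfd_a * HOURS_PER_YEAR)


# A dangerous failure rate of 1e-6 per hour works out to roughly 114 years.
print(f"{mttfd_years(1e-6):.1f} years")
```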

Figure 1: Typical Bathtub Curve [15]
ISO 13849-1 provides us with guidance on how MTTFD relates to the determination of the PL in [1, Cl. 4.5.2]. MTTFD is further grouped into three bands as shown in [1, Table 4].
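Table 4: Mean time to dangerous failure of each channel (MTTFD) [1]

| Denotation of each channel | Range of each channel |
|---|---|
| Low | 3 years ≤ MTTFD < 10 years |
| Medium | 10 years ≤ MTTFD < 30 years |
| High | 30 years ≤ MTTFD ≤ 100 years |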
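The banding in [1, Table 4] is easy to apply mechanically: Low is 3 years ≤ MTTFD < 10 years, Medium is 10 years ≤ MTTFD < 30 years, and High is 30 years ≤ MTTFD ≤ 100 years. A minimal sketch follows; the function name is hypothetical, it ignores the 5 % tolerance on the band borders given in NOTE 2, and it simply returns None for values outside the table, which NOTE 1 explains.

```python
from typing import Optional


def mttfd_band(mttfd_years: float) -> Optional[str]:
    """Classify the MTTFD of one channel into the bands of ISO 13849-1 Table 4.

    Returns None for values below 3 years or above 100 years (outside the table).
    """
    if 3 <= mttfd_years < 10:
        return "Low"
    if 10 <= mttfd_years < 30:
        return "Medium"
    if 30 <= mttfd_years <= 100:
        return "High"
    return None


for value in (2.5, 3, 12, 30, 100, 150):
    print(value, mttfd_band(value))
```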

The notes to this table are important as well, so I’ve reproduced them here:

NOTE 1 The choice of the MTTFD ranges of each channel is based on failure rates found in the field as state-of-the-art, forming a kind of logarithmic scale fitting to the logarithmic PL scale. An MTTFD value of each channel less than three years is not expected to be found for real SRP/CS since this would mean that after one year about 30 % of all systems on the market will fail and will need to be replaced. An MTTFD value of each channel greater than 100 years is not acceptable because SRP/CS for high risks should not depend on the reliability of components alone. To reinforce the SRP/CS against systematic and random failure, additional means such as redundancy and testing should be required. To be practicable, the number of ranges was restricted to three. The limitation of MTTFD of each channel values to a maximum of 100 years refers to the single channel of the SRP/CS which carries out the safety function. Higher MTTFD values can be used for single components (see Table D.1).

NOTE 2 The indicated borders of this table are assumed within an accuracy of 5%.
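The “about 30 %” in NOTE 1 can be checked with the constant-failure-rate (exponential) model, in which the fraction of a population that has failed by time t is $1 - e^{-t/\mathrm{MTTF}}$. A quick sketch under that assumption; the function name is illustrative:

```python
import math


def fraction_failed(t_years: float, mttf_years: float) -> float:
    """Fraction of a population failed by time t, assuming a constant failure rate."""
    return 1.0 - math.exp(-t_years / mttf_years)


# NOTE 1: with an MTTFD of 3 years, this is the share failed after the first year.
print(f"{fraction_failed(1, 3):.1%}")  # 28.3%, i.e. "about 30 %"
```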

The standard then tells us to select the MTTFD using a simple hierarchy [1, 4.5.2]:

For the estimation of MTTFD of a component, the hierarchical procedure for finding data shall be, in the order given:

a) use manufacturer’s data;
b) use methods in Annex C and Annex D;
c) choose 10 years.
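The hierarchy is a simple fallback chain. Here is a minimal sketch; manufacturer_mttfd and annex_estimate are hypothetical inputs standing in for a value taken from the manufacturer's data and a value worked out with the Annex C or Annex D methods, both in years.

```python
from typing import Optional, Tuple

DEFAULT_MTTFD_YEARS = 10.0  # step c) of the hierarchy


def select_mttfd(manufacturer_mttfd: Optional[float] = None,
                 annex_estimate: Optional[float] = None) -> Tuple[float, str]:
    """Return a component MTTFD in years and the hierarchy step it came from."""
    if manufacturer_mttfd is not None:  # a) manufacturer's data first
        return manufacturer_mttfd, "a) manufacturer's data"
    if annex_estimate is not None:  # b) then the Annex C / Annex D methods
        return annex_estimate, "b) Annex C / Annex D"
    return DEFAULT_MTTFD_YEARS, "c) default of 10 years"  # c) last resort


print(select_mttfd(manufacturer_mttfd=50.0, annex_estimate=150.0))  # manufacturer wins
print(select_mttfd(annex_estimate=150.0))
print(select_mttfd())
```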

Why ten years? Ten years is half of the assumed mission time of 20 years. More on mission time in a later post.

Looking at [1, Annex C.2], you will find the “Good Engineering Practices” method for estimating MTTFD, presuming the manufacturer has not provided you with that information. The annex provides reference MTTFD or B10D values for some kinds of components built to the basic and well-tried safety principles of ISO 13849-2 [2], but not every part that exists can be listed. How can we deal with parts not listed? [1, Annex C.4] provides a calculation method for estimating MTTFD for pneumatic, mechanical and electromechanical components.

Calculating MTTFD for pneumatic, mechanical and electromechanical components

I need to introduce you to a few more variables before we look at how to calculate MTTFD for a component.

Variables

| Variable | Description |
|---|---|
| B10 | Number of cycles until 10% of the components fail (for pneumatic and electromechanical components) |
| B10D | Number of cycles until 10% of the components fail dangerously (for pneumatic and electromechanical components) |
| T | Lifetime of the component |
| T10D | Mean time until 10% of the components fail dangerously |
| hop | Mean operation time, in hours per day |
| dop | Mean operation time, in days per year |
| nop | Mean number of annual operations, in cycles per year |
| tcycle | Mean operation time between the beginning of two successive cycles of the component (e.g., switching of a valve), in seconds per cycle |
| s | Seconds |
| h | Hours |
| a | Years |

Knowing a few details, we can calculate MTTFD using [1, Eqn. C.1]. We need to know the following parameters for the application:

  • B10D
  • hop
  • dop
  • tcycle

$$\mathrm{MTTF_D} = \frac{B_{10D}}{0.1 \times n_{op}}$$
Calculating MTTFD: [1, Eqn. C.1]
To use [1, Eqn. C.1], we first need to calculate nop, the mean number of annual operations, using [1, Eqn. C.2]:

$$n_{op} = \frac{d_{op} \times h_{op} \times 3600\ \mathrm{s/h}}{t_{cycle}}$$
Calculating nop: [1, Eqn. C.2]
We may also need one more calculation, T10D, using [1, Eqn. C.4]:
$$T_{10D} = \frac{B_{10D}}{n_{op}}$$
Calculating T10D: [1, Eqn. C.4]

Example Calculation [1, C.4.3]

For a pneumatic valve, a manufacturer determines a mean value of 60 million cycles as B10D. The valve is used for two shifts each day on 220 operation days a year. The mean time between the beginning of two successive switchings of the valve is estimated as 5 s. This yields the following values:

  • dop of 220 days per year;
  • hop of 16 h per day;
  • tcycle of 5 s per cycle;
  • B10D of 60 million cycles.

Doing the math, we get:

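$$n_{op} = \frac{220\ \mathrm{d/a} \times 16\ \mathrm{h/d} \times 3600\ \mathrm{s/h}}{5\ \mathrm{s/cycle}} = 2\,534\,400\ \mathrm{cycles/a}$$
$$\mathrm{MTTF_D} = \frac{60\,000\,000\ \mathrm{cycles}}{0.1 \times 2\,534\,400\ \mathrm{cycles/a}} \approx 236.7\ \mathrm{a}$$
$$T_{10D} = \frac{60\,000\,000\ \mathrm{cycles}}{2\,534\,400\ \mathrm{cycles/a}} \approx 23.7\ \mathrm{a}$$
Example C.4.3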
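To check the arithmetic of Example C.4.3, here is a self-contained sketch that plugs the example's values into the same three equations. The variable names are illustrative, and the results are rounded only for printing.

```python
# Values from the example in [1, C.4.3]
b10d = 60e6      # cycles until 10 % of the valves fail dangerously
d_op = 220       # operating days per year
h_op = 16        # operating hours per day (two 8 h shifts)
t_cycle = 5.0    # seconds between the start of successive switching cycles

n_op = d_op * h_op * 3600 / t_cycle  # Eqn. C.2, cycles per year
mttfd = b10d / (0.1 * n_op)          # Eqn. C.1, years
t10d = b10d / n_op                   # Eqn. C.4, years

print(f"n_op  = {n_op:,.0f} cycles/year")  # n_op  = 2,534,400 cycles/year
print(f"MTTFD = {mttfd:.1f} years")        # MTTFD = 236.7 years
print(f"T10D  = {t10d:.1f} years")         # T10D  = 23.7 years
```

Because T10D, at about 23.7 years, is longer than the 20-year mission time, this valve would not reach its T10D within the mission time. The MTTFD of about 237 years is above the 100-year ceiling of [1, Table 4], which NOTE 1 to that table permits for single components.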

So there you have it, at least for a fairly simple case. There are more examples in ISO 13849-1, and I would encourage you to work through them. You can also find a wealth of examples in a report produced by the BGIA in Germany, titled Functional safety of machine controls (BGIA Report 2/2008e) [16]. The download for the report is linked from the reference list at the end of this article. If you are a SISTEMA user, the SISTEMA Cookbooks contain lots of examples, and example files are available so that you can see how to assemble the systems in the software.

The next part of this series covers Diagnostic Coverage (DC), and the average DC across all the parts of an SRP/CS, DCavg.

Book List

Here are some books that I think you may find helpful on this journey:

[0] B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI, USA: DSE, 2004.

[0.1] D. Smith and K. Simpson, Safety Critical Systems Handbook. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2] Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3] Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[1] Safety of machinery - Safety-related parts of control systems - Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[2] Safety of machinery - Safety-related parts of control systems - Part 2: Validation. 2nd Edition. ISO Standard 13849-2. 2012.

[7] Functional safety of electrical/electronic/programmable electronic safety-related systems. 7 parts. IEC Standard 61508. 2nd Edition. 2010.

[14] Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 4: Definitions and abbreviations. IEC Standard 61508-4. 2nd Edition. 2010.

[15] “The bathtub curve and product failure behavior part 1 of 2”, Findchart.co, 2017. [Online]. Available: http://findchart.co/download.php?aHR0cDovL3d3dy53ZWlidWxsLmNvbS9ob3R3aXJlL2lzc3VlMjEvaHQyMV8xLmdpZg. [Accessed: 2017-01-03].

[16] “Functional safety of machine controls - Application of EN ISO 13849 (BGIA Report 2/2008e)”, dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/publikationen/reports-download/bgia-reports-2007-bis-2008/bgia-report-2-2008/index-2.jsp. [Accessed: 2017-01-04].

Copyright secured by Digiprove © 2017
Acknowledgements: IEC, ISO and others as cited
Some Rights Reserved