ISO 13849 – 1 Analysis — Part 4: MTTFD – Mean Time to Dangerous Failure

This entry is part 4 of 9 in the series How to do a 13849 – 1 ana­lys­is

Functional safety is all about the like­li­hood of a safety sys­tem fail­ing to oper­ate when you need it. Understanding Mean Time to Dangerous Failure, or MTTFD, is crit­ic­al. If you have been read­ing about this top­ic at all, you may notice that I am abbre­vi­at­ing Mean Time to Dangerous Failure with all cap­it­al let­ters. Using MTTFD is a recent change that occurred in the third edi­tion of ISO 13849 – 1, pub­lished in 2015. In the first and second edi­tions, the cor­rect abbre­vi­ation was MTTFd. Onward!

If you missed the third instal­ment in this series, you can read it here.

Defining MTTFD

Let’s start by hav­ing a look at some key defin­i­tions. Looking at [1, Cl. 3], you will find:

3.1.1 safety – related part of a con­trol sys­tem (SRP/​CS)—part of a con­trol sys­tem that responds to safety-​related input sig­nals and gen­er­ates safety-​related
out­put sig­nals

Note 1 to entry: The com­bined safety-​related parts of a con­trol sys­tem start at the point where the safety-​related input sig­nals are ini­ti­ated (includ­ing, for example, the actu­at­ing cam and the roller of the pos­i­tion switch) and end at the out­put of the power con­trol ele­ments (includ­ing, for example, the main con­tacts of a con­tact­or)

Note 2 to entry: If mon­it­or­ing sys­tems are used for dia­gnostics, they are also con­sidered as SRP/​CS.

3.1.5 dan­ger­ous fail­ure—fail­ure which has the poten­tial to put the SRP/​CS in a haz­ard­ous or fail-​to-​function state

Note 1 to entry: Whether or not the poten­tial is real­ized can depend on the chan­nel archi­tec­ture of the sys­tem;
in redund­ant sys­tems a dan­ger­ous hard­ware fail­ure is less likely to lead to the over­all dan­ger­ous or fail-​tofunction
state.

Note 2 to entry: [SOURCE: IEC 61508 – 4, 3.6.7, mod­i­fied.]

3.1.25 mean time to dan­ger­ous fail­ure (MTTFD)—expect­a­tion of the mean time to dan­ger­ous fail­ure

Definition 3.1.5 is pretty help­ful, but defin­i­tion 3.1.25 is, well, not much of a defin­i­tion. Let’s look at this anoth­er way.

Failures and Faults

Since everything can and will even­tu­ally fail to per­form the way we expect it to, we know that everything has a fail­ure rate because everything takes some time to fail. Granted that this time may be very short, like the first time the unit is turned on, or it may be very long, some­times hun­dreds of years. Remember that because this is a rate, it is some­thing that occurs over time. It is also import­ant to be clear that we are talk­ing about fail­ures and not faults. Reading from [1]:

3.1.3 fault—state of an item char­ac­ter­ized by the inab­il­ity to per­form a required func­tion, exclud­ing the inab­il­ity dur­ing pre­vent­ive main­ten­ance or oth­er planned actions, or due to lack of extern­al resources

Note 1 to entry: A fault is often the res­ult of a fail­ure of the item itself, but may exist without pri­or fail­ure.

Note 2 to entry: In this part of ISO 13849, “fault” means ran­dom fault.
[SOURCE: IEC 60050?191:1990, 05 – 01.]

3.1.4 fail­ure— ter­min­a­tion of the abil­ity of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Failure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Failures which only affect the avail­ab­il­ity of the pro­cess under con­trol are out­side of the scope of this part of ISO 13849.
[SOURCE: IEC 60050 – 191:1990, 04 – 01.]

3.1.4 Note 2 is the import­ant one at this point in the dis­cus­sion.

Now, where we have mul­tiples of some­thing, like relays, valves, or safety sys­tems, we now have a pop­u­la­tion of identic­al items, each of which will even­tu­ally fail at some point. We can count those fail­ures as they occur and tally them up, and we can graph how many fail­ures we get in the pop­u­la­tion over time. If this is start­ing to sound sus­pi­ciously like stat­ist­ics to you, that is because it is.

OK, so let’s look at the kinds of fail­ures that occur in that pop­u­la­tion. Some fail­ures will res­ult in a “safe” state, e.g., a relay fail­ing with all poles open, and some will fail in a poten­tially “dan­ger­ous” state, like a nor­mally closed valve devel­op­ing a sig­ni­fic­ant leak. If we tally up all the fail­ures that occur, and then tally the num­ber of “safe” fail­ures and the num­ber of “dan­ger­ous” fail­ures in that pop­u­la­tion, we now have some very use­ful inform­a­tion.

The dif­fer­ent kinds of fail­ures are sig­ni­fied using the lower­case Greek let­ter \lambda (lambda). We can add some sub­scripts to help identi­fy what kinds of fail­ures we are talk­ing about. The com­mon vari­able des­ig­na­tions used are [14]:

\lambda = fail­ures
\lambda_{(t)} = fail­ure rate
\lambda_s = “safe” fail­ures
\lambda_d = “dan­ger­ous” fail­ures
\lambda_{dd} = detect­able “dan­ger­ous” fail­ures
\lambda_{du} = undetect­able “dan­ger­ous” fail­ures

I will be dis­cuss­ing some of these vari­ables in more detail in a later part of the series when I delve into Diagnostic Coverage, so don’t worry about them too much just yet.

Getting to MTTFD

Since we can now start to deal with the fail­ure rate data math­em­at­ic­ally, we can start to do some cal­cu­la­tions about expec­ted life­time of a com­pon­ent or a sys­tem. That expec­ted, or prob­able, life­time is what defin­i­tion 3.1.25 was on about, and is what we call MTTFD.

MTTFD is the time in years over which the prob­ab­il­ity of fail­ure is rel­at­ively con­stant. If you look at a typ­ic­al fail­ure rate curve, called a “bathtub curve” due to its resemb­lance to the pro­file of a nice soak­er tub, the MTTFD is the flat­ter por­tion of the curve between the end of the infant mor­tal­ity peri­od and the wear-​out peri­od at the end of life. This part of the curve is the por­tion assumed to be included in the “mis­sion time” for the product. ISO 13849 – 1 assumes the mis­sion time for all machinery is 20 years [1, 4.5.4] and [1, Cl. 10].

Diagram of a standardized bathtub-shaped failure rate curve.
Figure 1 – Typical Bathtub Curve [15]
ISO 13849 – 1 provides us with guid­ance on how MTTFD relates to the determ­in­a­tion of the PL in [1, Cl. 4.5.2]. MTTFD is fur­ther grouped into three bands as shown in [1, Table 4].
Table showing the bands of Mean time to dangerous failure of each channel (MTTFD)

The notes for this table are import­ant as well. Since you can’t read the notes par­tic­u­larly well in the table above, I’ve repro­duced them here:

NOTE 1 The choice of the MTTFD ranges of each chan­nel is based on fail­ure rates found in the field as state-​of-​the-​art, form­ing a kind of log­ar­ithmic scale fit­ting to the log­ar­ithmic PL scale. An MTTFD value of each chan­nel less than three years is not expec­ted to be found for real SRP/​CS since this would mean that after one year about 30 % of all sys­tems on the mar­ket will fail and will need to be replaced. An MTTFD value of each chan­nel great­er than 100 years is not accept­able because SRP/​CS for high risks should not depend on the reli­ab­il­ity of com­pon­ents alone. To rein­force the SRP/​CS against sys­tem­at­ic and ran­dom fail­ure, addi­tion­al means such as redund­ancy and test­ing should be required. To be prac­tic­able, the num­ber of ranges was restric­ted to three. The lim­it­a­tion of MTTFD of each chan­nel val­ues to a max­im­um of 100 years refers to the single chan­nel of the SRP/​CS which car­ries out the safety func­tion. Higher MTTFD val­ues can be used for single com­pon­ents (see Table D.1).

NOTE 2 The indic­ated bor­ders of this table are assumed with­in an accur­acy of 5%.

The stand­ard then tells us to select the MTTFD using a simple hier­archy [1, 4.5.2]:

For the estim­a­tion ofMTTFD of a com­pon­ent, the hier­arch­ic­al pro­ced­ure for find­ing data shall be, in the order giv­en:

a) use manufacturer’s data;
b) use meth­ods in Annex C and Annex D;
c) choose 10 years.

Why ten years? Ten years is half of the assumed mis­sion life­time of 20 years. More on mis­sion life­time in a later post.

Looking at [1, Annex C.2], you will find the “Good Engineering Practices” meth­od for estim­at­ing MTTFD, pre­sum­ing the man­u­fac­turer has not provided you with that inform­a­tion. ISO 13849 – 2 [2] has some ref­er­ence tables that provide some gen­er­al MTTFD val­ues for some kinds of com­pon­ents, but not every part that exists can be lis­ted. How can we deal with parts not lis­ted? [1, Annex C.4] provides us with a cal­cu­la­tion meth­od for estim­at­ing MTTFD for pneu­mat­ic, mech­an­ic­al and elec­tromech­an­ic­al com­pon­ents.

Calculating MTTFD for pneumatic, mechanical and electromechanical components

I need to intro­duce you to a few more vari­ables before we look at how to cal­cu­late MTTFD for a com­pon­ent.

Variables
Variable Description
B10 Number of cycles until 10% of the com­pon­ents fail (for pneu­mat­ic and elec­tromech­an­ic­al com­pon­ents)
B10D Number of cycles until 10% of the com­pon­ents fail dan­ger­ously (for pneu­mat­ic and elec­tromech­an­ic­al com­pon­ents)
T life­time of the com­pon­ent
T10D the mean time until 10% of the com­pon­ents fail dan­ger­ously
hop is the mean oper­a­tion time, in hours per day;
dop is the mean oper­a­tion time, in days per year;
tcycle is the mean oper­a­tion time between the begin­ning of two suc­cess­ive cycles of the com­pon­ent. (e.g., switch­ing of a valve) in seconds per cycle.
s seconds
h hours
a years

Knowing a few details we can cal­cu­late the MTTFD using [1, Eqn C.1]. We need to know the fol­low­ing para­met­ers for the applic­a­tion:

  • B10D
  • hop
  • dop
  • tcycle

Formula for calculating MTTFD - ISO 13849-1, Equation C.1
Calculating MTTFD – [1, Eqn. C.1]
In order to use [1, Eqn. C.1], we need to first cal­cu­late nop, using [1, Eqn. C.2]:

Formula for calculating nop - ISO 13849-1, Equation C.2.
Calculating nop – [1, Eqn. C.2]
We may also need one more cal­cu­la­tion, [1, Eqn. C.4]:
Calculating T10D using ISO 13849-1 Eqn. C.3
Calculating T10D – [1, Eqn. C.4]

Example Calculation [1, C.4.3]

For a pneu­mat­ic valve, a man­u­fac­turer determ­ines a mean value of 60 mil­lion cycles as B10D. The valve is used for two shifts each day on 220 oper­a­tion days a year. The mean time between the begin­ning of two suc­cess­ive switch­ing of the valve is estim­ated as 5 s. This yields the fol­low­ing val­ues:

  • dop of 220 days per year;
  • hop of 16 h per day;
  • tcycle of 5 s per cycle;
  • B10D of 60 mil­lion cycles.

Doing the math, we get:

Example C.4.3 calculations from, ISO 13849-1.
Example C.4.3

So there you have it, at least for a fairly simple case. There are more examples in ISO 13849 – 1, and I would encour­age you to work through them. You can also find a wealth of examples in a report pro­duced by the BGIA in Germany, called the Functional safety of machine con­trols (BGIA Report 2/​2008e) [16]. The down­load for the report is linked from the ref­er­ence list at the end of this art­icle. If you are a SISTEMA user, there are lots of examples in the SISTEMA Cookbooks, and there are example files avail­able so that you can see how to assemble the sys­tems in the soft­ware.

The next part of this series cov­ers Diagnostic Coverage (DC), and the aver­age DC for mul­tiple safety func­tions in a sys­tem, DCavg.

In case you missed the first part of the series, you can read it here.

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety crit­ic­al sys­tems hand­book. Amsterdam: Elsevier/​Butterworth-​Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3]  Overview of tech­niques and meas­ures related to EMC for Functional Safety, 1st ed. Stevenage, UK: Overview of tech­niques and meas­ures related to EMC for Functional Safety, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Included in the last post of the series is the com­plete ref­er­ence list.

[1]     Safety of machinery — Safety-​related parts of con­trol sys­tems — Part 1: General prin­ciples for design. 3rd Edition. ISO Standard 13849 – 1. 2015.

[2]     Safety of machinery – Safety-​related parts of con­trol sys­tems – Part 2: Validation. 2nd Edition. ISO Standard 13849 – 2. 2012.

[7]     Functional safety of electrical/​electronic/​programmable elec­tron­ic safety-​related sys­tems. 7 parts. IEC Standard 61508. Second Edition. 2010.

[14]    Functional safety of electrical/​electronic/​programmable elec­tron­ic safety-​related sys­tems – Part 4: Definitions and abbre­vi­ations. IEC Standard 61508 – 4. Second Edition. 2010.

[15]    “The bathtub curve and product fail­ure beha­vi­or part 1 of 2”, Findchart​.co, 2017. [Online]. Available: http://​find​chart​.co/​d​o​w​n​l​o​a​d​.​p​h​p​?​a​H​R​0​c​D​o​v​L​3​d​3​d​y​5​3​Z​W​l​i​d​W​x​s​L​m​N​v​b​S​9​o​b​3​R​3​a​X​J​l​L​2​l​z​c​3​V​l​M​j​E​v​a​H​Q​y​M​V​8​x​L​m​d​pZg. [Accessed: 03- Jan- 2017].

[16]   “Functional safety of machine con­trols – Application of EN ISO 13849 (BGIA Report 2/​2008e)”, dguv​.de, 2017. [Online]. Available: http://​www​.dguv​.de/​i​f​a​/​p​u​b​l​i​k​a​t​i​o​n​e​n​/​r​e​p​o​r​t​s​-​d​o​w​n​l​o​a​d​/​b​g​i​a​-​r​e​p​o​r​t​s​-​2​0​0​7​-​b​i​s​-​2​0​0​8​/​b​g​i​a​-​r​e​p​o​r​t-2 – 2008/index-2.jsp. [Accessed: 2017-​01-​04].

Digiprove sealCopyright secured by Digiprove © 2017
Acknowledgements: IEC, ISO and oth­ers as cited
Some Rights Reserved