Machinery Safety 101

ISO 13849-1 Analysis — Part 5: Diagnostic Coverage (DC)

This entry is part 5 of 9 in the series How to do a 13849-1 analysis

Post updated 2019-07-24. Ed.

What is Diagnostic Coverage?

Under­stand­ing Dia­gnost­ic Cov­er­age (DC) as it is used in ISO 13849 – 1 [1] is crit­ic­al to ana­lys­ing the design of any safety func­tion assessed using this stand­ard. In case you missed a pre­vi­ous part of the series, you can read it here.

In the last instalment of this series discussing MTTFD, I brought up the fact that everything fails eventually, and so everything has a natural failure rate. The bathtub curve shown at the top of this post shows a typical failure rate curve for most products. Failure rates tell you how often, on average, components or systems can be expected to fail. Failure rates are expressed in many ways, MTTFD and PFHd being the ways relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHd is given in failures per hour (1/h). As a reminder, PFHd stands for “Probability of dangerous Failure per Hour”.

Three of the standard architectures, Categories 2, 3 and 4, include automatic diagnostic functions. As soon as we add diagnostics to the system, we need to know which faults the diagnostics can detect and what share of all dangerous failures those detectable faults represent. Diagnostic Coverage (DC) is the ratio of the dangerous failures that can be detected to the total dangerous failures that could occur, expressed as a percentage. Some failures do not put the system into a dangerous state, and those failures are excluded from DC because we don’t need to worry about them: if they occur, the system will not fail into a dangerous state.

Here’s the form­al defin­i­tion from [1]:

3.1.26 dia­gnost­ic cov­er­age (DC)

meas­ure of the effect­ive­ness of dia­gnostics, which may be determ­ined as the ratio between the fail­ure rate of detec­ted dan­ger­ous fail­ures and the fail­ure rate of total dan­ger­ous failures

Note 1 to entry: Dia­gnost­ic cov­er­age can exist for the whole or parts of a safety-related sys­tem. For example, dia­gnost­ic cov­er­age could exist for sensors and/or logic sys­tem and/or final ele­ments. [SOURCE: IEC 61508 – 4:1998, 3.8.6, modified.]

That brings up two oth­er related defin­i­tions that need to be kept in mind [1]:

3.1.4 fail­ure

ter­min­a­tion of the abil­ity of an item to per­form a required function

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Fail­ure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Fail­ures which only affect the avail­ab­il­ity of the pro­cess under con­trol are out­side of the scope of this part of ISO 13849. [SOURCE: IEC 60050 – 191:1990, 04 – 01.]

and the most import­ant one [1]:

3.1.5 dan­ger­ous failure

fail­ure which has the poten­tial to put the SRP/CS in a haz­ard­ous or fail-to-func­tion state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508 – 4, 3.6.7, modified.]

Just as a remind­er, SRP/CS stands for “safety-related parts of con­trol systems”.

Failure Math

Failure Rate Data Sources

To do any calculations, we need data, and this is true for failure rates as well. ISO 13849-1 provides some tables in the annexes that list some common types of components and their associated failure rates, and there are more failure rate tables in ISO 13849-2. A word of caution here: do not mix sources of failure rate data, because the conditions under which one source's data is valid won’t match those assumed for the data in ISO 13849. There are a few good sources of failure rate data out there, for example, MIL-HDBK-217, Reliability Prediction of Electronic Equipment [15], as well as the database maintained by Exida. In any case, use a single source for your failure rate data.

Failure Rate Variables

IEC 61508 [7] defines a num­ber of vari­ables related to fail­ure rates. The lower­case Greek let­ter lambda, $latex \lambda$, is used to denote failures.

The com­mon vari­able des­ig­na­tions used are:

$latex \lambda$ = fail­ures
$latex \lambda_{(t)}$ = failure rate
$latex \lambda_s$ = “safe” fail­ures
$latex \lambda_d$ = “dan­ger­ous” fail­ures
$latex \lambda_{dd}$ = detect­able “dan­ger­ous” fail­ures
$latex \lambda_{du}$ = undetect­able “dan­ger­ous” failures

Calculating DC

Of these vari­ables, we only need to con­cern ourselves with $latex \lambda_d$, $latex \lambda_{dd}$ and $latex \lambda_{du}$. To under­stand how these vari­ables are used, we can express their rela­tion­ship as

$latex \lambda_d=\lambda_{dd}+\lambda_{du}$

Fol­low­ing on that idea, the Dia­gnost­ic Cov­er­age can be expressed as a per­cent­age like this:

$latex DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100$
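
As a quick illustration of the two formulas above, here is a minimal Python sketch. The function name and the failure-rate numbers are made up for the example and do not come from the standard or from any real component data.

```python
def diagnostic_coverage_percent(lambda_dd: float, lambda_du: float) -> float:
    """Return DC% = lambda_dd / lambda_d * 100, where lambda_d = lambda_dd + lambda_du.

    Both rates must use the same unit (for example, failures per hour).
    """
    if lambda_dd < 0 or lambda_du < 0:
        raise ValueError("failure rates cannot be negative")
    lambda_d = lambda_dd + lambda_du  # total dangerous failure rate
    if lambda_d == 0:
        raise ValueError("total dangerous failure rate must be greater than zero")
    return lambda_dd / lambda_d * 100.0


# Hypothetical rates in failures per hour, chosen only for illustration.
dc = diagnostic_coverage_percent(lambda_dd=9.0e-7, lambda_du=1.0e-7)
print(f"DC = {dc:.1f}%")
```

With these illustrative numbers, nine out of every ten dangerous failures are detectable, so the result is about 90%.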

Determining DC%

If you want to actu­ally cal­cu­late DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hard­core types to IEC 61508 – 2, Func­tion­al safety of electrical/electronic/programmable elec­tron­ic safety-related sys­tems – Part 2: Require­ments for electrical/electronic/programmable elec­tron­ic safety-related sys­tems. This stand­ard goes into some depth on how to determ­ine fail­ure rates and how to cal­cu­late the “Safe Fail­ure Frac­tion,” a num­ber which is related to DC but is not the same.

For every­one else, the good news is that you can use the table in Annex E to estim­ate the DC%. It’s worth not­ing here that Annex E is “Inform­at­ive.” In stand­ards-speak, this means that the inform­a­tion in the annex is not part of the “norm­at­ive” text, which means that it is simply inform­a­tion to help you use the norm­at­ive part of the stand­ard. The design must con­form to the require­ments in the norm­at­ive text if you want to claim con­form­ity to the stand­ard. The fact that [1, Annex E] is inform­at­ive gives you the option to cal­cu­late the DC% value rather than select­ing it from Table E.1. Using the cal­cu­lated value would not viol­ate the require­ments in the norm­at­ive text.

If you are using IFA SISTEMA [16] to do the cal­cu­la­tions for you, you will find that the soft­ware lim­its you to select­ing a single DC meas­ure from Table E.1, and this prin­ciple applies if you are doing the cal­cu­la­tions by hand too. Only one item from Table E.1 can be selec­ted for a giv­en safety function.

Ranking DC

Once you have determined the DC for a safety function, you need to compare the DC value against [1, Table 5] to see if the DC is sufficient for the PLr you are trying to achieve. Table 5 bins the DC results into four ranges. Just as binning the PFHd values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.

ISO 13849 – 1, Table 5 Dia­gnost­ic cov­er­age (DC)
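
To show how the binning works in practice, here is a small Python sketch. It assumes the four ranges as they are commonly quoted from [1, Table 5]: none below 60%, low from 60% up to but not including 90%, medium from 90% up to but not including 99%, and high at 99% and above. Check those limits against your own copy of the standard; the function name is hypothetical.

```python
def dc_range(dc_percent: float) -> str:
    """Map a DC percentage to one of the four Table 5 ranges.

    The thresholds are an assumption based on the commonly quoted limits;
    verify them against the edition of ISO 13849-1 you are working to.
    """
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 and 100 percent")
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


# Illustrative values only.
for value in (45.0, 60.0, 92.5, 99.0):
    print(f"DC = {value}% -> {dc_range(value)}")
```

Note that a calculated value just under a threshold, such as 89.9%, still lands in the lower range, which is exactly the kind of false precision the binning is meant to keep in check.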

If the DC value was high enough for the PLr, then you are done with this part of the work. If not, you will need to go back to your design and add addi­tion­al dia­gnost­ic fea­tures so that you can either select a high­er cov­er­age from [1, Table E.1] or cal­cu­late a high­er value using [14].

Multiple safety functions

When you have mul­tiple safety func­tions that make up a com­plete safety sys­tem, for example, an emer­gency stop func­tion and a guard inter­lock­ing func­tion, the DC val­ues need to be aver­aged to determ­ine the over­all DC for the com­plete sys­tem. [1, Annex E] provides you with a meth­od to do this in Equa­tion E.1.

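$latex DC_{avg}=\frac{\frac{DC_1}{MTTF_{D1}}+\frac{DC_2}{MTTF_{D2}}+\cdots+\frac{DC_N}{MTTF_{DN}}}{\frac{1}{MTTF_{D1}}+\frac{1}{MTTF_{D2}}+\cdots+\frac{1}{MTTF_{DN}}}$
ISO 13849-1:2015, Equation E.1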

Plug in the val­ues for MTTFD and DC for each safety func­tion, and cal­cu­late the res­ult­ing DCavg value for the com­plete system.
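
Here is a minimal Python sketch of that calculation. It assumes Equation E.1 is the usual MTTFD-weighted average, where each DC value is weighted by 1/MTTFD, so that the items most likely to fail count the most. The function name and the input numbers are hypothetical.

```python
from typing import Iterable, Tuple


def dc_avg(contributions: Iterable[Tuple[float, float]]) -> float:
    """Return DCavg in percent from (DC in percent, MTTFD in years) pairs.

    Assumed form: DCavg = sum(DC_i / MTTFD_i) / sum(1 / MTTFD_i).
    """
    pairs = list(contributions)  # allow any iterable, since it is used twice
    if not pairs:
        raise ValueError("at least one (DC, MTTFD) pair is required")
    if any(mttfd <= 0 for _, mttfd in pairs):
        raise ValueError("MTTFD values must be greater than zero")
    numerator = sum(dc / mttfd for dc, mttfd in pairs)
    denominator = sum(1.0 / mttfd for _, mttfd in pairs)
    return numerator / denominator


# Illustrative values only: (DC %, MTTFD in years).
example = [(99.0, 50.0), (60.0, 20.0)]
print(f"DCavg = {dc_avg(example):.1f}%")
```

In this made-up example the result sits closer to 60% than to 99%, because the item with the shorter MTTFD carries more weight. When all of the MTTFD values are equal, the formula reduces to a plain arithmetic mean.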

That’s it for this art­icle. The next part will cov­er Com­mon Cause Fail­ures (CCF). Look for it on 20-Mar-17!

In case you missed the first part of the series, you can read it here.

Book List

Here are some books that I think you may find help­ful on this journey:

[0]     B. Main, Risk Assess­ment: Basics and Bench­marks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety crit­ic­al sys­tems hand­book, 3rd Ed. Ams­ter­dam: Elsevi­er­/But­ter­worth-Heine­mann, 2011.

[0.2]  Elec­tro­mag­net­ic Com­pat­ib­il­ity for Func­tion­al Safety, 1st ed. Steven­age, UK: The Insti­tu­tion of Engin­eer­ing and Tech­no­logy, 2008.

[0.3] Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

[0.4] Code of practice for electromagnetic resilience, 1st ed. Stevenage, UK: IET Standards TC4.3 EMC, 2017.

[0.5] Code of Practice: Competence for Safety Related Systems Practitioners, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2016.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Included in the last post of the series is the com­plete ref­er­ence list.

[1]     Safety of machinery — Safety-related parts of con­trol sys­tems — Part 1: Gen­er­al prin­ciples for design. 3rd Edi­tion. ISO Stand­ard 13849 – 1. 2015.

[7]     Func­tion­al safety of electrical/electronic/programmable elec­tron­ic safety-related sys­tems. 7 parts. IEC Stand­ard 61508. Edi­tion 2. 2010.

[14]   Func­tion­al safety of electrical/electronic/programmable elec­tron­ic safety-related sys­tems – Part 2: Require­ments for electrical/electronic/programmable elec­tron­ic safety-related sys­tems. IEC Stand­ard 61508 – 2. 2010.

[15]     Reli­ab­il­ity Pre­dic­tion of Elec­tron­ic Equip­ment. Mil­it­ary Hand­book MIL-HDBK-217F. 1991.

[16]     “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].


5 thoughts on “ISO 13849-1 Analysis — Part 5: Diagnostic Coverage (DC)”

  1. With 3 or more emergency stops in series, does the DCavg = 0% (none) since there is now the potential for fault masking? So does that automatically take my Category 3 architecture down to Category 2 since my DC=none?

    btw, AMAZING inform­a­tion you have diges­ted for all of us to use. this has been incred­ibly helpful.

    1. Thanks, Alex! I really appre­ci­ate hear­ing that you’ve found my art­icles helpful.

      To answer your question, we have to turn to ISO/TR 24119 on fault masking. So far, the method for assessing fault masking has only been covered in that technical report for series-connected electromechanical interlocking switches; however, the consensus is that the same principles apply to e-stop devices as well. BTW, the contents of ISO/TR 24119 are being incorporated into the next edition of ISO 14119, planned for publication in Q4 this year, or possibly as late as the end of Q2 next year.

      I’ll point you to Table 1 in the TR, which is the Simplified Method. If you have more than two series-connected devices, with any number of additional series-connected devices, then DC falls to zero. This does not mean that your structure category falls to Category 2, but it does mean that your design cannot meet Category 3. If this sounds confusing, remember that the Categories don’t represent a continuum, but are just a way to easily identify five different structures that have been analyzed and characterized by the technical committee. This is why failing to meet all of the requirements in Cat. 3 doesn’t provide you with a “fall-back” to Cat. 2. What it does mean is that using the ISO 13849 rules, you cannot predict the ability of the system to withstand faults, and therefore you cannot predict the PL. The fact that you have redundant channels means that some kinds of faults will be tolerated, but you are unlikely to know that they have occurred since DC=0.

      If you use the detailed meth­od for determ­in­ing DC found in clause 6.3, you may find that you can devel­op a high­er DC, per­haps as high as DC=medium. You’ll have to go through your design using the clause 6.3 meth­ods, start­ing with the topo­logy of the con­nec­ted devices, and look­ing at the char­ac­ter­ist­ics of the con­trols to which the input devices are con­nec­ted. It’s more com­plic­ated, but may be worth­while for you.

      The solu­tion is rel­at­ively simple – don’t daisy-chain elec­tromech­an­ic­al input devices. You can have mul­tiple e‑stop or inter­lock­ing devices con­nec­ted via a safe net­work topo­logy, since every device has its own dia­gnostics built-in, and the net­work is not sub­ject to fault mask­ing like an ana­log system.
