## ISO 13849 – 1 Analysis — Part 5: Diagnostic Coverage (DC)

This entry is part 5 of 9 in the series How to do a 13849 – 1 ana­lys­is

## What is Diagnostic Coverage?

Understanding Diagnostic Coverage (DC) as it is used in ISO 13849-1 [1] is critical to analysing the design of any safety function assessed using this standard. In case you missed a previous part of the series, you can read it here.

In the last instalment of this series, which discussed MTTFD, I brought up the fact that everything fails eventually, and so everything has a natural failure rate. The bathtub curve shown at the top of this post shows a typical failure rate curve for most products. Failure rates tell you how often components or systems fail on average; for a constant failure rate, the reciprocal is the mean time to failure. Failure rates are expressed in many ways, MTTFD and PFHd being the ways relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHd is given in failures per hour (1/h). As a reminder, PFHd stands for “Probability of dangerous Failure per Hour”.
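To make the units concrete, here is a minimal sketch of converting between an MTTFD in years and a dangerous failure rate per hour. It assumes a constant failure rate (the flat part of the bathtub curve) and 8760 operating hours per year; the function names are my own for illustration. Note that this is only a unit conversion for a single component or channel. The PFHd of a complete safety function also depends on the architecture, DC and CCF, so this does not replace the standard's method.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h, assumed continuous operation


def dangerous_failure_rate_per_hour(mttfd_years: float) -> float:
    """Convert MTTFD (years) to a constant dangerous failure rate (1/h)."""
    if mttfd_years <= 0:
        raise ValueError("MTTFD must be greater than zero")
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


def mttfd_years_from_rate(lambda_d_per_hour: float) -> float:
    """Convert a constant dangerous failure rate (1/h) back to MTTFD (years)."""
    if lambda_d_per_hour <= 0:
        raise ValueError("failure rate must be greater than zero")
    return 1.0 / (lambda_d_per_hour * HOURS_PER_YEAR)


if __name__ == "__main__":
    rate = dangerous_failure_rate_per_hour(100.0)
    print(f"MTTFD = 100 years -> lambda_d = {rate:.3e} per hour")
```

An MTTFD of 100 years works out to a little over one dangerous failure per million hours, which gives you a feel for the orders of magnitude involved.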

Three of the standard architectures include automatic diagnostic functions: Categories 2, 3 and 4. As soon as we add diagnostics to the system, we need to know which faults the diagnostics can detect and what share of all possible dangerous failures those detectable faults represent. Diagnostic Coverage (DC) is the ratio of dangerous failures that can be detected to the total dangerous failures that could occur, expressed as a percentage. Some failures do not put the system into a dangerous state, and those failures are excluded from DC because we don’t need to worry about them: if they occur, the system will not fail into a dangerous state.

Here’s the form­al defin­i­tion from [1]:

3.1.26 dia­gnost­ic cov­er­age (DC)

meas­ure of the effect­ive­ness of dia­gnostics, which may be determ­ined as the ratio between the fail­ure rate of detec­ted dan­ger­ous fail­ures and the fail­ure rate of total dan­ger­ous fail­ures

Note 1 to entry: Diagnostic cov­er­age can exist for the whole or parts of a safety-​related sys­tem. For example, dia­gnost­ic cov­er­age could exist for sensors and/​or logic sys­tem and/​or final ele­ments. [SOURCE: IEC 61508 – 4:1998, 3.8.6, mod­i­fied.]

That brings up two oth­er related defin­i­tions that need to be kept in mind [1]:

3.1.4 fail­ure

ter­min­a­tion of the abil­ity of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Failure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Failures which only affect the avail­ab­il­ity of the pro­cess under con­trol are out­side of the scope of this part of ISO 13849. [SOURCE: IEC 60050 – 191:1990, 04 – 01.]

and the most import­ant one [1]:

3.1.5 dan­ger­ous fail­ure

fail­ure which has the poten­tial to put the SRP/​CS in a haz­ard­ous or fail-​to-​function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508 – 4, 3.6.7, mod­i­fied.]

Just as a remind­er, SRP/​CS stands for “safety-​related parts of con­trol sys­tems”.

## Failure Math

### Failure Rate Data Sources

To do any calculations, we need data, and this is true for failure rates as well. ISO 13849-1 provides some tables in the annexes that list some common types of components and their associated failure rates, and there are more failure rate tables in ISO 13849-2. A word of caution here: do not mix sources of failure rate data, because the conditions under which one source's data are valid won’t match the conditions behind another source, including the data in ISO 13849. There are a few good sources of failure rate data out there, for example, MIL-HDBK-217, Reliability Prediction of Electronic Equipment [15], as well as the database maintained by Exida. In any case, use a single source for your failure rate data.

### Failure Rate Variables

IEC 61508 [7] defines a num­ber of vari­ables related to fail­ure rates. The lower­case Greek let­ter lambda, $\lambda$, is used to denote fail­ures.

The com­mon vari­able des­ig­na­tions used are:

- $\lambda$ = failures
- $\lambda(t)$ = failure rate
- $\lambda_s$ = “safe” failures
- $\lambda_d$ = “dangerous” failures
- $\lambda_{dd}$ = detectable “dangerous” failures
- $\lambda_{du}$ = undetectable “dangerous” failures

### Calculating DC

Of these vari­ables, we only need to con­cern ourselves with $\lambda_d$, $\lambda_{dd}$ and $\lambda_{du}$. To under­stand how these vari­ables are used, we can express their rela­tion­ship as

$\lambda_d=\lambda_{dd}+\lambda_{du}$

Following on that idea, the Diagnostic Coverage can be expressed as a per­cent­age like this:

$DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100$
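As a quick illustration of the two relationships above, here is a minimal Python sketch. The function name and the example rates are hypothetical, chosen only to show the arithmetic; real values would come from your single failure rate data source.

```python
def diagnostic_coverage_percent(lambda_dd: float, lambda_du: float) -> float:
    """Return DC% from detectable and undetectable dangerous failure rates.

    Both rates must use the same unit (for example, failures per hour).
    """
    if lambda_dd < 0 or lambda_du < 0:
        raise ValueError("failure rates cannot be negative")
    lambda_d = lambda_dd + lambda_du  # total dangerous failure rate
    if lambda_d == 0:
        raise ValueError("total dangerous failure rate is zero; DC is undefined")
    return lambda_dd / lambda_d * 100.0


if __name__ == "__main__":
    # Hypothetical component: 9e-7 /h detectable, 1e-7 /h undetectable
    dc = diagnostic_coverage_percent(9e-7, 1e-7)
    print(f"DC = {dc:.1f} %")
```

Note that safe failures never appear in the calculation, which matches the definition: only dangerous failures count.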

## Determining DC%

If you want to actually calculate DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hardcore types to IEC 61508-2, Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. This standard goes into some depth on how to determine failure rates and how to calculate the “Safe Failure Fraction,” a number which is related to DC but is not the same.

For every­one else, the good news is that you can use the table in Annex E to estim­ate the DC%. It’s worth not­ing here that Annex E is “Informative.” In standards-​speak, this means that the inform­a­tion in the annex is not part of the “norm­at­ive” text, which means that it is simply inform­a­tion to help you use the norm­at­ive part of the stand­ard. The design must con­form to the require­ments in the norm­at­ive text if you want to claim con­form­ity to the stand­ard. The fact that [1, Annex E] is inform­at­ive gives you the option to cal­cu­late the DC% value rather than select­ing it from Table E.1. Using the cal­cu­lated value would not viol­ate the require­ments in the norm­at­ive text.

If you are using IFA SISTEMA [16] to do the cal­cu­la­tions for you, you will find that the soft­ware lim­its you to select­ing a single DC meas­ure from Table E.1, and this prin­ciple applies if you are doing the cal­cu­la­tions by hand too. Only one item from Table E.1 can be selec­ted for a giv­en safety func­tion.

## Ranking DC

Once you have determined the DC for a safety function, you need to compare the DC value against [1, Table 5] to see if the DC is sufficient for the PLr you are trying to achieve. Table 5 bins the DC results into four ranges. Just as binning the PFHd values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.
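The binning can be sketched in a few lines of Python. The thresholds below (60 %, 90 % and 99 %) and the range names are the ones I understand ISO 13849-1 to use for DC; treat them as an assumption and check them against your copy of the standard. The function name is hypothetical.

```python
def dc_range(dc_percent: float) -> str:
    """Bin a DC percentage into one of the four DC ranges.

    Assumed thresholds: none < 60 %, low 60 to < 90 %,
    medium 90 to < 99 %, high >= 99 %.
    """
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 and 100 percent")
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


if __name__ == "__main__":
    for value in (45.0, 75.0, 95.0, 99.0):
        print(value, dc_range(value))
```

Because only the range matters for the rest of the analysis, a calculated DC of 91 % and one of 98 % are treated identically, which is exactly the point about avoiding precision bias.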

If the DC value is high enough for the PLr, then you are done with this part of the work. If not, you will need to go back to your design and add additional diagnostic features so that you can either select a higher coverage from [1, Table E.1] or calculate a higher value using [14].

## Multiple safety functions

When you have mul­tiple safety func­tions that make up a com­plete safety sys­tem, for example, an emer­gency stop func­tion and a guard inter­lock­ing func­tion, the DC val­ues need to be aver­aged to determ­ine the over­all DC for the com­plete sys­tem. [1, Annex E] provides you with a meth­od to do this in Equation E.1.

Plug in the val­ues for MTTFD and DC for each safety func­tion, and cal­cu­late the res­ult­ing DCavg value for the com­plete sys­tem.
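For reference, the averaging in Equation E.1 weights each DC value by the reciprocal of the corresponding MTTFD, which, as I understand it, has the form $DC_{avg}=\frac{\sum_i DC_i/MTTF_{D,i}}{\sum_i 1/MTTF_{D,i}}$. Check this against your copy of [1, Annex E] before relying on it. Here is a minimal Python sketch; the function name and the example values are hypothetical.

```python
from typing import Iterable, Tuple


def dc_avg_percent(parts: Iterable[Tuple[float, float]]) -> float:
    """Average DC weighted by 1/MTTFD.

    parts: iterable of (mttfd_years, dc_percent) pairs, one per entry
    being averaged.
    """
    numerator = 0.0
    denominator = 0.0
    for mttfd_years, dc_percent in parts:
        if mttfd_years <= 0:
            raise ValueError("MTTFD must be greater than zero")
        numerator += dc_percent / mttfd_years
        denominator += 1.0 / mttfd_years
    if denominator == 0.0:
        raise ValueError("at least one (MTTFD, DC) pair is required")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical values: an unmonitored part with a short MTTFD
    # and a well-monitored part with a long MTTFD.
    print(f"DCavg = {dc_avg_percent([(10.0, 0.0), (100.0, 99.0)]):.1f} %")
```

The weighting means the entries with the lowest MTTFD dominate the result: in the example, one unmonitored part that fails ten times as often drags the average down to roughly 9 %, even though the other part has 99 % coverage.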

That’s it for this art­icle. The next part will cov­er Common Cause Failures (CCF). Look for it on 20-​Mar-​17!


## Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

## References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Included in the last post of the series is the com­plete ref­er­ence list.

[16] “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].

## ISO 13849 – 1 Analysis — Part 2: Safety Requirement Specification

This entry is part 2 of 9 in the series How to do a 13849 – 1 ana­lys­is

## Developing the Safety Requirement Specification

The Safety Requirement Specification sounds pretty heavy, but actually, it is just a big name for a way to organise the information you need to have to analyse and design the safety systems for your machinery. Note that I am assuming that you are doing this in the “right” order, meaning that you are planning the design beforehand, rather than trying to back-fill the documentation after completing the design. In either case, the process is the same, but getting the information you need can be much harder after the fact than before doing the design work. Doing some aspects in a review mode is impossible, especially if a third party to whom you have no access did the design work [8].

If you missed the first instal­ment in this series, you can read it here.

## What goes into a Safety Requirements Specification?

For ref­er­ence, chapter 5 of ISO 13849 – 1 [1] cov­ers safety require­ment spe­cific­a­tions to some degree, but it needs some cla­ri­fic­a­tion I think. First of all, what is a safety func­tion?

Safety func­tions include any func­tion of the machine that has a dir­ect pro­tect­ive effect for the work­er using the machinery. However, using this defin­i­tion, it is pos­sible to ignore some import­ant func­tions. Complementary pro­tect­ive meas­ures, like emer­gency stop, can be missed because they are usu­ally “after the fact”, i.e., the injury occurs, and then the E-​stop is pressed, so you can­not say that it has a “dir­ect pro­tect­ive effect”. If we look at the defin­i­tions in [1], we find:

3.1.20

safety func­tion

func­tion of the machine whose fail­ure can res­ult in an imme­di­ate increase of the risk(s)
[SOURCE: ISO 12100:2010, 3.30.]

## Linking Risk to Functional Safety

Referring to the risk assess­ment, any risk con­trol that pro­tects work­ers from some aspect of the machine oper­a­tion using a con­trol func­tion like an inter­locked gate, or by main­tain­ing a tem­per­at­ure below a crit­ic­al level or speed at a safe level, is a safety func­tion. For example: if the tem­per­at­ure in a pro­cess rises too high, the pro­cess will explode; or if a shaft speed is too high (or too low) the tool may shat­ter and eject broken pieces at high speed. Therefore, the tem­per­at­ure con­trol func­tion and the speed con­trol func­tion are safety func­tions. These func­tions may also be pro­cess con­trol func­tions, but the poten­tial for an imme­di­ate increase in risk due to a fail­ure is what makes these func­tions safety func­tions no mat­ter what else they may do.

[1, Table 8] gives you some examples of various kinds of safety functions found on machines. The table is not exhaustive, meaning there are many more safety functions out there than are listed in the table. Your job is to figure out which ones live in your machine. It is a bit like Pokemon: ya gotta catch ‘em all!

## Basic Safety Requirement Specification

Each safety func­tion must have a Performance Level or a Safety Integrity Level assigned as part of the risk assess­ment. For each safety func­tion, you need to devel­op the fol­low­ing inform­a­tion:

Basic Safety Requirement Specification

| Item | Description |
|---|---|
| Safety Function Identification | Name or other references, e.g. “Access Gate Interlock” or “Hazard Zone 2.” |
| Functional Characteristics | • Intended use or foreseeable misuse of the machine relevant to the safety function<br>• Operating modes relevant to the safety function<br>• Cycle time of the machine<br>• Response time of the safety function |
| Emergency Operation | Is this an emergency operation function? If yes, what types of emergencies might be mitigated by this function? |
| Interactions | What operating modes require this function to be operational? Are there modes where this function requires deliberate bypass? These could include normal working modes (automatic, manual, set-up, changeover), and fault-finding or maintenance modes. |
| Behaviour | How you want the system to behave when the safety function is triggered, e.g., Power is immediately removed from the MIG welder using an IEC 60204-1 Category 0 stop function, and robot motions are stopped using IEC 60204-1 Category 1 stop function through the robot safety stop input.<br><br>or<br><br>All horizontal pneumatic motions stop in their current positions. Vertical motions return to the raised or retracted positions.<br><br>Also to be considered is a power loss condition. Should the system behave in the same way as if the safety function was triggered, not react at all, or do something else? Consider vertical axes that might require holding brakes or other mechanisms to prevent power loss causing unexpected motion. |
| Machine State after triggering | What is the expected state of the machine after triggering the safety function? What is the recovery process? |
| Frequency of Operation | How often do you expect this safety function to be used? A reasonable estimate is needed. More on this below. |
| Priority of Operation | If simultaneous triggering of multiple safety functions is possible, which function(s) takes precedence? E.g., Emergency Stop always takes precedence over everything else. What happens if you have a safe speed function and a guard interlock that are associated because the interlock is part of a guarding function covering a shaft, and you need to troubleshoot the safe speed function, so you need access to the shaft where the encoders are mounted? |
| Required Performance Level | I suggest recording the S, F, and P values selected as well as the PLr value selected for later reference. |
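If you prefer to keep SRS records in a machine-readable form, the items above map naturally onto a simple record type. The following Python sketch is only one possible layout; every field name, unit and example value is my own assumption, not something prescribed by ISO 13849-1 or IEC 61508.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SafetyRequirementSpec:
    """One record per safety function; fields mirror the basic SRS items."""
    identification: str                # e.g. "Access Gate Interlock"
    intended_use: str                  # intended use / foreseeable misuse
    operating_modes: List[str]         # modes where the function must work
    cycle_time_s: float                # machine cycle time, seconds
    response_time_s: float             # safety function response time, seconds
    emergency_operation: bool          # is this an emergency function?
    behaviour: str                     # behaviour when triggered and on power loss
    machine_state_after_trigger: str   # expected state and recovery process
    operations_per_year: float         # estimated frequency of operation
    priority: int                      # 1 = highest precedence (assumed convention)
    severity: str                      # "S1" or "S2"
    frequency: str                     # "F1" or "F2"
    possibility: str                   # "P1" or "P2"
    plr: str                           # required performance level, "a" to "e"
    bypass_modes: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields left empty, as a simple completeness check."""
        return [name for name, value in vars(self).items() if value in ("", None)]


if __name__ == "__main__":
    srs = SafetyRequirementSpec(
        identification="Access Gate Interlock",
        intended_use="Operator access for part loading",
        operating_modes=["automatic", "manual"],
        cycle_time_s=45.0,
        response_time_s=0.2,
        emergency_operation=False,
        behaviour="",  # deliberately left empty to show the check
        machine_state_after_trigger="Stopped; reset required at gate",
        operations_per_year=20000,
        priority=2,
        severity="S2", frequency="F2", possibility="P1", plr="d",
    )
    print(srs.missing_fields())
```

Recording S, F and P alongside the PLr in the same record keeps the reasoning behind each PLr traceable when the design is reviewed later.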

Here’s an example table in MS Word format that you can use as a start­ing point for your SRS doc­u­ments. Note that SRS can be much more detailed than this. If you want more inform­a­tion on this, read IEC 61508 – 1, 7.10.2.

So, that is the min­im­um. You can add lots more inform­a­tion to the min­im­um require­ments, but this will get you star­ted. If you want more inform­a­tion on devel­op­ing the SRS, you will need to get a copy of IEC 61508 [7].

## What’s Next?

Next, you need to be able to make some design decisions about sys­tem archi­tec­ture and com­pon­ents. Circuit archi­tec­tures have been dis­cussed at some length on the MS101 blog in the past, so I am not going to go through them again in this series. Instead, I will show you how to choose an archi­tec­ture based on your design goals in the next instal­ment. In case you missed the first part of the series, you can read it here.

## ISO 13849 – 1 Analysis — Part 1: Start with Risk Assessment

This entry is part 1 of 9 in the series How to do a 13849 – 1 ana­lys­is

I often get ques­tions from cli­ents about how to get star­ted on Functional Safety using ISO 13849. This art­icle is the first in a series that will walk you through the basics of using ISO 13849. Keep in mind that you will need to hold a copy of the 3rd edi­tion of ISO 13849 – 1 [1] and the 2nd edi­tion of ISO 13849 – 2 [2] to use as you go along. There are oth­er stand­ards which you may also find use­ful, and I have included them in the Reference sec­tion at the end of the art­icle. Each post has a Reference List. I will pub­lish a com­plete ref­er­ence list for the series with the last post.

## Where to start?

So you have just learned that you need to do an ISO 13849 func­tion­al safety ana­lys­is. You have the two parts of the stand­ard, and you have skimmed them, but you are feel­ing a bit over­whelmed and unsure of where to start. By the end of this art­icle, you should be feel­ing more con­fid­ent about how to get this job done.

## Step 1 – Risk Assessment

For the pur­pose of this art­icle, I am going to assume that you have a risk assess­ment for the machinery, and you have a copy for ref­er­ence. If you do not have a risk assess­ment, stop here and get that done. There are sev­er­al good ref­er­ences for that, includ­ing ISO 12100 [3], CSA Z432 [4], and ANSI B11.TR3 [5]. You can also have a look at my series on Risk Assessment.

The risk assessment should identify which risks require mitigation using the control system, e.g., use of an interlocked gate, a light curtain, a two-hand control, an enabling device, etc. See the MS101 glossary for detailed definitions. Each of these becomes a safety function. Each safety function requires a safety requirements specification (SRS), which I will describe in more detail a bit later.

## Safety Functions

The 3rd edi­tion of ISO 13849 [1] provides two tables that give some examples of safety func­tion char­ac­ter­ist­ics [1, Table 8] and para­met­ers [1, Table 9] and also provides ref­er­ences to cor­res­pond­ing stand­ards that will help you to define the neces­sary para­met­ers. These tables should not be con­sidered to be exhaust­ive – there is no way to list every pos­sible safety func­tion in a table like this. The tables will give you some good ideas about what you are look­ing for in machine con­trol func­tions that will make them safety func­tions.

While you are identi­fy­ing risk reduc­tion meas­ures that will use the con­trol sys­tem for mit­ig­a­tion, don’t for­get that com­ple­ment­ary pro­tect­ive meas­ures like emer­gency stop, enabling devices, etc. all need to be included. Some of these func­tions may have min­im­um require­ments set by Type B2 stand­ards, like ISO 13850 [6] for emer­gency stop which sets the min­im­um per­form­ance level for this func­tion at PLc.

## Selecting the Required Performance Level

ISO 13849-1:2015 provides a graphical means for selecting the minimum Performance Level (PL) required for the safety function based on the risk assessment. A word of caution here: you may feel like you are re-assessing the risk using this tool because it does use risk parameters (severity, frequency/duration of exposure and possibility to avoid/limit harm) to determine the PL. This tool is not a risk assessment tool, and using it that way is a fundamental mistake. Its output is in terms of performance level, which is a probable rate of dangerous failure per hour of operation. For example, it is entirely incorrect to say, “This machine has a risk level of PLc” since we define PLs in terms of probable failure rate per hour.
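The graphical tool is a small decision tree, so it can be written as a lookup table. The sketch below encodes the mapping as I understand the risk graph in ISO 13849-1 Annex A (S = severity, F = frequency/duration of exposure, P = possibility of avoiding or limiting harm); verify it against your copy of the standard before using it. The names `RISK_GRAPH` and `required_performance_level` are hypothetical.

```python
# Assumed mapping of (S, F, P) to required performance level (PLr).
RISK_GRAPH = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}


def required_performance_level(severity: str, frequency: str, possibility: str) -> str:
    """Look up the PLr for one safety function from its S, F and P parameters."""
    try:
        return RISK_GRAPH[(severity, frequency, possibility)]
    except KeyError:
        raise ValueError(
            f"unknown parameter combination: {severity}, {frequency}, {possibility}"
        ) from None


if __name__ == "__main__":
    print(required_performance_level("S2", "F1", "P2"))  # prints "d"
```

The output is a requirement on the reliability of the safety function, not a rating of the machine's risk, which is why the lookup returns a PLr and nothing else.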

Once you have assigned a required Performance Level (PLr) to each safety func­tion, you can move on to the next step: Developing the Safety Requirements Specification.
