ISO 13849-1 Analysis — Part 5: Diagnostic Coverage (DC)

Post updated 2019-07-24. Ed.

What is Diagnostic Coverage?

Understanding Diagnostic Coverage (DC) as it is used in ISO 13849-1 [1] is critical to analysing the design of any safety function assessed using this standard. In case you missed a previous part of the series, you can read it here.

In the last instalment of this series, which discussed MTTFD, I pointed out that everything fails eventually, so everything has a natural failure rate. The bathtub curve shown at the top of this post is a typical failure rate curve for most products. A failure rate tells you how often components or systems fail per unit of time; for a constant failure rate, its reciprocal is the mean time to failure. Failure rates are expressed in many ways, and MTTFD and PFHd are the two relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHd is given in failures per hour (1/h). As a reminder, PFHd stands for “Probability of dangerous Failure per Hour”.
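To make the difference in units concrete, here is a minimal Python sketch that converts an MTTFD in years into an equivalent dangerous failure rate in 1/h and back. It assumes a constant failure rate (the flat bottom of the bathtub curve) and 8760 hours per year; the function names are hypothetical. The result is the rate for a single component or channel, not the PFHd of a whole safety function, which also depends on the architecture, DC and CCF.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365 days x 24 h, ignoring leap years


def mttfd_years_to_lambda_d(mttfd_years: float) -> float:
    """Convert an MTTFD in years to a constant dangerous failure rate in 1/h.

    Assumes an exponential (constant-rate) failure model, so lambda_d = 1 / MTTFD.
    """
    if mttfd_years <= 0:
        raise ValueError("MTTFD must be a positive number of years")
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


def lambda_d_to_mttfd_years(lambda_d_per_hour: float) -> float:
    """Inverse conversion: dangerous failure rate in 1/h back to MTTFD in years."""
    if lambda_d_per_hour <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / (lambda_d_per_hour * HOURS_PER_YEAR)


rate = mttfd_years_to_lambda_d(100.0)  # hypothetical MTTFD of 100 years
print(f"{rate:.3e} 1/h")  # prints 1.142e-06 1/h
```

The constant-rate assumption is what allows a single division; outside the flat part of the bathtub curve this conversion does not hold.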

Three of the standard architectures, Categories 2, 3 and 4, include automatic diagnostic functions. As soon as we add diagnostics to the system, we need to know which faults the diagnostics can detect, and what fraction of all possible dangerous failures those detectable faults represent. Diagnostic Coverage (DC) is the ratio of the rate of dangerous failures that can be detected to the rate of all dangerous failures that could occur, expressed as a percentage. Some failures do not lead to a dangerous state, and those failures are excluded from DC because we don’t need to worry about them: if they occur, the system will not fail into a dangerous state.

Here’s the formal definition from [1]:

3.1.26 diagnostic coverage (DC)

measure of the effectiveness of diagnostics, which may be determined as the ratio between the failure rate of detected dangerous failures and the failure rate of total dangerous failures

Note 1 to entry: Diagnostic coverage can exist for the whole or parts of a safety-related system. For example, diagnostic coverage could exist for sensors and/or logic system and/or final elements. [SOURCE: IEC 61508-4:1998, 3.8.6, modified.]

That brings up two other related definitions that need to be kept in mind [1]:

3.1.4 failure

termination of the ability of an item to perform a required function

Note 1 to entry: After a failure, the item has a fault.

Note 2 to entry: “Failure” is an event, as distinguished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items consisting of software only.

Note 4 to entry: Failures which only affect the availability of the process under control are outside of the scope of this part of ISO 13849. [SOURCE: IEC 60050–191:1990, 04-01.]

and the most important one [1]:

3.1.5 dangerous failure

failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, modified.]

Just as a reminder, SRP/CS stands for “safety-related parts of control systems”.

Failure Math

Failure Rate Data Sources

To do any calculations, we need data, and failure rates are no exception. ISO 13849-1 provides tables in its annexes that list common types of components and their associated failure rates, and there are more failure rate tables in ISO 13849-2. A word of caution here: do not mix sources of failure rate data, because the conditions under which one source’s data is valid won’t match the conditions behind the data in ISO 13849. There are a few good sources of failure rate data, for example MIL-HDBK-217, Reliability Prediction of Electronic Equipment [15], and the database maintained by Exida. Whichever you choose, use a single source for your failure rate data.

Failure Rate Variables

IEC 61508 [7] defines a number of variables related to failure rates. The lowercase Greek letter lambda, $latex \lambda$, is used to denote failure rates.

The common variable designations used are:

$latex \lambda$ = failure rate
$latex \lambda_{(t)}$ = failure rate as a function of time
$latex \lambda_s$ = rate of “safe” failures
$latex \lambda_d$ = rate of “dangerous” failures
$latex \lambda_{dd}$ = rate of detected “dangerous” failures
$latex \lambda_{du}$ = rate of undetected “dangerous” failures

Calculating DC

Of these variables, we only need to concern ourselves with $latex \lambda_d$, $latex \lambda_{dd}$ and $latex \lambda_{du}$. To understand how these variables are used, we can express their relationship as

$latex \lambda_d=\lambda_{dd}+\lambda_{du}$

Following on that idea, the Diagnostic Coverage can be expressed as a percentage like this:

$latex DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100$
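The two equations above translate directly into a small helper. This is a minimal Python sketch with a hypothetical function name; the inputs can be in any consistent unit (1/h, FIT, and so on) because the units cancel in the ratio. Safe failures are deliberately not an input, since they play no part in DC.

```python
def diagnostic_coverage_percent(lambda_dd: float, lambda_du: float) -> float:
    """Return DC% = lambda_dd / lambda_d * 100, where lambda_d = lambda_dd + lambda_du.

    Both rates must use the same unit; safe failures are not part of DC.
    """
    if lambda_dd < 0 or lambda_du < 0:
        raise ValueError("failure rates cannot be negative")
    lambda_d = lambda_dd + lambda_du  # total dangerous failure rate
    if lambda_d == 0:
        raise ValueError("total dangerous failure rate is zero, so DC is undefined")
    return 100.0 * lambda_dd / lambda_d


# Hypothetical rates in FIT (failures per 1e9 h): 900 detected, 100 undetected
print(diagnostic_coverage_percent(900.0, 100.0))  # prints 90.0
```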

Determining DC%

If you want to actually calculate DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hardcore types to IEC 61508-2 [14], Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. This standard goes into some depth on how to determine failure rates and how to calculate the “Safe Failure Fraction” (SFF), a number that is related to DC but is not the same.
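To see why SFF and DC are different numbers, here is a minimal Python sketch that computes both for the same hypothetical component. It uses the commonly quoted form of SFF, (λs + λdd) / (λs + λdd + λdu); check [14] for exactly which failures may be counted as safe. Because SFF credits safe failures and DC does not, SFF is always at least as high as DC for the same data.

```python
def dc_percent(lambda_dd: float, lambda_du: float) -> float:
    """Diagnostic coverage: detected dangerous / all dangerous failure rates."""
    return 100.0 * lambda_dd / (lambda_dd + lambda_du)


def sff_percent(lambda_s: float, lambda_dd: float, lambda_du: float) -> float:
    """Safe failure fraction: (safe + detected dangerous) / all failure rates."""
    return 100.0 * (lambda_s + lambda_dd) / (lambda_s + lambda_dd + lambda_du)


# Hypothetical component data in FIT
lam_s, lam_dd, lam_du = 500.0, 300.0, 200.0
print(dc_percent(lam_dd, lam_du))          # prints 60.0
print(sff_percent(lam_s, lam_dd, lam_du))  # prints 80.0
```

With the same detected and undetected dangerous rates, adding more safe failures raises SFF but leaves DC unchanged.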

For everyone else, the good news is that you can use Table E.1 in Annex E to estimate the DC%. It’s worth noting that Annex E is “Informative.” In standards-speak, this means that the annex is not part of the “normative” text; it is simply information to help you apply the normative part of the standard. The design must conform to the requirements in the normative text if you want to claim conformity to the standard. Because [1, Annex E] is informative, you have the option to calculate the DC% value rather than selecting it from Table E.1, and using a calculated value does not violate the requirements in the normative text.

If you are using IFA SISTEMA [16] to do the calculations for you, you will find that the software limits you to selecting a single DC measure from Table E.1, and this principle applies if you are doing the calculations by hand too. Only one item from Table E.1 can be selected for a given safety function.

Ranking DC

Once you have determined the DC for a safety function, compare the DC value against [1, Table 5] to see whether the DC is sufficient for the PLr you are trying to achieve. Table 5 bins DC results into four ranges. Just as binning the PFHd values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.

ISO 13849-1, Table 5 Diagnostic coverage (DC)

If the DC value is high enough for the PLr, you are done with this part of the work. If not, you will need to go back to your design and add diagnostic features so that you can either select a higher coverage from [1, Table E.1] or calculate a higher value using [14].
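The binning in Table 5 is simple enough to sketch in a few lines. This minimal Python example assumes the four ranges of ISO 13849-1:2015 Table 5 (NONE: DC < 60 %; LOW: 60 % ≤ DC < 90 %; MEDIUM: 90 % ≤ DC < 99 %; HIGH: 99 % ≤ DC). The function names are hypothetical, and the required level passed to `dc_is_sufficient` must come from your own analysis of the Category and PLr; the helper does not derive it.

```python
# Ordered from lowest to highest coverage; lower bounds are in percent.
DC_LEVELS = [("NONE", 0.0), ("LOW", 60.0), ("MEDIUM", 90.0), ("HIGH", 99.0)]


def dc_level(dc_percent: float) -> str:
    """Bin a DC value (0..100 %) into the ranges of ISO 13849-1:2015 Table 5."""
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 % and 100 %")
    level = "NONE"
    for name, lower_bound in DC_LEVELS:
        if dc_percent >= lower_bound:
            level = name  # keep the highest range whose lower bound is met
    return level


def dc_is_sufficient(dc_percent: float, required_level: str) -> bool:
    """True if the binned DC reaches the required level (e.g. "MEDIUM")."""
    names = [name for name, _ in DC_LEVELS]
    return names.index(dc_level(dc_percent)) >= names.index(required_level)


print(dc_level(93.0))                  # prints MEDIUM
print(dc_is_sufficient(93.0, "HIGH"))  # prints False
```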

Multiple parts in a safety function

A safety function is usually made up of several parts, for example an emergency stop device, a safety relay, and a pair of contactors, and each part can have a different DC. To determine the overall DC for the safety-related parts of the control system that perform the safety function, the individual DC values need to be averaged. [1, Annex E] provides a method for doing this in Equation E.1:

$latex DC_{avg}=\frac{\frac{DC_1}{MTTF_{D,1}}+\frac{DC_2}{MTTF_{D,2}}+\cdots+\frac{DC_N}{MTTF_{D,N}}}{\frac{1}{MTTF_{D,1}}+\frac{1}{MTTF_{D,2}}+\cdots+\frac{1}{MTTF_{D,N}}}$

ISO 13849-1:2015, Equation E.1

Plug in the values of MTTFD and DC for each part, and calculate the resulting DCavg value for the safety function.
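Equation E.1 is a weighted average in which each DC value is weighted by 1/MTTFD, so parts that fail more often pull the result towards their own coverage. Here is a minimal Python sketch; the function name and the example values are hypothetical, and deciding which items belong in the sum is a matter for [1], not for this code.

```python
from typing import Iterable, Tuple


def dc_avg(parts: Iterable[Tuple[float, float]]) -> float:
    """Equation E.1: DCavg = sum(DC_i / MTTFD_i) / sum(1 / MTTFD_i).

    Each item is (DC in percent, MTTFD in years).
    """
    parts = list(parts)  # allow generators, since we iterate twice
    if not parts:
        raise ValueError("at least one part is required")
    if any(mttfd <= 0 for _, mttfd in parts):
        raise ValueError("every MTTFD must be positive")
    numerator = sum(dc / mttfd for dc, mttfd in parts)
    denominator = sum(1.0 / mttfd for _, mttfd in parts)
    return numerator / denominator


# Hypothetical input device (DC 99 %, MTTFD 100 y) and output device (DC 90 %, MTTFD 50 y)
print(round(dc_avg([(99.0, 100.0), (90.0, 50.0)]), 6))  # prints 93.0
```

In this example the output device has half the MTTFD of the input device, so it carries twice the weight, and the 93 % result lands in the MEDIUM range of Table 5.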

That’s it for this article. The next part will cover Common Cause Failures (CCF). Look for it on 20-Mar-17!

In case you missed the first part of the series, you can read it here.

Book List

Here are some books that I think you may find helpful on this journey:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook, 3rd Ed. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3] Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

[0.4] Code of practice for electromagnetic resilience, 1st ed. Stevenage, UK: IET Standards TC4.3 EMC, 2017.

[0.5] Code of Practice: Competence for Safety Related Systems Practitioners, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2016.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. Included in the last post of the series is the complete reference list.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[7]     Functional safety of electrical/electronic/programmable electronic safety-related systems. 7 parts. IEC Standard 61508. Edition 2. 2010.

[14]   Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. IEC Standard 61508-2. 2010.

[15]     Reliability Prediction of Electronic Equipment. Military Handbook MIL-HDBK-217F. 1991.

[16]     “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].

4 thoughts on “ISO 13849-1 Analysis — Part 5: Diagnostic Coverage (DC)

  1. With 3 or more emergency stops in series, is DCavg = 0% (none), since there is now the potential for fault masking? And does that automatically take my Category 3 architecture down to Category 2, since my DC = none?

    btw, AMAZING information you have digested for all of us to use. this has been incredibly helpful.

    1. Thanks, Alex! I really appreciate hearing that you’ve found my articles helpful.

      To answer your question, we have to turn to ISO/TR 24119 on fault masking. So far, the method for assessing fault masking has only been covered in that technical report, and only for series-connected electromechanical interlocking switches; however, the consensus is that the same principles apply to e-stop devices as well. BTW, the contents of ISO/TR 24119 are being incorporated into the next edition of ISO 14119, planned for publication in Q4 this year, or possibly as late as the end of Q2 next year.

      I’ll point you to Table 1 in the TR, which is the Simplified Method. If you have more than two series-connected devices, with any number of additional series-connected devices, then DC falls to zero. This does not mean that your structure category falls to Category 2, but it does mean that your design cannot meet Category 3. If this sounds confusing, remember that the Categories don’t represent a continuum, but are just a way to easily identify five different structures that have been analyzed and characterized by the technical committee. This is why failing to meet all of the requirements in Cat. 3 doesn’t provide you with a “fall-back” to Cat. 2. What it does mean is that using the ISO 13849 rules, you cannot predict the ability of the system to withstand faults, and therefore you cannot predict the PL. The fact that you have redundant channels means that some kinds of faults will be tolerated, but you are unlikely to know that they have occurred since DC=0.

      If you use the detailed method for determining DC found in clause 6.3, you may find that you can develop a higher DC, perhaps as high as DC=medium. You’ll have to go through your design using the clause 6.3 methods, starting with the topology of the connected devices, and looking at the characteristics of the controls to which the input devices are connected. It’s more complicated, but may be worthwhile for you.

      The solution is relatively simple – don’t daisy-chain electromechanical input devices. You can have multiple e-stop or interlocking devices connected via a safe network topology, since every device has its own diagnostics built-in, and the network is not subject to fault masking like an analog system.

      1. Hello Doug, thank you for all the information you are sharing with us it is really appreciated.

        With 3 or more emergency stops in series, BUT connected to a pulse-test output, is DCavg = 0% (none), since there is now the potential for fault masking?

        In this case I am referring to a Rockwell 1734-IB8S digital input module. The first channel of the N.C. contacts is connected to T0 and I0, and the second to T1 and I1. From BGIA Table E.2, I think it can fall under “Plausibility check/readback/(cross-)monitoring – With dynamic test, with high quality fault detection”, with a DCavg of 99%, but I am not sure when I compare it to example 29 of the BGIA report (page 206), where DCavg is 90%.

        Thanks,

        1. Hi Samuel!
          First, thanks for buying me three coffees! Much appreciated!

          In your question you refer to the “BGIA report”, however, this is insufficient as a reference. If you mean BGIA Report 2/2008e, it has been superseded by BGIA Report 2/2017e. If you can cite the report more closely, I might be able to comment.

          WRT your question, the answer is found in ISO 14119:2021, Annex K. Figure K.4 shows three interlocking devices in series connection, although these could just as easily be three e-stop devices with redundant contacts. If you use the simplified approach shown in Table K.1, you will see that if none of the interconnected devices is used “frequently”, that is, more than once per hour, and you have 2 to 4 devices in series, then the best you can achieve for DCavg is MEDIUM, with a fault masking probability level of “1”, meaning that fault masking is foreseeable but at the lowest probability. Looking up DCavg = MEDIUM in ISO 13849-1:2015 Table 5, you will find that MEDIUM means 90% ≤ DC < 99%, so you still need to do the rest of the FS analysis to determine what DCavg your design can achieve.

          If you want a more detailed analysis, you can follow Annex K.4.3, Regular method for the determination of the maximum achievable DC. Table K.3 links the wiring topology and switch arrangement to diagnostic testing, and finally to DC. At some point, I expect that the Annex K material will be reflected in a future edition of ISO 13849-1, and possibly in ISO 13850, but for now, ISO 14119 is where you can find it.

          Also, be aware that there is a new edition of ISO 13849-1 hovering on the edge of publication at the moment. It has progressed to the point where it is ready for public review, but it has been held up by issues with getting it reviewed by the CEN machinery consultants, which must happen before public review can start. The CEN review is required because the document has been developed in ISO under the Vienna Agreement, which permits the development of EN standards in ISO. Part of the Agreement is the requirement that CEN has the document reviewed for consistency with the Machinery Directive (2006/42/EC), and until that is done, the document cannot proceed from the Committee Draft stage to the Draft International Standard stage, at which public review occurs. Anyway, watch the blog for an announcement of the start of public review, as I will announce it as soon as I hear the starting date.
