ISO 13849-1 Analysis — Part 5: Diagnostic Coverage (DC)

This entry is part 5 of 6 in the series How to do a 13849-1 analysis

What is Diagnostic Coverage?

Understanding Diagnostic Coverage (DC) as it is used in ISO 13849-1 [1] is critical to analysing the design of any safety function assessed using this standard. In case you missed a previous part of the series, you can read it here.

In the last instalment of this series discussing MTTFD, I brought up the fact that everything fails eventually, and so everything has a natural failure rate. The bathtub curve shown at the top of this post shows a typical failure rate curve for most products. A failure rate tells you how often, on average, components or systems fail; for a constant failure rate, its inverse is the mean time to failure. Failure data are expressed in many ways, MTTFD and PFHd being the ones relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHd is given in failures per hour (1/h). As a reminder, PFHd stands for “Probability of dangerous Failure per Hour”.
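To make the units concrete, here is a minimal sketch of converting an MTTFD in years into a dangerous failure rate per hour. It assumes a constant failure rate (the flat part of the bathtub curve) and 8760 hours per year; the function name is my own, not something from the standard. Note that the result is the failure rate of a single item or channel. The PFHd of a complete safety function also depends on the architecture and the diagnostics, so the two numbers only coincide in the simplest cases.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 hours


def mttfd_years_to_rate_per_hour(mttfd_years: float) -> float:
    """Return the dangerous failure rate (1/h) for a given MTTFD in years.

    Assumes a constant failure rate, so that rate = 1 / MTTFD.
    """
    if mttfd_years <= 0:
        raise ValueError("MTTFD must be a positive number of years")
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


# Hypothetical example: a component with an MTTFD of 100 years.
rate = mttfd_years_to_rate_per_hour(100.0)
print(f"{rate:.3e} dangerous failures per hour")
```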

Three of the standard architectures, Categories 2, 3 and 4, include automatic diagnostic functions. As soon as we add diagnostics to the system, we need to know which faults the diagnostics can detect and what share of all the dangerous failures those detected faults represent. Diagnostic Coverage (DC) is the ratio of the dangerous failures that can be detected to the total dangerous failures that could occur, expressed as a percentage. Some failures do not lead to a dangerous state, and those failures are excluded from DC because we don’t need to worry about them – if they occur, the system will not fail into a dangerous state.

Here’s the formal definition from [1]:

3.1.26 diagnostic coverage (DC)

measure of the effectiveness of diagnostics, which may be determined as the ratio between the failure rate of detected dangerous failures and the failure rate of total dangerous failures

Note 1 to entry: Diagnostic coverage can exist for the whole or parts of a safety-related system. For example, diagnostic coverage could exist for sensors and/or logic system and/or final elements. [SOURCE: IEC 61508-4:1998, 3.8.6, modified.]

That brings up two other related definitions that need to be kept in mind [1]:

3.1.4 failure

termination of the ability of an item to perform a required function

Note 1 to entry: After a failure, the item has a fault.

Note 2 to entry: “Failure” is an event, as distinguished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items consisting of software only.

Note 4 to entry: Failures which only affect the availability of the process under control are outside of the scope of this part of ISO 13849. [SOURCE: IEC 60050–191:1990, 04-01.]

and the most important one [1]:

3.1.5 dangerous failure

failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, modified.]

Just as a reminder, SRP/CS stands for “safety-related parts of control systems”.

Failure Math

Failure Rate Data Sources

To do any calculations, we need data, and this is true for failure rates as well. ISO 13849-1 provides some tables in the annexes that list some common types of components and their associated failure rates, and there are more failure rate tables in ISO 13849-2. A word of caution here: do not mix sources of failure rate data, because the conditions under which one source’s data holds will not match the conditions assumed by another, including the data in ISO 13849. There are a few good sources of failure rate data out there, for example, MIL-HDBK-217, Reliability Prediction of Electronic Equipment [15], as well as the database maintained by Exida. In any case, use a single source for your failure rate data.

Failure Rate Variables

IEC 61508 [7] defines a number of variables related to failure rates. The lowercase Greek letter lambda, \lambda, is used to denote a failure rate.

The common variable designations used are:

\lambda = failure rate
\lambda_{(t)} = failure rate as a function of time
\lambda_s = rate of “safe” failures
\lambda_d = rate of “dangerous” failures
\lambda_{dd} = rate of detected “dangerous” failures
\lambda_{du} = rate of undetected “dangerous” failures

Calculating DC

Of these variables, we only need to concern ourselves with \lambda_d, \lambda_{dd} and \lambda_{du}. To understand how these variables are used, we can express their relationship as

\lambda_d=\lambda_{dd}+\lambda_{du}

Following on that idea, the Diagnostic Coverage can be expressed as a percentage like this:

DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100
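The two relationships above can be sketched in a few lines of Python. This is only an illustration: the function name and the example failure rates are hypothetical, and the rates must come from the same data source and use the same units.

```python
def diagnostic_coverage_percent(lambda_dd: float, lambda_du: float) -> float:
    """Return DC% from the detected and undetected dangerous failure rates.

    lambda_d = lambda_dd + lambda_du
    DC%      = lambda_dd / lambda_d * 100
    """
    if lambda_dd < 0 or lambda_du < 0:
        raise ValueError("failure rates cannot be negative")
    lambda_d = lambda_dd + lambda_du
    if lambda_d == 0:
        raise ValueError("total dangerous failure rate must be greater than zero")
    return lambda_dd / lambda_d * 100.0


# Hypothetical rates in failures per hour, for illustration only.
dc = diagnostic_coverage_percent(lambda_dd=9.0e-7, lambda_du=1.0e-7)
print(f"DC = {dc:.1f}%")
```

With these made-up numbers, nine out of every ten dangerous failures are detected by the diagnostics.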

Determining DC%

If you want to actually calculate DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hardcore types to IEC 61508-2, Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. This standard goes into some depth on how to determine failure rates and how to calculate the “Safe Failure Fraction,” a number which is related to DC but is not the same.

For everyone else, the good news is that you can use the table in Annex E to estimate the DC%. It’s worth noting here that Annex E is “Informative.” In standards-speak, this means that the information in the annex is not part of the “normative” text, which means that it is simply information to help you use the normative part of the standard. The design must conform to the requirements in the normative text if you want to claim conformity to the standard. The fact that [1, Annex E] is informative gives you the option to calculate the DC% value rather than selecting it from Table E.1. Using the calculated value would not violate the requirements in the normative text.

If you are using IFA SISTEMA [16] to do the calculations for you, you will find that the software limits you to selecting a single DC measure from Table E.1, and this principle applies if you are doing the calculations by hand too. Only one item from Table E.1 can be selected for a given safety function.

Ranking DC

Once you have determined the DC for a safety function, you need to compare the DC value against [1, Table 5] to see if the DC is sufficient for the PLr you are trying to achieve. Table 5 bins the DC results into four ranges. Just as binning the PFHd values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.

ISO 13849-1, Table 5 Diagnostic coverage (DC)

If the DC value is high enough for the PLr, then you are done with this part of the work. If not, you will need to go back to your design and add diagnostic features so that you can either select a higher coverage from [1, Table E.1] or calculate a higher value using [14].
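The binning step can be sketched as a small lookup. The boundaries below are the ones commonly quoted for Table 5 (none below 60%, low from 60% to below 90%, medium from 90% to below 99%, high at 99% and above); treat them as an assumption and check them against your copy of the standard. The function name is hypothetical.

```python
def dc_level(dc_percent: float) -> str:
    """Map a DC percentage onto one of the four DC ranges.

    Boundaries are assumed from the commonly quoted Table 5 values;
    verify them against the standard before relying on them.
    """
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 and 100 percent")
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


# A DC of 90% lands at the bottom of the "medium" range.
print(dc_level(90.0))
```

Because of the binning, a calculated DC of 91% and one of 98% are treated the same way, which is exactly the protection against precision bias described above.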

Multiple safety functions

When you have multiple safety functions that make up a complete safety system, for example, an emergency stop function and a guard interlocking function, the DC values need to be averaged to determine the overall DC for the complete system. [1, Annex E] provides you with a method to do this in Equation E.1.

[Figure: ISO 13849-1:2015, Equation E.1, the equation for averaging the DC values of multiple safety functions]

Plug in the values for MTTFD and DC for each safety function, and calculate the resulting DCavg value for the complete system.
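As I understand Equation E.1, it is a weighted average in which each DC value is weighted by the inverse of its MTTFD, so the less reliable items have more influence on the result:

DC_{avg}=\frac{\sum_i DC_i/MTTF_{D,i}}{\sum_i 1/MTTF_{D,i}}

Here is a minimal sketch of that calculation. The function name and the example values are hypothetical, and you should confirm the form of the equation against [1, Annex E] before using it.

```python
def dc_avg(items):
    """Return the MTTFD-weighted average DC.

    items: (dc, mttfd_years) pairs. DC may be given in percent or as a
    fraction, as long as every pair uses the same convention.
    """
    pairs = list(items)
    if not pairs:
        raise ValueError("at least one (DC, MTTFD) pair is required")
    if any(mttfd <= 0 for _, mttfd in pairs):
        raise ValueError("every MTTFD must be positive")
    numerator = sum(dc / mttfd for dc, mttfd in pairs)
    denominator = sum(1.0 / mttfd for _, mttfd in pairs)
    return numerator / denominator


# Hypothetical values: (DC in percent, MTTFD in years).
example = [(99.0, 50.0), (90.0, 20.0), (60.0, 100.0)]
print(f"DCavg = {dc_avg(example):.1f}%")
```

In this made-up example, the item with the shortest MTTFD (20 years) pulls the average towards its own DC of 90%.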

That’s it for this article. The next part will cover Common Cause Failures (CCF). Look for it on 20-Mar-17!

In case you missed the first part of the series, you can read it here.

Book List

Here are some books that I think you may find helpful on this journey:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook, 3rd Ed. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. Included in the last post of the series is the complete reference list.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[7]     Functional safety of electrical/electronic/programmable electronic safety-related systems. 7 parts. IEC Standard 61508. Edition 2. 2010.

[14]   Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. IEC Standard 61508-2. 2010.

[15]     Reliability Prediction of Electronic Equipment. Military Handbook MIL-HDBK-217F. 1991.

[16]     “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].