## ISO 13849-1 Analysis — Part 5: Diagnostic Coverage (DC)

This entry is part 5 of 6 in the series How to do a 13849-1 analysis

## What is Diagnostic Coverage?

Understanding Diagnostic Coverage (DC) as it is used in ISO 13849-1 [1] is critical to analysing the design of any safety function assessed using this standard. In case you missed a previous part of the series, you can read it here.

In the last instalment of this series discussing MTTFD, I brought up the fact that everything fails eventually, and so everything has a natural failure rate. The bathtub curve shown at the top of this post shows a typical failure rate curve for most products. Failure rates tell you how often, on average, components or systems fail; the reciprocal of a constant failure rate is the mean time to failure. Failure rates are expressed in many ways, MTTFD and PFHd being the ways relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHd is given per hour (1/h). As a reminder, PFHd stands for “Probability of dangerous Failure per Hour”.
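
To make the units concrete, here is a minimal sketch of the conversion between an MTTFD in years and a dangerous failure rate per hour. It assumes a constant failure rate (the flat part of the bathtub curve) and 8760 operating hours per year; the function names are my own and do not come from the standard. This is only a unit conversion for a single channel, not a way of determining the PFHd of a complete architecture.

```python
HOURS_PER_YEAR = 8760  # 24 h x 365 d; assumes continuous operation


def mttfd_years_to_rate_per_hour(mttfd_years: float) -> float:
    """Convert MTTFD (years) to a dangerous failure rate (1/h).

    Valid only under the constant-failure-rate assumption.
    """
    if mttfd_years <= 0:
        raise ValueError("MTTFD must be greater than zero")
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


def rate_per_hour_to_mttfd_years(rate_per_hour: float) -> float:
    """Convert a dangerous failure rate (1/h) back to MTTFD (years)."""
    if rate_per_hour <= 0:
        raise ValueError("failure rate must be greater than zero")
    return 1.0 / (rate_per_hour * HOURS_PER_YEAR)


rate = mttfd_years_to_rate_per_hour(30.0)
print(f"MTTFD = 30 years -> {rate:.2e} dangerous failures per hour")
```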

Three of the standard architectures include automatic diagnostic functions: Categories 2, 3 and 4. As soon as we add diagnostics to the system, we need to know what faults the diagnostics can detect and what share of all the possible dangerous failures those detectable faults represent. Diagnostic Coverage (DC) represents the ratio of dangerous failures that can be detected to the total dangerous failures that could occur, expressed as a percentage. There will be some failures that do not put the system into a dangerous state, and those failures are excluded from DC because we don’t need to worry about them – if they occur, the system will not fail into a dangerous state.

Here’s the formal definition from [1]:

3.1.26 diagnostic coverage (DC)

measure of the effectiveness of diagnostics, which may be determined as the ratio between the failure rate of detected dangerous failures and the failure rate of total dangerous failures

Note 1 to entry: Diagnostic coverage can exist for the whole or parts of a safety-related system. For example, diagnostic coverage could exist for sensors and/or logic system and/or final elements. [SOURCE: IEC 61508-4:1998, 3.8.6, modified.]

That brings up two other related definitions that need to be kept in mind [1]:

3.1.4 failure

termination of the ability of an item to perform a required function

Note 1 to entry: After a failure, the item has a fault.

Note 2 to entry: “Failure” is an event, as distinguished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items consisting of software only.

Note 4 to entry: Failures which only affect the availability of the process under control are outside of the scope of this part of ISO 13849. [SOURCE: IEC 60050–191:1990, 04-01.]

and the most important one [1]:

3.1.5 dangerous failure

failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to- function state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, modified.]

Just as a reminder, SRP/CS stands for “safety-related parts of control systems”.

## Failure Math

### Failure Rate Data Sources

To do any calculations, we need data, and this is true for failure rates as well. ISO 13849-1 provides some tables in the annexes that list some common types of components and their associated failure rates, and there are more failure rate tables in ISO 13849-2. A word of caution here: Do not mix sources of failure rate data, as the conditions under which data from another source is valid won’t match the conditions assumed for the data in ISO 13849. There are a few good sources of failure rate data out there, for example, MIL-HDBK-217, Reliability Prediction of Electronic Equipment [15], as well as the database maintained by Exida. In any case, use a single source for your failure rate data.

### Failure Rate Variables

IEC 61508 [7] defines a number of variables related to failure rates. The lowercase Greek letter lambda, $\lambda$, is used to denote failures.

The common variable designations used are:

- $\lambda$ = failures
- $\lambda_{(t)}$ = failure rate
- $\lambda_s$ = “safe” failures
- $\lambda_d$ = “dangerous” failures
- $\lambda_{dd}$ = detectable “dangerous” failures
- $\lambda_{du}$ = undetectable “dangerous” failures

### Calculating DC

Of these variables, we only need to concern ourselves with $\lambda_d$, $\lambda_{dd}$ and $\lambda_{du}$. To understand how these variables are used, we can express their relationship as

$\lambda_d=\lambda_{dd}+\lambda_{du}$

Following on that idea, the Diagnostic Coverage can be expressed as a percentage like this:

$DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100$
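
As a quick illustration of the two relationships above, here is a minimal sketch that computes DC% from a detected and an undetected dangerous failure rate. The function name and the example rates are made up for illustration; they are not taken from any of the standards.

```python
def diagnostic_coverage_percent(lambda_dd: float, lambda_du: float) -> float:
    """Return DC% = lambda_dd / lambda_d * 100, with lambda_d = lambda_dd + lambda_du.

    Both rates must be in the same unit (e.g. failures per hour).
    """
    if lambda_dd < 0 or lambda_du < 0:
        raise ValueError("failure rates cannot be negative")
    lambda_d = lambda_dd + lambda_du  # total dangerous failure rate
    if lambda_d == 0:
        raise ValueError("total dangerous failure rate is zero; DC is undefined")
    return lambda_dd / lambda_d * 100.0


# Hypothetical example: 9.0e-7 /h detected and 1.0e-7 /h undetected
dc = diagnostic_coverage_percent(lambda_dd=9.0e-7, lambda_du=1.0e-7)
print(f"DC = {dc:.1f}%")  # DC = 90.0%
```

Note that safe failures ($\lambda_s$) never enter the calculation, which matches the point made earlier that they are excluded from DC.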

## Determining DC%

If you want to actually calculate DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hardcore types to IEC 61508-2, Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. This standard goes into some depth on how to determine failure rates and how to calculate the “Safe Failure Fraction,” a number which is related to DC but is not the same.

For everyone else, the good news is that you can use the table in Annex E to estimate the DC%. It’s worth noting here that Annex E is “Informative.” In standards-speak, this means that the information in the annex is not part of the “normative” text, which means that it is simply information to help you use the normative part of the standard. The design must conform to the requirements in the normative text if you want to claim conformity to the standard. The fact that [1, Annex E] is informative gives you the option to calculate the DC% value rather than selecting it from Table E.1. Using the calculated value would not violate the requirements in the normative text.

If you are using IFA SISTEMA [16] to do the calculations for you, you will find that the software limits you to selecting a single DC measure from Table E.1, and this principle applies if you are doing the calculations by hand too. Only one item from Table E.1 can be selected for a given safety function.

## Ranking DC

Once you have determined the DC for a safety function, you need to compare the DC value against [1, Table 5] to see if the DC is sufficient for the PLr you are trying to achieve. Table 5 bins the DC results into four ranges. Just like binning the PFHd values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.
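
The binning can be sketched in a few lines. The limits used here (none below 60%, low from 60% to below 90%, medium from 90% to below 99%, high at 99% and above) are the four ranges as I recall them from the standard; treat them as an assumption and confirm them against your own copy before relying on them. The function name is hypothetical.

```python
def dc_range(dc_percent: float) -> str:
    """Bin a DC value (in percent) into one of four ranges.

    Limits are assumed from memory of ISO 13849-1; verify before use.
    """
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 and 100 percent")
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


for value in (45.0, 60.0, 95.0, 99.0):
    print(f"DC = {value:5.1f}% -> {dc_range(value)}")
```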

If the DC value is high enough for the PLr, then you are done with this part of the work. If not, you will need to go back to your design and add diagnostic features so that you can either select a higher coverage from [1, Table E.1] or calculate a higher value using [14].

## Multiple safety functions

When you have multiple safety functions that make up a complete safety system, for example, an emergency stop function and a guard interlocking function, the DC values need to be averaged to determine the overall DC for the complete system. [1, Annex E] provides you with a method to do this in Equation E.1.

Plug in the values for MTTFD and DC for each safety function, and calculate the resulting DCavg value for the complete system.
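
For reference, Equation E.1 as I understand it is an average of the DC values weighted by the reciprocal of each MTTFD, so items that fail more often count for more:

$DC_{avg}=\frac{\frac{DC_1}{MTTF_{D1}}+\frac{DC_2}{MTTF_{D2}}+\dots+\frac{DC_N}{MTTF_{DN}}}{\frac{1}{MTTF_{D1}}+\frac{1}{MTTF_{D2}}+\dots+\frac{1}{MTTF_{DN}}}$

Here is a minimal sketch of that calculation. The function name and the example values are hypothetical, and the equation should be checked against your copy of [1, Annex E], including exactly which items the standard expects you to average over.

```python
from typing import Iterable, Tuple


def dc_avg(items: Iterable[Tuple[float, float]]) -> float:
    """Weighted average DC in percent.

    items: (dc_percent, mttfd_years) pairs, one per item being averaged.
    Each DC is weighted by 1 / MTTFD.
    """
    pairs = list(items)
    if not pairs:
        raise ValueError("at least one (DC, MTTFD) pair is required")
    if any(mttfd <= 0 for _, mttfd in pairs):
        raise ValueError("every MTTFD must be greater than zero")
    numerator = sum(dc / mttfd for dc, mttfd in pairs)
    denominator = sum(1.0 / mttfd for _, mttfd in pairs)
    return numerator / denominator


# Hypothetical values: DC 99% with MTTFD 50 years, DC 60% with MTTFD 20 years
result = dc_avg([(99.0, 50.0), (60.0, 20.0)])
print(f"DCavg = {result:.1f}%")  # DCavg = 71.1%
```

In this example the item with the shorter MTTFD pulls the average toward its lower DC, which is the intended effect of the weighting.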

That’s it for this article. The next part will cover Common Cause Failures (CCF). Look for it on 20-Mar-17!


## Book List

Here are some books that I think you may find helpful on this journey:

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

## References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. Included in the last post of the series is the complete reference list.

[16]  “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].

## ISO 13849 Analysis — Part 2: Safety Requirement Specification

This entry is part 2 of 6 in the series How to do a 13849-1 analysis

## Developing the Safety Requirement Specification

The Safety Requirement Specification sounds pretty heavy, but actually, it is just a big name for a way to organise the information you need to have to analyse and design the safety systems for your machinery. Note that I am assuming that you are doing this in the “right” order, meaning that you are planning the design beforehand, rather than trying to back-fill the documentation after completing the design. In either case, the process is the same, but getting the information you need can be much harder after the fact than before doing the design work. Doing some aspects in a review mode is impossible, especially if a third party to whom you have no access did the design work [8].

If you missed the first instalment in this series, you can read it here.

## What goes into a Safety Requirements Specification?

For reference, chapter 5 of ISO 13849-1 [1] covers safety requirement specifications to some degree, but I think it needs some clarification. First of all, what is a safety function?

Safety functions include any function of the machine that has a direct protective effect for the worker using the machinery. However, using this definition, it is possible to ignore some important functions. Complementary protective measures, like emergency stop, can be missed because they are usually “after the fact”, i.e., the injury occurs, and then the E-stop is pressed, so you cannot say that it has a “direct protective effect”. If we look at the definitions in [1], we find:

3.1.20 safety function

function of the machine whose failure can result in an immediate increase of the risk(s)

[SOURCE: ISO 12100:2010, 3.30.]

## Linking Risk to Functional Safety

Referring to the risk assessment, any risk control that protects workers from some aspect of the machine operation using a control function like an interlocked gate, or by maintaining a temperature below a critical level or speed at a safe level, is a safety function. For example: if the temperature in a process rises too high, the process will explode; or if a shaft speed is too high (or too low) the tool may shatter and eject broken pieces at high speed. Therefore, the temperature control function and the speed control function are safety functions. These functions may also be process control functions, but the potential for an immediate increase in risk due to a failure is what makes these functions safety functions no matter what else they may do.

[1, Table 8] gives you some examples of various kinds of safety functions found on machines. The table is not exhaustive – meaning there are many more safety functions out there than are listed in the table. Your job is to figure out which ones live in your machine. It is a bit like Pokémon – ya gotta catch ’em all!

## Basic Safety Requirement Specification

Each safety function must have a Performance Level or a Safety Integrity Level assigned as part of the risk assessment. For each safety function, you need to develop the following information:

Basic Safety Requirement Specification

| Item | Description |
| --- | --- |
| Safety Function Identification | Name or other references, e.g. “Access Gate Interlock” or “Hazard Zone 2.” |
| Functional Characteristics | • Intended use or foreseeable misuse of the machine relevant to the safety function<br>• Operating modes relevant to the safety function<br>• Cycle time of the machine<br>• Response time of the safety function |
| Emergency Operation | Is this an emergency operation function? If yes, what types of emergencies might be mitigated by this function? |
| Interactions | What operating modes require this function to be operational? Are there modes where this function requires deliberate bypass? These could include normal working modes (automatic, manual, set-up, changeover), and fault-finding or maintenance modes. |
| Behaviour | How you want the system to behave when the safety function is triggered, i.e., Power is immediately removed from the MIG welder using an IEC 60204-1 Category 0 stop function, and robot motions are stopped using IEC 60204-1 Category 1 stop function through the robot safety stop input.<br><br>or<br><br>All horizontal pneumatic motions stop in their current positions. Vertical motions return to the raised or retracted positions.<br><br>Also to be considered is a power loss condition. Should the system behave in the same way as if the safety function was triggered, not react at all, or do something else? Consider vertical axes that might require holding brakes or other mechanisms to prevent power loss causing unexpected motion. |
| Machine State after triggering | What is the expected state of the machine after triggering the safety function? What is the recovery process? |
| Frequency of Operation | How often do you expect this safety function to be used? A reasonable estimate is needed. More on this below. |
| Priority of Operation | If simultaneous triggering of multiple safety functions is possible, which function(s) takes precedence? E.g., Emergency Stop always takes precedence over everything else. What happens if you have a safe speed function and a guard interlock that are associated because the interlock is part of a guarding function covering a shaft, and you need to troubleshoot the safe speed function, so you need access to the shaft where the encoders are mounted? |
| Required Performance Level | I suggest recording the S, F, and P values selected as well as the PLr value selected for later reference. |

Here’s an example table in MS Word format that you can use as a starting point for your SRS documents. Note that SRS can be much more detailed than this. If you want more information on this, read IEC 61508-1, 7.10.2.
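
If you would rather keep the SRS in a machine-readable form next to the document, the same items can be captured in a small record. This is only a sketch of one possible layout: the class name, field names and example values are my own invention and are not prescribed by ISO 13849-1 or IEC 61508.

```python
from dataclasses import dataclass, fields
from typing import List

VALID_PL = ("a", "b", "c", "d", "e")


@dataclass
class SafetyRequirementSpec:
    """One record per safety function, mirroring the basic SRS items."""
    identification: str
    functional_characteristics: List[str]
    emergency_operation: str
    interactions: str
    behaviour: str
    state_after_triggering: str
    frequency_of_operation: str
    priority_of_operation: str
    s_f_p: str        # risk parameters used, e.g. "S2/F1/P2"
    required_pl: str  # PLr, one of a..e

    def problems(self) -> List[str]:
        """List empty items and an invalid PLr so gaps are easy to spot."""
        found = [f"{f.name} is empty" for f in fields(self)
                 if not getattr(self, f.name)]
        if self.required_pl and self.required_pl not in VALID_PL:
            found.append(f"required_pl '{self.required_pl}' is not one of a-e")
        return found


# Hypothetical example entry
gate = SafetyRequirementSpec(
    identification="Access Gate Interlock",
    functional_characteristics=["Automatic mode", "Response time 200 ms"],
    emergency_operation="No",
    interactions="Required in automatic mode; bypassed in set-up mode",
    behaviour="Robot motions stop using a Category 1 stop",
    state_after_triggering="Motion stopped; reset required at the gate",
    frequency_of_operation="About 10 times per shift",
    priority_of_operation="Below emergency stop",
    s_f_p="S2/F1/P2",
    required_pl="d",
)
print(gate.problems())  # [] when every item is filled in
```

Keeping the S, F and P selections in the record alongside the PLr makes it easy to trace the required level back to the decision that produced it.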

So, that is the minimum. You can add lots more information to the minimum requirements, but this will get you started. If you want more information on developing the SRS, you will need to get a copy of IEC 61508 [7].

## What’s Next?

Next, you need to be able to make some design decisions about system architecture and components. Circuit architectures have been discussed at some length on the MS101 blog in the past, so I am not going to go through them again in this series. Instead, I will show you how to choose an architecture based on your design goals in the next instalment. In case you missed the first part of the series, you can read it here.


## ISO 13849 Analysis — Part 1: Start with Risk Assessment

This entry is part 1 of 6 in the series How to do a 13849-1 analysis

I often get questions from clients about how to get started on Functional Safety using ISO 13849. This article is the first in a series that will walk you through the basics of using ISO 13849. Keep in mind that you will need to hold a copy of the 3rd edition of ISO 13849-1 [1] and the 2nd edition of ISO 13849-2 [2] to use as you go along. There are other standards which you may also find useful, and I have included them in the Reference section at the end of the article. Each post has a Reference List. I will publish a complete reference list for the series with the last post.

## Where to start?

So you have just learned that you need to do an ISO 13849 functional safety analysis. You have the two parts of the standard, and you have skimmed them, but you are feeling a bit overwhelmed and unsure of where to start. By the end of this article, you should be feeling more confident about how to get this job done.

## Step 1 – Risk Assessment

For the purpose of this article, I am going to assume that you have a risk assessment for the machinery, and you have a copy for reference. If you do not have a risk assessment, stop here and get that done. There are several good references for that, including ISO 12100 [3], CSA Z432 [4], and ANSI B11.TR3 [5]. You can also have a look at my series on Risk Assessment.

The risk assessment should identify which risks require mitigation using the control system, e.g., use of an interlocked gate, a light curtain, a two-hand control, an enabling device, etc. See the MS101 glossary for detailed definitions. Each of these becomes a safety function. Each safety function requires a safety requirements specification (SRS), which I will describe in more detail a bit later.

## Safety Functions

The 3rd edition of ISO 13849 [1] provides two tables that give some examples of safety function characteristics [1, Table 8] and parameters [1, Table 9] and also provides references to corresponding standards that will help you to define the necessary parameters. These tables should not be considered to be exhaustive – there is no way to list every possible safety function in a table like this. The tables will give you some good ideas about what you are looking for in machine control functions that will make them safety functions.

While you are identifying risk reduction measures that will use the control system for mitigation, don’t forget that complementary protective measures like emergency stop, enabling devices, etc. all need to be included. Some of these functions may have minimum requirements set by Type B2 standards, like ISO 13850 [6] for emergency stop which sets the minimum performance level for this function at PLc.

## Selecting the Required Performance Level

ISO 13849-1:2015 provides a graphical means for selecting the minimum Performance Level (PL) required for the safety function based on the risk assessment. A word of caution here: you may feel like you are re-assessing the risk using this tool because it does use risk parameters (severity, frequency/duration of exposure and possibility to avoid/limit harm) to determine the PL. This tool is not a risk assessment tool, and using it that way is a fundamental mistake. Its output is a performance level, which is defined in terms of the probability of dangerous failure per hour of operation. For example, it is entirely incorrect to say, “This machine has a risk level of PLc” since we define PLs in terms of probable failure rate per hour.
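
The graphical selection can be thought of as a lookup from the three parameters to a PLr. The sketch below encodes the risk graph and the PFHd band for each PL as I remember them from the standard; both tables are assumptions that you must verify against your own copy of ISO 13849-1 before using them, and the names are hypothetical. It selects a required performance level only and is no substitute for the risk assessment.

```python
from typing import Dict, Tuple

# (severity, frequency/exposure, possibility of avoidance) -> PLr
# Assumed from memory of the risk graph; verify against the standard.
RISK_GRAPH: Dict[Tuple[str, str, str], str] = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}

# PL -> (lower limit inclusive, upper limit exclusive) of PFHd in 1/h
# Also assumed from memory; verify against the standard.
PL_PFHD_BANDS: Dict[str, Tuple[float, float]] = {
    "a": (1e-5, 1e-4),
    "b": (3e-6, 1e-5),
    "c": (1e-6, 3e-6),
    "d": (1e-7, 1e-6),
    "e": (1e-8, 1e-7),
}


def required_performance_level(severity: str, frequency: str,
                               possibility: str) -> str:
    """Look up the PLr for one combination of S, F and P."""
    key = (severity.upper(), frequency.upper(), possibility.upper())
    if key not in RISK_GRAPH:
        raise ValueError(f"unknown parameter combination: {key}")
    return RISK_GRAPH[key]


plr = required_performance_level("S2", "F1", "P2")
low, high = PL_PFHD_BANDS[plr]
print(f"PLr = {plr}: {low:.0e} <= PFHd < {high:.0e} per hour")
```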

Once you have assigned a required Performance Level (PLr) to each safety function, you can move on to the next step: Developing the Safety Requirements Specification.
