## ISO 13849-1 Analysis, Part 7: Safety-Related Software

This entry is part 7 of 7 in the series How to do a 13849-1 analysis

# Safety-Related Software

Up to this point, I have been discussing the basic processes used to design safety-related parts of control systems. The underlying assumption is that these techniques apply to the design of hardware used for safety purposes. The remaining question is how to design and develop the safety-related software that runs on that hardware. If you have not read the rest of this series and would like to catch up first, you can find it here.

In this discussion of safety-related software, keep in mind that I am talking about software whose only purpose is to reduce risk. Some platforms are not well suited to safety software, primarily commercial off-the-shelf (COTS) operating systems like Windows, macOS and Linux. Generally speaking, these operating systems are too complex and too subject to unanticipated changes to be suitable for high-reliability applications. There is nothing wrong with using them for annunciation and monitoring functions, but the safety functions themselves should run on more predictable platforms.

For embedded software, the methodology discussed in ISO 13849-1 is usable up to PLd. At the end of the Scope, we find Note 4:

NOTE 4 For safety-related embedded software for components with PLr = e, see IEC 61508-3:1998, Clause 7.

As you can see, for very high-reliability systems, i.e., PLe/SIL3 or SIL4, it is necessary to move to IEC 61508. The methods discussed here are based on ISO 13849-1:2015, Clause 4.6.

# Goals

There are two goals for safety-related software development activities:

1. Avoid faults
2. Generate readable, understandable, testable and maintainable software

## Avoiding Faults

Fig. 1 [1, Fig. 6] shows the “V-model” for software development. This approach to software design incorporates both validation and verification, and when correctly implemented it will result in software that meets the design specifications.

If you aren’t sure what the difference is between verification and validation, here is how I remember it: Validation means “Are we building the right thing?”, and verification means “Did we build the thing right?” The whole process hinges on the Safety Requirement Specification (SRS), so failing to get that part right at the beginning will negatively affect both the hardware and the software design. The SRS is the yardstick used to decide whether you built the right thing. Without it, you are clueless about what you are building.

Starting from the Safety Requirement Specification (also called the safety function specification), the figure shows each step in the process. The dashed lines illustrate the verification process at each step. Notice that the actual coding step is at the bottom of the V-model. Everything above the coding stage is either planning and design, or quality assurance activities.

Other methods can also produce verified and validated software, so if you have a QA process that produces solid results, you may not need to change it. I would recommend that you review all the stages in the V-model to ensure that your QA system has similar processes.
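Because the SRS is the yardstick for validation, one QA check that is easy to automate is traceability: every requirement in the SRS should be covered by at least one passing validation test, and every test should point back to a real requirement. The Python sketch below shows the idea. It is not something ISO 13849-1 prescribes; the requirement IDs (`SF-01` and so on), the test names and the data structure are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ValidationTest:
    """A validation test and the SRS requirement IDs it claims to cover (hypothetical format)."""
    name: str
    covers: set[str] = field(default_factory=set)
    passed: bool = False


def untraced_requirements(srs_ids: set[str], tests: list[ValidationTest]) -> set[str]:
    """SRS requirements that have no passing validation test."""
    covered: set[str] = set()
    for test in tests:
        if test.passed:
            covered |= test.covers
    return srs_ids - covered


def orphan_tests(srs_ids: set[str], tests: list[ValidationTest]) -> list[str]:
    """Tests citing requirements that are not in the SRS (the SRS and tests have drifted)."""
    return [t.name for t in tests if not t.covers <= srs_ids]


if __name__ == "__main__":
    srs = {"SF-01", "SF-02", "SF-03"}
    tests = [
        ValidationTest("estop_opens_both_contactors", {"SF-01"}, passed=True),
        ValidationTest("gate_open_stops_motion", {"SF-02"}, passed=False),
        ValidationTest("muting_timeout", {"SF-09"}, passed=True),
    ]
    print(sorted(untraced_requirements(srs, tests)))  # ['SF-02', 'SF-03']
    print(orphan_tests(srs, tests))  # ['muting_timeout']
```

A requirement whose only test is failing counts as untraced here, which is deliberately conservative.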

To make setting up safety systems simpler for designers and integrators, two approaches to software design can be used.

## Two Approaches to Software Design

The two approaches that should be considered are:

• Preconfigured (building-block style) software
• Fully customised software

### Preconfigured Building-Block Software

The preconfigured building-block approach is typically used for configuring safety PLCs or programmable safety relays or modules. This type of software is referred to as “safety-related embedded software (SRESW)” in [1].

Pre-written function blocks are provided by the device manufacturer. Each function block has a particular role: emergency stop, safety gate input, zero-speed detection, and so on. When configuring a safety PLC or safety modules that use this approach, the designer selects the appropriate block and then configures the inputs, outputs, and any other functional characteristics that are needed. The designer has no access to the safety-related code, so apart from configuration errors, no other errors can be introduced. The function blocks are verified and validated (V & V) by the controls component manufacturer, usually with the support of an accredited certification body. The function blocks will normally have a PL associated with them, and the function block description will include a statement like “suitable for PLe”.

This approach eliminates the need for the designing entity (i.e., the machine builder) to do a detailed V & V of the code. However, the machine builder is still required to do a V & V on the operation of the system as they have configured it. The machine V & V includes all the usual fault injection tests and functional tests to ensure that the system will behave as intended in the presence of a demand on the safety function or a fault condition. The faults that should be tested are those in your Fault List. If you don’t have a fault list or don’t know what a Fault List is, see Part 8 in this series.
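Fault injection testing lends itself to a table-driven approach: take each entry in the Fault List, inject it, make a demand on the safety function, and confirm that the outputs go to the safe state and that the fault is detected before the machine can restart. The Python sketch below runs that loop against a hypothetical software model of a dual-channel emergency stop with cross-monitoring. The model, the fault names and the pass criteria are assumptions made for illustration; on a real machine, the faults are injected into the physical system, not into a simulation.

```python
from __future__ import annotations


class DualChannelEstop:
    """Hypothetical model of a dual-channel E-stop with cross-monitoring.

    Each channel reads a normally closed contact: True = closed (not pressed).
    """

    def __init__(self) -> None:
        self.fault_latched = False
        self.forced: dict[str, bool] = {}  # injected faults: channel -> stuck value

    def _read(self, channel: str, actual: bool) -> bool:
        # An injected fault overrides the real contact state.
        return self.forced.get(channel, actual)

    def scan(self, pressed: bool) -> bool:
        """Run one scan cycle; return True if the safety outputs are enabled."""
        a = self._read("A", not pressed)
        b = self._read("B", not pressed)
        if a != b:
            self.fault_latched = True  # channel discrepancy detected
        return a and b and not self.fault_latched


# Hypothetical fault list entries: description -> (channel, stuck contact state)
FAULT_LIST = {
    "channel A contact welded (stuck closed)": ("A", True),
    "channel B contact welded (stuck closed)": ("B", True),
    "channel A wire break (stuck open)": ("A", False),
    "channel B wire break (stuck open)": ("B", False),
}


def run_fault_injection(fault_list: dict[str, tuple[str, bool]]) -> dict[str, bool]:
    """Inject each fault into a fresh model; pass if it fails safe and is detected."""
    results = {}
    for name, (channel, stuck_value) in fault_list.items():
        dut = DualChannelEstop()
        dut.forced[channel] = stuck_value
        safe_on_demand = not dut.scan(pressed=True)    # demand: outputs must drop
        restart_blocked = not dut.scan(pressed=False)  # release: must not restart
        results[name] = safe_on_demand and restart_blocked and dut.fault_latched
    return results


if __name__ == "__main__":
    healthy = DualChannelEstop()
    assert not healthy.scan(pressed=True) and healthy.scan(pressed=False)
    for fault, ok in run_fault_injection(FAULT_LIST).items():
        print(f"{'PASS' if ok else 'FAIL'}: {fault}")
```

Using a fresh model for each fault keeps one injected fault from masking another, which mirrors the usual practice of testing single faults one at a time.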

Using preconfigured building blocks achieves the first goal, fault avoidance, at least as far as the software coding is concerned. The configuration software validates the function block configurations before compiling the software for upload to the safety controller, so most configuration errors will be caught at that stage.
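To give a feel for the kinds of checks a configuration tool can make before it compiles the program, here is a Python sketch. Real vendor tools have their own, usually far more extensive, rule sets. The block types, I/O names and the three rules below are hypothetical: a block's stated PL must be at least the PLr, every I/O point must exist in the hardware configuration, and no output may be driven by two blocks.

```python
from __future__ import annotations

from dataclasses import dataclass

PL_ORDER = "abcde"  # PLa (lowest) to PLe (highest)


@dataclass
class BlockConfig:
    """One configured function block (hypothetical structure)."""
    name: str
    block_type: str   # e.g. "EMERGENCY_STOP"; type names are illustrative
    rated_pl: str     # PL stated in the vendor's function block description
    inputs: list[str]
    outputs: list[str]


def validate(blocks: list[BlockConfig], plr: str, available_io: set[str]) -> list[str]:
    """Return a list of configuration errors; an empty list means all checks passed."""
    errors: list[str] = []
    driven_by: dict[str, str] = {}
    for b in blocks:
        if PL_ORDER.index(b.rated_pl) < PL_ORDER.index(plr):
            errors.append(f"{b.name}: block is rated PL{b.rated_pl}, below PLr = {plr}")
        if not b.inputs:
            errors.append(f"{b.name}: no inputs assigned")
        for point in b.inputs + b.outputs:
            if point not in available_io:
                errors.append(f"{b.name}: {point} is not in the hardware configuration")
        for out in b.outputs:
            if out in driven_by:
                errors.append(f"{b.name}: output {out} is already driven by {driven_by[out]}")
            else:
                driven_by[out] = b.name
    return errors


if __name__ == "__main__":
    io = {"I0.0", "I0.1", "Q0.0"}
    blocks = [
        BlockConfig("estop_1", "EMERGENCY_STOP", "e", ["I0.0", "I0.1"], ["Q0.0"]),
        BlockConfig("gate_1", "SAFETY_GATE", "c", ["I0.2"], ["Q0.0"]),
    ]
    for message in validate(blocks, plr="d", available_io=io):
        print(message)
```

In the example, `gate_1` trips all three rules. Passing checks like these only removes configuration errors; the machine-level V & V described above is still required.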

This approach also supports the second goal, as long as the configuration software is usable and is maintained by the software vendor. The configuration software usually includes the ability to annotate the configurations with relevant details, which helps make the software readable and understandable.

### Fully Customised Software

This approach is used with a fully customised hardware platform, where the safety software is designed to run on that platform. [1] refers to this type of software as “safety-related application software (SRASW).” A fully customised software application is used where a very specialised safety system is contemplated and FPGAs or other customised hardware are involved. These systems are usually programmed using full variability languages (FVL).

In this case, the full hardware and software V & V approach must be employed. In my opinion, ISO 13849-1 is probably not the best choice for this approach because of its simplifications, and I would usually recommend using IEC 61508-3 as the basis for the design, verification, and validation of fully customised software.

# Process requirements

## Safety-Related Embedded Software (SRESW)

[1, 4.6.2] provides a laundry list of elements that must be incorporated into the V-model processes when developing SRESW: a basic set for PLa through PLd, plus some additional requirements for PLc and PLd.

If you are designing SRESW for PLe, [1, 4.6.2] points you directly to IEC 61508-3, Clause 7, which covers software suitable for SIL3 applications.

## Safety-Related Application Software (SRASW)

[1, 4.6.3] provides a list of requirements that must be met through the V-model process for SRASW. It allows PLa through PLe to be met by code written in a limited variability language (LVL), and it allows PLe applications to be designed using a full variability language (FVL) as well. Where software is developed using FVL, it can be handled in the same way as embedded software (SRESW).

An architectural model similar to the one used for single-channel hardware development is used, as shown in Fig. 2 [1, Fig. 7].

The complete V-model must be applied to safety-related application software, with all of the additional requirements from [1, 4.6.3] included in the process model.
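The rules above for deciding which set of requirements applies can be summarised as a small lookup. The Python sketch below encodes only what is described in this section; the function name and return strings are my own, and it is no substitute for reading 4.6.2 and 4.6.3 themselves.

```python
VALID_PLS = ("a", "b", "c", "d", "e")


def software_requirements(kind: str, language: str, plr: str) -> str:
    """Point to the requirements that apply, per the summary in this article.

    kind: "SRESW" or "SRASW"; language: "LVL" or "FVL"; plr: "a" to "e".
    """
    plr = plr.lower()
    if plr not in VALID_PLS:
        raise ValueError(f"unknown PL: {plr!r}")
    if kind == "SRASW" and language == "FVL":
        kind = "SRESW"  # SRASW written in FVL is handled like embedded software
    if kind == "SRESW":
        if plr == "e":
            return "IEC 61508-3, Clause 7"
        if plr in ("c", "d"):
            return "ISO 13849-1, 4.6.2 (including the additional PLc/PLd requirements)"
        return "ISO 13849-1, 4.6.2"
    if kind == "SRASW" and language == "LVL":
        return "ISO 13849-1, 4.6.3"
    raise ValueError(f"unsupported combination: {kind} / {language}")


if __name__ == "__main__":
    print(software_requirements("SRASW", "LVL", "e"))  # ISO 13849-1, 4.6.3
    print(software_requirements("SRESW", "FVL", "e"))  # IEC 61508-3, Clause 7
```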

# Conclusions

There is a lot to safety-related software development, certainly much more than could be discussed in a blog post like this or even in a standard like ISO 13849-1. If you are contemplating developing safety-related software and you are not familiar with the techniques needed to develop this kind of high-reliability software, I would suggest you get help from a qualified developer. Keep in mind that there can be significant liability attached to safety system failures, which can include the deaths of people using your product. If you are developing SRASW, I would also recommend following IEC 61508-3 as the basis for the development and related QA processes.

# Definitions

3.1.36 application software

software specific to the application, implemented by the machine manufacturer, and generally containing logic sequences, limits and expressions that control the appropriate inputs, outputs, calculations and decisions necessary to meet the SRP/CS requirements

3.1.34 limited variability language LVL

type of language that provides the capability of combining predefined, application-specific library functions to implement the safety requirements specifications

Note 1 to entry: Typical examples of LVL (ladder logic, function block diagram) are given in IEC 61131-3.

Note 2 to entry: A typical example of a system using LVL: PLC. [SOURCE: IEC 61511-1:2003, 3.2.80.1.2, modified.]

3.1.35 full variability language FVL

type of language that provides the capability of implementing a wide variety of functions and applications

EXAMPLE C, C++, Assembler.

Note 1 to entry: A typical example of systems using FVL: embedded systems.

Note 2 to entry: In the field of machinery, FVL is found in embedded software and rarely in application software. [SOURCE: IEC 61511-1:2003, 3.2.80.1.3, modified.]

3.1.37 embedded software, firmware, system software

software that is part of the system supplied by the control manufacturer and which is not accessible for modification by the user of the machinery

Note 1 to entry: Embedded software is usually written in FVL.

Field-programmable gate array (FPGA)

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence “field-programmable”. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC). [22]

# Book List

Here are some books that I think you may find helpful on this journey:

[0.2] Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

## References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[16] “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].

[17] “failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18] M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331-338, 2006.

[19] Out of Control: Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

[22] “Field-programmable gate array”, En.wikipedia.org, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Field-programmable_gate_array. [Accessed: 16-Jun-2017].

## ISO 13849-1 Analysis, Part 6: CCF (Common Cause Failures)

This entry is part 6 of 7 in the series How to do a 13849-1 analysis

# What is a Common Cause Failure?

There are two similar-sounding terms that people often confuse: Common Cause Failure (CCF) and Common Mode Failure. While these two types of failures sound similar, they are different. A Common Cause Failure is a failure in which two or more portions of a system fail at the same time from a single common cause. An example could be a lightning strike that causes a contactor to weld and simultaneously takes out the safety relay processor that controls the contactor. In this example, two different components fail in two different manners, but from a single cause.

A Common Mode Failure is one where two components or portions of a system fail in the same way, at the same time. For example, two interposing relays both fail with welded contacts at the same time. The failures could come from the same cause or from different causes, but the way the components fail is the same.

Common cause failures can include common mode failures, since a common cause can result in a common manner of failure in identical devices used in a system.

Here are the formal definitions of these terms:

3.1.6 common cause failure CCF

failures of different items, resulting from a single event, where these failures are not consequences of each other

Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050-191-am1:1999, 04-23.] [1]

3.36 common mode failures

failures of items characterized by the same fault mode

NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [IEV 191-04-24] [3]

The “common mode” failure definition uses the phrase “fault mode”, so let’s look at that as well:

failure mode

DEPRECATED: fault mode

manner in which failure occurs

Note 1 to entry: A failure mode may be defined by the function lost or other state transition that occurred. [IEV 192-03-17] [17]

As you can see, “fault mode” is no longer used, in favour of the more common “failure mode”, so it is possible to rewrite the common mode failure definition to read, “failures of items characterised by the same manner of failure.”
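The two definitions overlap without being the same thing, and one way to make the distinction concrete is to classify pairs of failures against them. In the Python sketch below, each failure records the item, its cause (standing in for the single initiating event), its failure mode and, optionally, which other item's failure triggered it. The field names and examples are hypothetical, and real cause analysis is rarely this tidy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Failure:
    item: str
    cause: str                       # the initiating event
    mode: str                        # manner of failure, e.g. "contacts welded"
    caused_by: Optional[str] = None  # item whose failure triggered this one, if any


def classify(a: Failure, b: Failure) -> list[str]:
    """Label a pair of failures of different items per the CCF and common mode definitions."""
    if a.item == b.item:
        return []
    labels = []
    # CCF: a single event, and neither failure is a consequence of the other
    consequential = a.caused_by == b.item or b.caused_by == a.item
    if a.cause == b.cause and not consequential:
        labels.append("common cause")
    # Common mode: same manner of failure, whatever the cause
    if a.mode == b.mode:
        labels.append("common mode")
    return labels


if __name__ == "__main__":
    k1 = Failure("contactor K1", "lightning strike", "contacts welded")
    cpu = Failure("safety relay CPU", "lightning strike", "processor dead")
    r1 = Failure("relay R1", "overcurrent", "contacts welded")
    r2 = Failure("relay R2", "end of life", "contacts welded")
    r3 = Failure("relay R3", "overcurrent", "contacts welded")
    print(classify(k1, cpu))  # ['common cause']
    print(classify(r1, r2))   # ['common mode']
    print(classify(r1, r3))   # ['common cause', 'common mode']
```

The last pair shows the overlap: identical relays failing the same way from the same event are both a common cause and a common mode failure.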

# Random, Systematic and Common Cause Failures

Why do we need to care about this? Failures fall into three types: random failures, systematic failures, and common cause failures. When developing safety-related controls, we need to consider all three and mitigate them as much as possible.

Random failures do not follow any pattern; they occur randomly over time and are often brought on by over-stressing the component or by manufacturing flaws. The rate of random failures can increase due to environmental or process-related stresses, like corrosion, EMI, normal wear-and-tear, or other over-stressing of the component or subsystem. Random failures are often mitigated through the selection of high-reliability components [18].

Systematic failures include common cause failures, and occur because of human error that was not caught by procedural means. These failures are due to design, specification, operating, maintenance, and installation errors. When we look at systematic errors, we are looking at things like the training of the system designers, or the quality assurance procedures used to validate the way the system operates. Systematic failures are non-random and complex, making them difficult to analyse statistically. Systematic errors are a significant source of common cause failures because they can affect redundant devices, and because they are often deterministic, occurring whenever a particular set of circumstances exists.

Systematic failures include many types of errors, such as:

• Manufacturing defects, e.g., software and hardware errors built into the device by the manufacturer.
• Specification mistakes, e.g., incorrect design basis and inaccurate software specification.
• Implementation errors, e.g., improper installation, incorrect programming, interface problems, and not following the safety manual for the devices used to realise the safety function.
• Operation and maintenance errors, e.g., poor inspection, incomplete testing and improper bypassing [18].

Diverse redundancy is commonly used to mitigate systematic failures, since differences in component or subsystem design tend to create non-overlapping systematic failures, reducing the likelihood that a common error creates a common mode failure. Errors in specification, implementation, operation and maintenance are not affected by diversity.

Fig. 1 below shows the results of a small study done by the UK’s Health and Safety Executive in 1994 [19] that supports the idea that systematic failures are a significant contributor to safety system failures. The study included only 34 systems (n = 34), so the results cannot be considered conclusive. However, there were some startling results. As you can see, errors in the specification of the safety functions (Safety Requirement Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significant source of failures.

# Handling CCF in ISO 13849 – 1

Now that we understand WHAT Common Cause Failure is, and WHY it’s important, we can talk about HOW it is handled in ISO 13849-1. Since ISO 13849-1 is intended to be a simplified functional safety standard, CCF analysis is limited to a checklist in Annex F, Table F.1. Note that Annex F is informative, meaning that it is guidance material to help you apply the standard. This means you could use any other suitable means of assessing CCF mitigation, like those in IEC 61508 or in other standards.

Table F.1 is set up as a series of mitigation measures grouped together in related categories. Each group is given a score that can be claimed if you have implemented the mitigations in that group. ALL OF THE MEASURES in a group must be fulfilled in order to claim the points for that group. Here’s an example:

In order to claim the 15 points available for the use of separation or segregation in the system design, there must be separation between the signal paths. Several examples of this are given for clarity.

Table F.1 lists six groups of mitigation measures. In order to claim adequate CCF mitigation, a minimum score of 65 points must be achieved. Only Category 2, 3 and 4 architectures are required to meet the CCF requirement, but for those architectures you cannot claim the PL without meeting it, regardless of whether the design meets the other criteria.
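The scoring rule is simple enough to capture in a few lines. The Python sketch below awards a line item's points only when every measure in it is fulfilled, then compares the total with the 65-point threshold, which applies only to Categories 2, 3 and 4. The point values are as I understand Table F.1 of ISO 13849-1:2015 (they add up to 100), and the short item labels are my own; check both against your copy of the standard before relying on them.

```python
from __future__ import annotations

# Points per line item, as I understand ISO 13849-1:2015 Table F.1 (verify!)
TABLE_F1_POINTS = {
    "separation/segregation": 15,
    "diversity": 20,
    "protection against over-voltage, over-pressure, etc.": 15,
    "well-tried components": 5,
    "FMEA during design": 5,
    "competence/training": 5,
    "EMC and contamination": 25,
    "other environmental influences": 10,
}
CCF_MIN_SCORE = 65


def ccf_score(fulfilled: dict[str, list[bool]]) -> int:
    """Sum the points for every item whose measures are ALL fulfilled."""
    score = 0
    for item, measures in fulfilled.items():
        if measures and all(measures):  # partial fulfilment earns nothing
            score += TABLE_F1_POINTS[item]
    return score


def ccf_adequate(category: str, score: int) -> bool:
    """Categories B and 1 have no CCF requirement; 2, 3 and 4 need at least 65 points."""
    if category in ("B", "1"):
        return True
    return score >= CCF_MIN_SCORE


if __name__ == "__main__":
    assessment = {
        "separation/segregation": [True, True],
        "diversity": [False],
        "protection against over-voltage, over-pressure, etc.": [True],
        "well-tried components": [True],
        "FMEA during design": [True],
        "competence/training": [True],
        "EMC and contamination": [True, False],  # one measure missing: 0 of 25 points
        "other environmental influences": [True],
    }
    score = ccf_score(assessment)
    print(score, ccf_adequate("3", score))  # 55 False
```

The example shows how unforgiving the all-or-nothing rule is: a single missing EMC measure costs 25 points and drops the design below the threshold.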

One final note on CCF: If you are trying to review an existing control system, say in an existing machine or in a machine designed by a third party, where you have no way to determine the experience and training of the designers or the capability of the company’s change management process, then you cannot adequately assess CCF [8]. This fact is recognised in CSA Z432-16 [20], Chapter 8, which allows the reviewer to simply verify that the architectural requirements, exclusive of any probabilistic requirements, have been met. This is particularly useful for engineers reviewing machinery under Ontario’s Pre-Start Health and Safety requirements [21], who are frequently working with less-than-complete design documentation.

In case you missed the first part of the series, you can read it here. In the next article in this series, I’m going to review the process flow for system analysis as currently outlined in ISO 13849-1. Watch for it!


## References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[17] “failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18] M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331-338, 2006.

[19] Out of Control: Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

## ISO 13849-1 Analysis, Part 5: Diagnostic Coverage (DC)

This entry is part 5 of 7 in the series How to do a 13849-1 analysis

# What is Diagnostic Coverage?

Understanding Diagnostic Coverage (DC) as it is used in ISO 13849-1 [1] is critical to analysing the design of any safety function assessed using this standard. In case you missed a previous part of the series, you can read it here.

In the last instalment of this series, discussing MTTFD, I brought up the fact that everything fails eventually, and so everything has a natural failure rate. The bathtub curve shown at the top of this post shows a typical failure rate curve for most products. A failure rate tells you how often components or systems fail, while a mean time to failure tells you how long, on average, they run before failing. Reliability can be expressed in many ways; MTTFD and PFHd are the ones relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHd is given as a rate per hour (1/h). As a reminder, PFHd stands for “Probability of dangerous Failure per Hour”.
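Because MTTFD is quoted in years while PFHd is a rate per hour, it helps to be able to move between the two. For a constant failure rate, the dangerous failure rate is simply the reciprocal of the mean time to dangerous failure, $\lambda_d = 1/MTTF_D$. The Python sketch below does the conversion assuming 8760 hours per year (365 days × 24 h). This gives the dangerous failure rate of a single component or channel, not the PFHd of a whole safety function, which also depends on the architecture, the DC and CCF.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h


def lambda_d_per_hour(mttfd_years: float) -> float:
    """Dangerous failure rate in 1/h, assuming a constant failure rate."""
    if mttfd_years <= 0:
        raise ValueError("MTTFd must be positive")
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


def mttfd_years_from_rate(lambda_d: float) -> float:
    """Inverse conversion: MTTFd in years from a dangerous failure rate in 1/h."""
    if lambda_d <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / (lambda_d * HOURS_PER_YEAR)


if __name__ == "__main__":
    rate = lambda_d_per_hour(100)  # a 100-year MTTFd
    print(f"{rate:.3e} 1/h")       # 1.142e-06 1/h
```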

Three of the standard architectures include automatic diagnostic functions: Categories 2, 3 and 4. As soon as we add diagnostics to the system, we need to know which faults the diagnostics can detect, and what fraction of all the dangerous failures those detectable faults represent. Diagnostic Coverage (DC) is the ratio of the dangerous failures that can be detected to the total dangerous failures that could occur, expressed as a percentage. Some failures do not result in a dangerous failure, and those failures are excluded from DC because we don’t need to worry about them: if they occur, the system will not fail into a dangerous state.

Here’s the formal definition from [1]:

3.1.26 diagnostic coverage (DC)

measure of the effectiveness of diagnostics, which may be determined as the ratio between the failure rate of detected dangerous failures and the failure rate of total dangerous failures

Note 1 to entry: Diagnostic coverage can exist for the whole or parts of a safety-related system. For example, diagnostic coverage could exist for sensors and/or logic system and/or final elements. [SOURCE: IEC 61508-4:1998, 3.8.6, modified.]

That brings up two other related definitions that need to be kept in mind [1]:

3.1.4 failure

termination of the ability of an item to perform a required function

Note 1 to entry: After a failure, the item has a fault.

Note 2 to entry: “Failure” is an event, as distinguished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items consisting of software only.

Note 4 to entry: Failures which only affect the availability of the process under control are outside of the scope of this part of ISO 13849. [SOURCE: IEC 60050-191:1990, 04-01.]

and the most important one [1]:

3.1.5 dangerous failure

failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508-4, 3.6.7, modified.]

Just as a reminder, SRP/CS stands for “safety-related parts of control systems”.

## Failure Math

### Failure Rate Data Sources

To do any calculations, we need data, and this is true for failure rates as well. ISO 13849-1 provides some tables in the annexes that list some common types of components and their associated failure rates, and there are more failure rate tables in ISO 13849-2. A word of caution here: do not mix sources of failure rate data. Each source is based on its own assumptions and conditions, and those won’t match the conditions behind the data in ISO 13849. There are a few good sources of failure rate data out there, for example, MIL-HDBK-217, Reliability Prediction of Electronic Equipment [15], as well as the database maintained by Exida. In any case, use a single source for your failure rate data.

### Failure Rate Variables

IEC 61508 [7] defines a number of variables related to failure rates. The lowercase Greek letter lambda, $\lambda$, is used to denote failure rates.

The common variable designations used are:

• $\lambda$ = failure rate
• $\lambda_{(t)}$ = failure rate as a function of time
• $\lambda_s$ = rate of “safe” failures
• $\lambda_d$ = rate of “dangerous” failures
• $\lambda_{dd}$ = rate of detected “dangerous” failures
• $\lambda_{du}$ = rate of undetected “dangerous” failures

### Calculating DC

Of these variables, we only need to concern ourselves with $\lambda_d$, $\lambda_{dd}$ and $\lambda_{du}$. To understand how these variables are used, we can express their relationship as

$\lambda_d=\lambda_{dd}+\lambda_{du}$

Following from that relationship, the Diagnostic Coverage can be expressed as a percentage like this:

$DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100$
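As a quick worked example of the two relations above, the Python sketch below computes DC% from the detected and undetected dangerous failure rates. The failure-rate values in the example are invented for illustration.

```python
def diagnostic_coverage(lambda_dd: float, lambda_du: float) -> float:
    """DC% = lambda_dd / lambda_d x 100, where lambda_d = lambda_dd + lambda_du."""
    if lambda_dd < 0 or lambda_du < 0:
        raise ValueError("failure rates cannot be negative")
    lambda_d = lambda_dd + lambda_du
    if lambda_d == 0:
        raise ValueError("no dangerous failures: DC is undefined")
    return lambda_dd / lambda_d * 100.0


if __name__ == "__main__":
    # Invented rates: 9e-7 1/h detected, 1e-7 1/h undetected
    print(round(diagnostic_coverage(9e-7, 1e-7), 1))  # 90.0
```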

## Determining DC%

If you want to actually calculate DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hardcore types to IEC 61508-2, Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. This standard goes into some depth on how to determine failure rates and how to calculate the “Safe Failure Fraction,” a number which is related to DC but is not the same.

For everyone else, the good news is that you can use the table in Annex E to estimate the DC%. It’s worth noting here that Annex E is “Informative.” In standards-speak, this means that the information in the annex is not part of the “normative” text; it is simply information to help you use the normative part of the standard. The design must conform to the requirements in the normative text if you want to claim conformity to the standard. Because [1, Annex E] is informative, you have the option to calculate the DC% value rather than selecting it from Table E.1. Using the calculated value would not violate the requirements in the normative text.

If you are using IFA SISTEMA [16] to do the calculations for you, you will find that the software limits you to selecting a single DC measure from Table E.1, and this principle applies if you are doing the calculations by hand too. Only one item from Table E.1 can be selected for a given safety function.

## Ranking DC

Once you have determined the DC for a safety function, you need to compare the DC value against [1, Table 5] to see if the DC is sufficient for the PLr you are trying to achieve. Table 5 bins the DC results into four ranges: none, low, medium and high. Just as binning the PFHd values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.
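The binning step is easy to express in code. The Python sketch below uses the four ranges as I understand them from [1, Table 5]: none below 60 %, low from 60 % up to 90 %, medium from 90 % up to 99 %, and high from 99 %. Confirm the boundaries against your copy of the standard.

```python
def dc_level(dc_percent: float) -> str:
    """Bin a DC value (in %) into the four ranges of ISO 13849-1 Table 5."""
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 and 100 %")
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


if __name__ == "__main__":
    for dc in (45.0, 60.0, 92.5, 99.0):
        print(dc, dc_level(dc))
```

Binning means that a calculated 98.9 % still only counts as "medium", which is exactly the protection against precision bias described above.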

If the DC value is high enough for the PLr, then you are done with this part of the work. If not, you will need to go back to your design and add diagnostic features so that you can either select a higher coverage from [1, Table E.1] or calculate a higher value using [14].

## Average DC (DCavg)

A single safety function, such as an emergency stop or a guard interlocking function, is usually made up of several parts (for example, input devices, a logic unit and output devices), and each part can have its own DC value. These DC values need to be averaged to determine the overall DC of the SRP/CS that performs the safety function, $DC_{avg}$. [1, Annex E] provides you with a method to do this in Equation E.1:

$DC_{avg}=\frac{\frac{DC_1}{MTTF_{D1}}+\frac{DC_2}{MTTF_{D2}}+\dots+\frac{DC_N}{MTTF_{DN}}}{\frac{1}{MTTF_{D1}}+\frac{1}{MTTF_{D2}}+\dots+\frac{1}{MTTF_{DN}}}$

Plug in the values for MTTFD and DC for each part, and calculate the resulting DCavg value.
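Equation E.1 is a weighted average in which each DC value is weighted by $1/MTTF_D$, so parts that fail more often count for more. A minimal Python sketch, with invented example values:

```python
from __future__ import annotations


def dc_avg(parts: list[tuple[float, float]]) -> float:
    """Average DC per Equation E.1.

    parts: (DC in %, MTTFd in years) for each item being averaged.
    DCavg = sum(DC_i / MTTFd_i) / sum(1 / MTTFd_i)
    """
    if not parts:
        raise ValueError("need at least one (DC, MTTFd) pair")
    if any(mttfd <= 0 for _, mttfd in parts):
        raise ValueError("MTTFd values must be positive")
    numerator = sum(dc / mttfd for dc, mttfd in parts)
    denominator = sum(1.0 / mttfd for _, mttfd in parts)
    return numerator / denominator


if __name__ == "__main__":
    # Invented values: 90 % DC at MTTFd = 100 years, 60 % DC at MTTFd = 50 years
    print(round(dc_avg([(90.0, 100.0), (60.0, 50.0)]), 1))  # 70.0
```

A simple average of the two DC values would give 75 %; because the 60 % item has the shorter MTTFD, Equation E.1 pulls the result down to 70 %.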

That’s it for this article. The next part will cover Common Cause Failures (CCF). Look for it on 20-Mar-17!

In case you missed the first part of the series, you can read it here.


## References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[16] “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].