ISO 13849–1 Analysis — Part 7: Safety-Related Software

This entry is part 7 of 9 in the series How to do a 13849–1 analy­sis

Safety-Related Software

Up to this point, I have been dis­cussing the basic process­es used for the design of safe­ty-relat­ed parts of con­trol sys­tems. The under­ly­ing assump­tion is that these tech­niques apply to the design of hard­ware used for safe­ty pur­pos­es. The remain­ing ques­tion focus­es on the design and devel­op­ment of safe­ty-relat­ed soft­ware that runs on that hard­ware. If you have not read the rest of this series and would like to catch up first, you can find it here.

In this discussion of safety-related software, keep in mind that I am talking about software that is only intended to reduce risk. Some platforms are not well suited to safety software, primarily common off-the-shelf (COTS) operating systems like Windows, MacOS and Linux. Generally speaking, these operating systems are too complex and too subject to unanticipated changes to be suitable for high-reliability applications. There is nothing wrong with using these systems for annunciation and monitoring functions, but the safety functions should run on more predictable platforms.

The method­ol­o­gy dis­cussed in ISO 13849–1 is usable up to PLd. At the end of the Scope we find Note 4:

NOTE 4 For safe­ty-relat­ed embed­ded soft­ware for com­po­nents with PLr = e, see IEC 61508–3:1998, Clause 7.

As you can see, for very high-reli­a­bil­i­ty sys­tems, i.e., PLe/SIL3 or SIL4, it is nec­es­sary to move to IEC 61508. The meth­ods dis­cussed here are based on ISO 13849–1:2015, Chap­ter 4.6.

Goals

There are two goals for safe­ty-relat­ed soft­ware devel­op­ment activ­i­ties:

  1. Avoid faults
  2. Gen­er­ate read­able, under­stand­able, testable and main­tain­able soft­ware

Avoiding Faults

Fig. 1 [1, Fig. 6] shows the “V-mod­el” for soft­ware devel­op­ment. This approach to soft­ware design incor­po­rates both val­i­da­tion and ver­i­fi­ca­tion, and when cor­rect­ly imple­ment­ed will result in soft­ware that meets the design spec­i­fi­ca­tions.

If you aren’t sure what the difference is between verification and validation, I remember it this way: Validation means “Are we building the right thing?”, and verification means “Did we build the thing right?” The whole process hinges on the Safety Requirement Specification (SRS), so failing to get that part of the process right in the beginning will negatively impact both hardware and software design. The SRS is the yardstick used to decide if you built the right thing. Without that, you are clueless about what you are building.

Fig­ure 1 — Sim­pli­fied V-mod­el of soft­ware safe­ty life­cy­cle

Com­ing in from the Safe­ty Require­ment Spec­i­fi­ca­tion (also called the safe­ty func­tion spec­i­fi­ca­tion), each step in the process is shown. The dashed lines illus­trate the ver­i­fi­ca­tion process at each step. Notice that the actu­al cod­ing step is at the bot­tom of the V-mod­el. Every­thing above the cod­ing stage is either plan­ning and design, or qual­i­ty assur­ance activ­i­ties.

There are oth­er meth­ods that can be used to result in ver­i­fied and val­i­dat­ed soft­ware, so if you have a QA process that pro­duces sol­id results, you may not need to change it. I would rec­om­mend that you review all the stages in the V-mod­el to ensure that your QA sys­tem has sim­i­lar process­es.

To make set­ting up safe­ty sys­tems sim­pler for design­ers and inte­gra­tors, there are two approach­es to soft­ware design that can be used.

Two Approaches to Software Design

There are two approach­es to soft­ware design that should be con­sid­ered:

  • Pre­con­fig­ured (build­ing-block style) soft­ware
  • Ful­ly cus­tomised soft­ware

Preconfigured Building-Block Software

The pre­con­fig­ured build­ing-block approach is typ­i­cal­ly used for con­fig­ur­ing safe­ty PLCs or pro­gram­ma­ble safe­ty relays or mod­ules. This type of soft­ware is referred to as “safe­ty-relat­ed embed­ded soft­ware (SRESW)” in [1].

Pre-writ­ten func­tion blocks are pro­vid­ed by the device man­u­fac­tur­er. Each func­tion block has a par­tic­u­lar role: emer­gency stop, safe­ty gate input, zero-speed detec­tion, and so on. When con­fig­ur­ing a safe­ty PLC or safe­ty mod­ules that use this approach, the design­er selects the appro­pri­ate block and then con­fig­ures the inputs, out­puts, and any oth­er func­tion­al char­ac­ter­is­tics that are need­ed. The design­er has no access to the safe­ty-relat­ed code, so apart from con­fig­u­ra­tion errors, no oth­er errors can be intro­duced. The func­tion blocks are ver­i­fied and val­i­dat­ed (V & V) by the con­trols com­po­nent man­u­fac­tur­er, usu­al­ly with the sup­port of an accred­it­ed cer­ti­fi­ca­tion body. The func­tion blocks will nor­mal­ly have a PL asso­ci­at­ed with them, and a state­ment like “suit­able for PLe” will be made in the func­tion block descrip­tion.

This approach eliminates the need for the designing entity (i.e., the machine builder) to do a detailed V & V of the code. However, the machine builder is still required to do a V & V on the operation of the system as they have configured it. The machine V & V includes all the usual fault injection tests and functional tests to ensure that the system will behave as intended in the presence of a demand on the safety function or a fault condition. The faults that should be tested are those in your Fault List. If you don’t have a Fault List or don’t know what one is, see Part 8 in this series.
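To make the idea of working through a Fault List a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Fault` record, the toy dual-channel model in `motion_enabled`, and the fault names are assumptions made for illustration, not anything taken from [1] or from a vendor's configuration tool. A real V & V is carried out on the actual machine, not on a model like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """One entry in a (hypothetical) Fault List."""
    name: str
    ch1_stuck_closed: bool = False  # channel 1 contact cannot open
    ch2_stuck_closed: bool = False  # channel 2 contact cannot open


def motion_enabled(demand: bool, fault: Fault) -> bool:
    """Toy model of a dual-channel stop function.

    Each channel opens on demand unless it is stuck closed. The two
    channels are in series, so hazardous motion stays enabled only
    if BOTH channels remain closed.
    """
    ch1_closed = (not demand) or fault.ch1_stuck_closed
    ch2_closed = (not demand) or fault.ch2_stuck_closed
    return ch1_closed and ch2_closed


def run_fault_list(faults):
    """Inject each fault, place a demand, and record whether the
    safe state (motion disabled) was reached."""
    return {f.name: not motion_enabled(True, f) for f in faults}


FAULT_LIST = [
    Fault("no fault"),
    Fault("K1 welded", ch1_stuck_closed=True),
    Fault("K2 welded", ch2_stuck_closed=True),
    Fault("K1 and K2 welded", ch1_stuck_closed=True, ch2_stuck_closed=True),
]

if __name__ == "__main__":
    for name, safe in run_fault_list(FAULT_LIST).items():
        print(f"{name}: {'safe state reached' if safe else 'FAILED TO STOP'}")
```

In this toy model any single fault is tolerated, while the simultaneous failure of both channels is not. That last case is exactly the kind of result a Fault List review is meant to expose.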

Using pre-con­fig­ured build­ing blocks achieves the first goal, fault avoid­ance, at least as far as the soft­ware cod­ing is con­cerned. The con­fig­u­ra­tion soft­ware will val­i­date the func­tion block con­fig­u­ra­tions before com­pil­ing the soft­ware for upload to the safe­ty con­troller so that most con­fig­u­ra­tion errors will be caught at that stage.

This approach also facil­i­tates the sec­ond goal, as long as the con­fig­u­ra­tion soft­ware is usable and main­tained by the soft­ware ven­dor. The con­fig­u­ra­tion soft­ware usu­al­ly includes the abil­i­ty to anno­tate the con­fig­u­ra­tions with rel­e­vant details to assist with the read­abil­i­ty and under­stand­abil­i­ty of the soft­ware.

Fully Customised Software

This approach is used where a ful­ly cus­tomised hard­ware plat­form is being used, and the safe­ty soft­ware is designed to run on that plat­form. [1] refers to this type of soft­ware as “Safe­ty-relat­ed appli­ca­tion soft­ware (SRASW).” A ful­ly cus­tomised soft­ware appli­ca­tion is used where a very spe­cialised safe­ty sys­tem is con­tem­plat­ed, and FPGAs or oth­er cus­tomised hard­ware is being used. These sys­tems are usu­al­ly pro­grammed using full-vari­abil­i­ty lan­guages.

In this case, the full hard­ware and soft­ware V & V approach must be employed. In my opin­ion, ISO 13849–1 is prob­a­bly not the best choice for this approach due to its sim­pli­fi­ca­tion, and I would usu­al­ly rec­om­mend using IEC 61508–3 as the basis for the design, ver­i­fi­ca­tion, and val­i­da­tion of ful­ly cus­tomised soft­ware.

Process requirements

Safety-Related Embedded Software (SRESW)

[1, 4.6.2] pro­vides a laun­dry list of ele­ments that must be incor­po­rat­ed into the V-mod­el process­es when devel­op­ing SRESW, bro­ken down by PLa through PLd, and then some addi­tion­al require­ments for PLc and PLd.

If you are design­ing SRESW for PLe, [1, 4.6.2] points you direct­ly to IEC 61508–3, clause 7, which cov­ers soft­ware suit­able for SIL3 appli­ca­tions.

Safety-Related Application Software (SRASW)

[1, 4.6.3] provides a list of requirements that must be met through the V-model process for SRASW. It allows PLa through PLe to be met by code written in LVL, and PLe applications can also be designed using FVL. Where software is developed using FVL, it can be treated in the same way as embedded software (SRESW).
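The routing described in these two subsections can be summarised as a small lookup. The sketch below simply encodes this article's reading of [1, 4.6.2] and [1, 4.6.3]; the function name `software_requirements` and its string arguments are invented for illustration, and it is no substitute for reading the clauses themselves.

```python
def software_requirements(kind: str, language: str, plr: str) -> str:
    """Return the clause this article points to for a given case.

    kind:     "SRESW" or "SRASW"
    language: "LVL" or "FVL"
    plr:      required performance level, "a" through "e"
    """
    kind, language, plr = kind.upper(), language.upper(), plr.lower()
    if kind not in {"SRESW", "SRASW"}:
        raise ValueError(f"unknown software kind: {kind!r}")
    if language not in {"LVL", "FVL"}:
        raise ValueError(f"unknown language class: {language!r}")
    if plr not in {"a", "b", "c", "d", "e"}:
        raise ValueError(f"unknown PLr: {plr!r}")

    # Application software written in LVL: PLa through PLe.
    if kind == "SRASW" and language == "LVL":
        return "ISO 13849-1, 4.6.3"

    # SRESW, or SRASW written in FVL (treated like SRESW).
    if plr == "e":
        return "IEC 61508-3, Clause 7"
    return "ISO 13849-1, 4.6.2"


if __name__ == "__main__":
    print(software_requirements("SRASW", "LVL", "d"))
    print(software_requirements("SRESW", "FVL", "e"))
```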

A similar architectural model to that used for single-channel hardware development is used, as shown in Fig. 2 [1, Fig. 7].

Fig­ure 2 — Gen­er­al archi­tec­ture mod­el of soft­ware

The com­plete V-mod­el must be applied to safe­ty-relat­ed appli­ca­tion soft­ware, with all of the addi­tion­al require­ments from [1, 4.6.3] includ­ed in the process mod­el.

Conclusions

There is a lot to safety-related software development, certainly much more than could be discussed in a blog post like this or even in a standard like ISO 13849–1. If you are contemplating developing safety-related software and you are not familiar with the techniques needed to develop this kind of high-reliability software, I would suggest you get help from a qualified developer. Keep in mind that there can be significant liability attached to safety system failures, including the deaths of people using your product. If you are developing SRASW, I would also recommend following IEC 61508–3 as the basis for the development and related QA processes.

Definitions

3.1.36 appli­ca­tion soft­ware
software specific to the application, implemented by the machine manufacturer, and generally containing logic sequences, limits and expressions that control the appropriate inputs, outputs, calculations and decisions necessary to meet the SRP/CS requirements
3.1.34 lim­it­ed vari­abil­i­ty lan­guage LVL
type of lan­guage that pro­vides the capa­bil­i­ty of com­bin­ing pre­de­fined, appli­ca­tion-spe­cif­ic library func­tions to imple­ment the safe­ty require­ments spec­i­fi­ca­tions
Note 1 to entry: Typ­i­cal exam­ples of LVL (lad­der log­ic, func­tion block dia­gram) are giv­en in IEC 61131–3.
Note 2 to entry: A typ­i­cal exam­ple of a sys­tem using LVL: PLC. [SOURCE: IEC 61511–1:2003, 3.2.80.1.2, mod­i­fied.]
3.1.35 full vari­abil­i­ty lan­guage FVL
type of language that provides the capability of implementing a wide variety of functions and applications
EXAMPLE C, C++, Assembler.
Note 1 to entry: A typ­i­cal exam­ple of sys­tems using FVL: embed­ded sys­tems.
Note 2 to entry: In the field of machin­ery, FVL is found in embed­ded soft­ware and rarely in appli­ca­tion soft­ware. [SOURCE: IEC 61511–1:2003, 3.2.80.1.3, mod­i­fied.]
3.1.37 embed­ded soft­ware
firmware
sys­tem soft­ware
soft­ware that is part of the sys­tem sup­plied by the con­trol man­u­fac­tur­er and which is not acces­si­ble for mod­i­fi­ca­tion by the user of the machin­ery.
Note 1 to entry: Embed­ded soft­ware is usu­al­ly writ­ten in FVL.
Field Pro­gram­ma­ble Gate Array FPGA
A field-pro­gram­ma­ble gate array (FPGA) is an inte­grat­ed cir­cuit designed to be con­fig­ured by a cus­tomer or a design­er after man­u­fac­tur­ing – hence “field-pro­gram­ma­ble”. The FPGA con­fig­u­ra­tion is gen­er­al­ly spec­i­fied using a hard­ware descrip­tion lan­guage (HDL), sim­i­lar to that used for an appli­ca­tion-spe­cif­ic inte­grat­ed cir­cuit (ASIC). [22]

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assess­ment: Basics and Bench­marks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simp­son, Safe­ty crit­i­cal sys­tems hand­book. Ams­ter­dam: Else­vier/But­ter­worth-Heine­mann, 2011.

[0.2]  Elec­tro­mag­net­ic Com­pat­i­bil­i­ty for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: The Insti­tu­tion of Engi­neer­ing and Tech­nol­o­gy, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[2]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 2: Val­i­da­tion. 2nd Edi­tion. ISO Stan­dard 13849–2. 2012.

[3]      Safe­ty of machin­ery — Gen­er­al prin­ci­ples for design — Risk assess­ment and risk reduc­tion. ISO Stan­dard 12100. 2010.

[4]     Safe­guard­ing of Machin­ery. 2nd Edi­tion. CSA Stan­dard Z432. 2004.

[5]     Risk Assess­ment and Risk Reduc­tion- A Guide­line to Esti­mate, Eval­u­ate and Reduce Risks Asso­ci­at­ed with Machine Tools. ANSI Tech­ni­cal Report B11.TR3. 2000.

[6]    Safe­ty of machin­ery — Emer­gency stop func­tion — Prin­ci­ples for design. ISO Stan­dard 13850. 2015.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. 7 parts. IEC Stan­dard 61508. Edi­tion 2. 2010.

[8]     S. Joce­lyn, J. Bau­doin, Y. Chin­ni­ah, and P. Char­p­en­tier, “Fea­si­bil­i­ty study and uncer­tain­ties in the val­i­da­tion of an exist­ing safe­ty-relat­ed con­trol cir­cuit with the ISO 13849–1:2006 design stan­dard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.

[9]    Guid­ance on the appli­ca­tion of ISO 13849–1 and IEC 62061 in the design of safe­ty-relat­ed con­trol sys­tems for machin­ery. IEC Tech­ni­cal Report TR 62061–1. 2010.

[10]     Safe­ty of machin­ery — Func­tion­al safe­ty of safe­ty-relat­ed elec­tri­cal, elec­tron­ic and pro­gram­ma­ble elec­tron­ic con­trol sys­tems. IEC Stan­dard 62061. 2005.

[11]    Guid­ance on the appli­ca­tion of ISO 13849–1 and IEC 62061 in the design of safe­ty-relat­ed con­trol sys­tems for machin­ery. IEC Tech­ni­cal Report 62061–1. 2010.

[12]    D. S. G. Nix, Y. Chin­ni­ah, F. Dosio, M. Fessler, F. Eng, and F. Schr­ev­er, “Link­ing Risk and Reliability—Mapping the out­put of risk assess­ment tools to func­tion­al safe­ty require­ments for safe­ty relat­ed con­trol sys­tems,” 2015.

[13]    Safe­ty of machin­ery. Safe­ty relat­ed parts of con­trol sys­tems. Gen­er­al prin­ci­ples for design. CEN Stan­dard EN 954–1. 1996.

[14]   Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems — Part 2: Require­ments for electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. IEC Stan­dard 61508–2. 2010.

[15]     Reli­a­bil­i­ty Pre­dic­tion of Elec­tron­ic Equip­ment. Mil­i­tary Hand­book MIL-HDBK-217F. 1991.

[16]     “IFA — Prac­ti­cal aids: Soft­ware-Assis­tent SISTEMA: Safe­ty Integri­ty — Soft­ware Tool for the Eval­u­a­tion of Machine Appli­ca­tions”, Dguv.de, 2017. [Online]. Avail­able: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30- Jan- 2017].

[17]      “fail­ure mode”, 192–03-17, Inter­na­tion­al Elec­trotech­ni­cal Vocab­u­lary. IEC Inter­na­tion­al Elec­trotech­ni­cal Com­mis­sion, Gene­va, 2015.

[18]      M. Gen­tile and A. E. Sum­mers, “Com­mon Cause Fail­ure: How Do You Man­age Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331–338, 2006.

[19]     Out of Control—Why con­trol sys­tems go wrong and how to pre­vent fail­ure, 2nd ed. Rich­mond, Sur­rey, UK: HSE Health and Safe­ty Exec­u­tive, 2003.

[20]     Safe­guard­ing of Machin­ery. 3rd Edi­tion. CSA Stan­dard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Cana­da, 1990.

[22]     “Field-pro­gram­ma­ble gate array”, En.wikipedia.org, 2017. [Online]. Avail­able: https://en.wikipedia.org/wiki/Field-programmable_gate_array. [Accessed: 16-Jun-2017].

ISO 13849–1 Analysis — Part 6: CCF — Common Cause Failures

This entry is part 6 of 9 in the series How to do a 13849–1 analy­sis

What is a Common Cause Failure?

There are two sim­i­lar-sound­ing terms that peo­ple often get con­fused: Com­mon Cause Fail­ure (CCF) and Com­mon Mode Fail­ure. While these two types of fail­ures sound sim­i­lar, they are dif­fer­ent. A Com­mon Cause Fail­ure is a fail­ure in a sys­tem where two or more por­tions of the sys­tem fail at the same time from a sin­gle com­mon cause. An exam­ple could be a light­ning strike that caus­es a con­tac­tor to weld and simul­ta­ne­ous­ly takes out the safe­ty relay proces­sor that con­trols the con­tac­tor. Com­mon cause fail­ures are there­fore two dif­fer­ent man­ners of fail­ure in two dif­fer­ent com­po­nents, but with a sin­gle cause.

Com­mon Mode Fail­ure is where two com­po­nents or por­tions of a sys­tem fail in the same way, at the same time. For exam­ple, two inter­pos­ing relays both fail with weld­ed con­tacts at the same time. The fail­ures could be caused by the same cause or from dif­fer­ent caus­es, but the way the com­po­nents fail is the same.

Com­mon-cause fail­ure includes com­mon mode fail­ure, since a com­mon cause can result in a com­mon man­ner of fail­ure in iden­ti­cal devices used in a sys­tem.

Here are the for­mal def­i­n­i­tions of these terms:

3.1.6 com­mon cause fail­ure CCF

fail­ures of dif­fer­ent items, result­ing from a sin­gle event, where these fail­ures are not con­se­quences of each oth­er

Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050-191-am1:1999, 04–23.] [1]

3.36 com­mon mode fail­ures

fail­ures of items char­ac­ter­ized by the same fault mode

NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [IEV 191–04-24] [3]

The “com­mon mode” fail­ure def­i­n­i­tion uses the phrase “fault mode”, so let’s look at that as well:

fail­ure mode
DEPRECATED: fault mode
man­ner in which fail­ure occurs

Note 1 to entry: A fail­ure mode may be defined by the func­tion lost or oth­er state tran­si­tion that occurred. [IEV 192–03-17] [17]

As you can see, “fault mode” is no longer used, in favour of the more com­mon “fail­ure mode”, so it is pos­si­ble to re-write the com­mon-mode fail­ure def­i­n­i­tion to read, “fail­ures of items char­ac­terised by the same man­ner of fail­ure.”

Random, Systematic and Common Cause Failures

Why do we need to care about this? There are three manners in which failures occur: random failures, systematic failures, and common cause failures. When developing safety-related controls, we need to consider all three and mitigate them as much as possible.

Ran­dom fail­ures do not fol­low any pat­tern, occur­ring ran­dom­ly over time, and are often brought on by over-stress­ing the com­po­nent, or from man­u­fac­tur­ing flaws. Ran­dom fail­ures can increase due to envi­ron­men­tal or process-relat­ed stress­es, like cor­ro­sion, EMI, nor­mal wear-and-tear, or oth­er over-stress­ing of the com­po­nent or sub­sys­tem. Ran­dom fail­ures are often mit­i­gat­ed through selec­tion of high-reli­a­bil­i­ty com­po­nents [18].

Systematic failures include common-cause failures, and occur because of a human error that was not caught by procedural means. These failures are due to design, specification, operating, maintenance, and installation errors. When we look at systematic errors, we are looking for things like training of the system designers, or quality assurance procedures used to validate the way the system operates. Systematic failures are non-random and complex, making them difficult to analyse statistically. Systematic errors are a significant source of common-cause failures because they can affect redundant devices, and because they are often deterministic, occurring whenever a particular set of circumstances exists.

Sys­tem­at­ic fail­ures include many types of errors, such as:

  • Man­u­fac­tur­ing defects, e.g., soft­ware and hard­ware errors built into the device by the man­u­fac­tur­er.
  • Specification mistakes, e.g., incorrect design basis and inaccurate software specification.
  • Imple­men­ta­tion errors, e.g., improp­er instal­la­tion, incor­rect pro­gram­ming, inter­face prob­lems, and not fol­low­ing the safe­ty man­u­al for the devices used to realise the safe­ty func­tion.
  • Oper­a­tion and main­te­nance, e.g., poor inspec­tion, incom­plete test­ing and improp­er bypass­ing [18].

Diverse redun­dan­cy is com­mon­ly used to mit­i­gate sys­tem­at­ic fail­ures, since dif­fer­ences in com­po­nent or sub­sys­tem design tend to cre­ate non-over­lap­ping sys­tem­at­ic fail­ures, reduc­ing the like­li­hood of a com­mon error cre­at­ing a com­mon-mode fail­ure. Errors in spec­i­fi­ca­tion, imple­men­ta­tion, oper­a­tion and main­te­nance are not affect­ed by diver­si­ty.

Fig. 1 below shows the results of a small study done by the UK’s Health and Safety Executive in 1994 [19] that supports the idea that systematic failures are a significant contributor to safety system failures. The study included only 34 systems (n=34), so the results cannot be considered conclusive. However, there were some startling results. As you can see, errors in the specification of the safety functions (Safety Requirement Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significant source of failures.

Pie chart illustrating the proportion of failures in each phase of the life cycle of a machine, based on data taken from HSE Report HSG238.
Fig­ure 1 — HSG 238 Pri­ma­ry Caus­es of Fail­ure by Life Cycle Stage

Handling CCF in ISO 13849–1

Now that we under­stand WHAT Com­mon-Cause Fail­ure is, and WHY it’s impor­tant, we can talk about HOW it is han­dled in ISO 13849–1. Since ISO 13849–1 is intend­ed to be a sim­pli­fied func­tion­al safe­ty stan­dard, CCF analy­sis is lim­it­ed to a check­list in Annex F, Table F.1. Note that Annex F is infor­ma­tive, mean­ing that it is guid­ance mate­r­i­al to help you apply the stan­dard. Since this is the case, you could use any oth­er means suit­able for assess­ing CCF mit­i­ga­tion, like those in IEC 61508, or in oth­er stan­dards.

Table F.1 is set up with a series of mit­i­ga­tion mea­sures which are grouped togeth­er in relat­ed cat­e­gories. Each group is pro­vid­ed with a score that can be claimed if you have imple­ment­ed the mit­i­ga­tions in that group. ALL OF THE MEASURES in each group must be ful­filled in order to claim the points for that cat­e­go­ry. Here’s an exam­ple:

A portion of ISO 13849-1 Table F.1.
ISO 13849–1:2015, Table F.1 Excerpt

In order to claim the 20 points avail­able for the use of sep­a­ra­tion or seg­re­ga­tion in the sys­tem design, there must be a sep­a­ra­tion between the sig­nal paths. Sev­er­al exam­ples of this are giv­en for clar­i­ty.

Table F.1 lists six groups of mit­i­ga­tion mea­sures. In order to claim ade­quate CCF mit­i­ga­tion, a min­i­mum score of 65 points must be achieved. Only Cat­e­go­ry 2, 3 and 4 archi­tec­tures are required to meet the CCF require­ments in order to claim the PL, but with­out meet­ing the CCF require­ment you can­not claim the PL, regard­less of whether the design meets the oth­er cri­te­ria or not.
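The all-or-nothing scoring rule is easy to get wrong, so here is a minimal Python sketch of it. The 65-point threshold and the rule that every measure in a group must be fulfilled come from the text above; the function names and, in particular, the group names and point values in `example_groups` are placeholders invented for illustration and are not copied from Table F.1.

```python
CCF_MIN_SCORE = 65  # minimum total score stated in the text


def ccf_score(groups):
    """Add up the points of every group whose measures are ALL fulfilled."""
    return sum(g["points"] for g in groups if all(g["measures_met"]))


def ccf_requirement_met(groups, category):
    """Per the text, only Category 2, 3 and 4 must meet the CCF score."""
    if category not in ("B", "1", "2", "3", "4"):
        raise ValueError(f"unknown category: {category!r}")
    if category in ("B", "1"):
        return True
    return ccf_score(groups) >= CCF_MIN_SCORE


# Placeholder data only: names and points are NOT taken from Table F.1.
example_groups = [
    {"name": "group 1", "points": 20, "measures_met": [True]},
    {"name": "group 2", "points": 20, "measures_met": [True, True]},
    {"name": "group 3", "points": 20, "measures_met": [True, False]},
    {"name": "group 4", "points": 20, "measures_met": [True]},
]

if __name__ == "__main__":
    print("score:", ccf_score(example_groups))
    print("Category 3 OK:", ccf_requirement_met(example_groups, "3"))
```

In the placeholder data, group 3 has one unfulfilled measure, so it contributes nothing and the total falls short of 65. Partial credit within a group is not available.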

One final note on CCF: If you are try­ing to review an exist­ing con­trol sys­tem, say in an exist­ing machine, or in a machine designed by a third par­ty where you have no way to deter­mine the expe­ri­ence and train­ing of the design­ers or the capa­bil­i­ty of the company’s change man­age­ment process, then you can­not ade­quate­ly assess CCF [8]. This fact is recog­nised in CSA Z432-16 [20], chap­ter 8. [20] allows the review­er to sim­ply ver­i­fy that the archi­tec­tur­al require­ments, exclu­sive of any prob­a­bilis­tic require­ments, have been met. This is par­tic­u­lar­ly use­ful for engi­neers review­ing machin­ery under Ontario’s Pre-Start Health and Safe­ty require­ments [21], who are fre­quent­ly work­ing with less-than-com­plete design doc­u­men­ta­tion.

In case you missed the first part of the series, you can read it here. In the next arti­cle in this series, I’m going to review the process flow for sys­tem analy­sis as cur­rent­ly out­lined in ISO 13849–1. Watch for it!

ISO 13849–1 Analysis — Part 5: Diagnostic Coverage (DC)

This entry is part 5 of 9 in the series How to do a 13849–1 analy­sis

What is Diagnostic Coverage?

Under­stand­ing Diag­nos­tic Cov­er­age (DC) as it is used in ISO 13849–1 [1] is crit­i­cal to analysing the design of any safe­ty func­tion assessed using this stan­dard. In case you missed a pre­vi­ous part of the series, you can read it here.

In the last instalment of this series discussing MTTFD, I brought up the fact that everything fails eventually, and so everything has a natural failure rate. The bathtub curve shown at the top of this post shows a typical failure rate curve for most products. Failure rate data tell you how long, on average, it takes for components or systems to fail. Failure rates are expressed in many ways, MTTFD and PFHd being the ways relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHd is given in failures per hour (1/h). As a reminder, PFHd stands for “Probability of dangerous Failure per Hour”.
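To relate the two units, here is a small Python sketch that converts an MTTFD in years into a dangerous failure rate per hour. It assumes a constant failure rate (the flat part of the bathtub curve), so that the rate is simply the reciprocal of MTTFD, and a year of 8760 hours. The function name is made up for illustration. Note that the result is the rate for a single component or channel; it is not the PFHd of a complete safety function, which also depends on the architecture and the diagnostics.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h


def dangerous_failure_rate_per_hour(mttfd_years: float) -> float:
    """Return lambda_d in 1/h for a given MTTFD in years.

    Assumes a constant failure rate, so lambda_d = 1 / MTTFD.
    """
    if mttfd_years <= 0:
        raise ValueError("MTTFD must be greater than zero")
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


if __name__ == "__main__":
    for years in (10, 30, 100):
        rate = dangerous_failure_rate_per_hour(years)
        print(f"MTTFD = {years:>3} years -> lambda_d = {rate:.2e} 1/h")
```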

Three of the standard architectures include automatic diagnostic functions: Categories 2, 3 and 4. As soon as we add diagnostics to the system, we need to know what faults the diagnostics can detect and what proportion of the total dangerous failures those faults represent. Diagnostic Coverage (DC) represents the ratio of dangerous failures that can be detected to the total dangerous failures that could occur, expressed as a percentage. Some failures do not lead to a dangerous state, and those failures are excluded from DC because we don’t need to worry about them. If they occur, the system will not fail into a dangerous state.

Here’s the for­mal def­i­n­i­tion from [1]:

3.1.26 diag­nos­tic cov­er­age (DC)

mea­sure of the effec­tive­ness of diag­nos­tics, which may be deter­mined as the ratio between the fail­ure rate of detect­ed dan­ger­ous fail­ures and the fail­ure rate of total dan­ger­ous fail­ures

Note 1 to entry: Diag­nos­tic cov­er­age can exist for the whole or parts of a safe­ty-relat­ed sys­tem. For exam­ple, diag­nos­tic cov­er­age could exist for sen­sors and/or log­ic sys­tem and/or final ele­ments. [SOURCE: IEC 61508–4:1998, 3.8.6, mod­i­fied.]

That brings up two oth­er relat­ed def­i­n­i­tions that need to be kept in mind [1]:

3.1.4 fail­ure

ter­mi­na­tion of the abil­i­ty of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Fail­ure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The con­cept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Fail­ures which only affect the avail­abil­i­ty of the process under con­trol are out­side of the scope of this part of ISO 13849. [SOURCE: IEC 60050–191:1990, 04–01.]

and the most impor­tant one [1]:

3.1.5 dan­ger­ous fail­ure

fail­ure which has the poten­tial to put the SRP/CS in a haz­ardous or fail-to-func­tion state

Note 1 to entry: Whether or not the poten­tial is real­ized can depend on the chan­nel archi­tec­ture of the sys­tem; in redun­dant sys­tems a dan­ger­ous hard­ware fail­ure is less like­ly to lead to the over­all dan­ger­ous or fail-to- func­tion state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, mod­i­fied.]

Just as a reminder, SRP/CS stands for “safe­ty-relat­ed parts of con­trol sys­tems”.

Failure Math

Failure Rate Data Sources

To do any calculations, we need data, and this is true for failure rates as well. ISO 13849–1 provides some tables in the annexes that list some common types of components and their associated failure rates, and there are more failure rate tables in ISO 13849–2. A word of caution here: Do not mix sources of failure rate data, because the conditions under which one source’s data are valid will not necessarily match those behind another, including the data in ISO 13849. There are a few good sources of failure rate data out there, for example, MIL-HDBK-217, Reliability Prediction of Electronic Equipment [15], as well as the database maintained by Exida. In any case, use a single source for your failure rate data.

Failure Rate Variables

IEC 61508 [7] defines a num­ber of vari­ables relat­ed to fail­ure rates. The low­er­case Greek let­ter lamb­da, \lambda, is used to denote fail­ures.

The com­mon vari­able des­ig­na­tions used are:

\lambda = fail­ures
\lambda_{(t)} = fail­ure rate
\lambda_s = “safe” fail­ures
\lambda_d = “dan­ger­ous” fail­ures
\lambda_{dd} = detectable “dan­ger­ous” fail­ures
\lambda_{du} = unde­tectable “dan­ger­ous” fail­ures

Calculating DC

Of these vari­ables, we only need to con­cern our­selves with \lambda_d, \lambda_{dd} and \lambda_{du}. To under­stand how these vari­ables are used, we can express their rela­tion­ship as

\lambda_d=\lambda_{dd}+\lambda_{du}

Fol­low­ing on that idea, the Diag­nos­tic Cov­er­age can be expressed as a per­cent­age like this:

DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100
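As a worked illustration of the two relationships above, the following Python sketch computes DC% from a detectable and an undetectable dangerous failure rate. The function name and the example failure rates are invented for illustration; real values would come from your single chosen failure rate data source.

```python
def diagnostic_coverage_percent(lambda_dd: float, lambda_du: float) -> float:
    """DC% = lambda_dd / lambda_d * 100, where lambda_d = lambda_dd + lambda_du."""
    if lambda_dd < 0 or lambda_du < 0:
        raise ValueError("failure rates cannot be negative")
    lambda_d = lambda_dd + lambda_du
    if lambda_d == 0:
        raise ValueError("total dangerous failure rate is zero")
    return lambda_dd / lambda_d * 100.0


if __name__ == "__main__":
    # Hypothetical rates in 1/h: nine detectable dangerous failures
    # for every undetectable one, i.e. a DC of about 90 %.
    dc = diagnostic_coverage_percent(lambda_dd=9e-7, lambda_du=1e-7)
    print(f"DC = {dc:.1f} %")
```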

Determining DC%

If you want to actu­al­ly cal­cu­late DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hard­core types to IEC 61508–2, Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems — Part 2: Require­ments for electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. This stan­dard goes into some depth on how to deter­mine fail­ure rates and how to cal­cu­late the “Safe Fail­ure Frac­tion,” a num­ber which is relat­ed to DC but is not the same.
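For orientation only, and not as a substitute for the procedure in IEC 61508–2, the difference between the two ratios can be written out with the variables listed earlier. The SFF expression below is the commonly quoted form as I understand it, and the numbers are invented for illustration:

```latex
DC = \frac{\lambda_{dd}}{\lambda_d}
\qquad
SFF = \frac{\lambda_s + \lambda_{dd}}{\lambda_s + \lambda_d}

% Illustrative values only, all in the same units:
% \lambda_s = 300, \lambda_{dd} = 180, \lambda_{du} = 20, so \lambda_d = 200
DC = \frac{180}{200} = 90\%
\qquad
SFF = \frac{300 + 180}{300 + 200} = 96\%
```

Because SFF also counts the “safe” failures, the same component can show a higher SFF than DC, which is why the two numbers should not be used interchangeably.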

For everyone else, the good news is that you can use the table in Annex E to estimate the DC%. It’s worth noting here that Annex E is “Informative.” In standards-speak, this means that the annex is not part of the “normative” text; it is simply information to help you use the normative part of the standard. The design must conform to the requirements in the normative text if you want to claim conformity to the standard. Because [1, Annex E] is informative, you have the option to calculate the DC% value rather than selecting it from Table E.1. Using the calculated value would not violate the requirements in the normative text.

If you are using IFA SISTEMA [16] to do the cal­cu­la­tions for you, you will find that the soft­ware lim­its you to select­ing a sin­gle DC mea­sure from Table E.1, and this prin­ci­ple applies if you are doing the cal­cu­la­tions by hand too. Only one item from Table E.1 can be select­ed for a giv­en safe­ty func­tion.

Ranking DC

Once you have determined the DC for a safety function, you need to compare the DC value against [1, Table 5] to see if the DC is sufficient for the PLr you are trying to achieve. Table 5 bins the DC results into four ranges. Just as binning the PFHd values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.

ISO 13849–1, Table 5 Diag­nos­tic cov­er­age (DC)

If the DC val­ue was high enough for the PLr, then you are done with this part of the work. If not, you will need to go back to your design and add addi­tion­al diag­nos­tic fea­tures so that you can either select a high­er cov­er­age from [1, Table E.1] or cal­cu­late a high­er val­ue using [14].
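The binning step can be sketched in a few lines of Python. The range boundaries below (60%, 90% and 99%) are the ones I recall from [1, Table 5]; treat them as an assumption and check them against your copy of the standard. The function name is hypothetical.

```python
def dc_range(dc_percent: float) -> str:
    """Map a DC value in percent onto the four ranges used in ISO 13849-1, Table 5.

    Assumed boundaries: none < 60 <= low < 90 <= medium < 99 <= high.
    """
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 and 100 percent")
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


# Illustrative values only.
for value in (45.0, 60.0, 92.5, 99.0):
    print(f"DC = {value:5.1f}% -> {dc_range(value)}")
```

Note that a calculated value such as 92.5% and a selected value of 90% land in the same range, which is exactly the point of binning: the extra decimal places do not buy you anything.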

Multiple safety functions

When you have mul­ti­ple safe­ty func­tions that make up a com­plete safe­ty sys­tem, for exam­ple, an emer­gency stop func­tion and a guard inter­lock­ing func­tion, the DC val­ues need to be aver­aged to deter­mine the over­all DC for the com­plete sys­tem. [1, Annex E] pro­vides you with a method to do this in Equa­tion E.1.

DC_{avg}=\frac{\frac{DC_1}{MTTF_{D1}}+\frac{DC_2}{MTTF_{D2}}+\dots+\frac{DC_N}{MTTF_{DN}}}{\frac{1}{MTTF_{D1}}+\frac{1}{MTTF_{D2}}+\dots+\frac{1}{MTTF_{DN}}}

ISO 13849–1:2015, Equation E.1

Plug in the val­ues for MTTFD and DC for each safe­ty func­tion, and cal­cu­late the result­ing DCavg val­ue for the com­plete sys­tem.
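Equation E.1 is a weighted average in which each DC value is weighted by the reciprocal of its MTTFD, so the items with the shortest mean time to dangerous failure count the most. Here is a minimal Python sketch of that arithmetic. The function name and the input values are hypothetical, and the MTTFD values are assumed to be in years.

```python
from typing import Iterable, Tuple


def dc_avg(entries: Iterable[Tuple[float, float]]) -> float:
    """Return DCavg for (dc, mttfd) pairs using a 1/MTTFD weighting.

    dc may be given in percent or as a fraction; the result is in the same unit.
    mttfd values must all be in the same unit (assumed here: years).
    """
    numerator = 0.0
    denominator = 0.0
    for dc, mttfd in entries:
        if mttfd <= 0:
            raise ValueError("MTTFD must be positive")
        numerator += dc / mttfd      # DC_i / MTTFD_i
        denominator += 1.0 / mttfd   # 1 / MTTFD_i
    if denominator == 0.0:
        raise ValueError("at least one (dc, mttfd) pair is required")
    return numerator / denominator


# Illustrative values only: (DC in percent, MTTFD in years).
entries = [(99.0, 50.0), (60.0, 25.0)]
print(f"DCavg = {dc_avg(entries):.1f}%")
```

With these made-up inputs the result is 73%, noticeably lower than the simple mean of 79.5%, because the entry with the shorter MTTFD and the poorer DC carries twice the weight.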

That’s it for this arti­cle. The next part will cov­er Com­mon Cause Fail­ures (CCF). Look for it on 20-Mar-17!

In case you missed the first part of the series, you can read it here.

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assess­ment: Basics and Bench­marks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simp­son, Safe­ty crit­i­cal sys­tems hand­book, 3rd Ed. Ams­ter­dam: Else­vier/But­ter­worth-Heine­mann, 2011.

[0.2]  Elec­tro­mag­net­ic Com­pat­i­bil­i­ty for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: The Insti­tu­tion of Engi­neer­ing and Tech­nol­o­gy, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. 7 parts. IEC Stan­dard 61508. Edi­tion 2. 2010.

[14]   Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems — Part 2: Require­ments for electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. IEC Stan­dard 61508–2. 2010.

[15]     Reli­a­bil­i­ty Pre­dic­tion of Elec­tron­ic Equip­ment. Mil­i­tary Hand­book MIL-HDBK-217F. 1991.

[16]     “IFA — Practical aids: Software-Assistent SISTEMA: Safety Integrity — Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].
