ISO 13849–1 Analysis — Part 4: MTTFD — Mean Time to Dangerous Failure

This entry is part 4 of 9 in the series How to do a 13849–1 analy­sis

Func­tion­al safe­ty is all about the like­li­hood of a safe­ty sys­tem fail­ing to oper­ate when you need it. Under­stand­ing Mean Time to Dan­ger­ous Fail­ure, or MTTFD, is crit­i­cal. If you have been read­ing about this top­ic at all, you may notice that I am abbre­vi­at­ing Mean Time to Dan­ger­ous Fail­ure with all cap­i­tal let­ters. Using MTTFD is a recent change that occurred in the third edi­tion of ISO 13849–1, pub­lished in 2015. In the first and sec­ond edi­tions, the cor­rect abbre­vi­a­tion was MTTFd. Onward!

If you missed the third instal­ment in this series, you can read it here.

Defining MTTFD

Let’s start by hav­ing a look at some key def­i­n­i­tions. Look­ing at [1, Cl. 3], you will find:

3.1.1 safety-related part of a control system (SRP/CS)—part of a control system that responds to safety-related input signals and generates safety-related output signals

Note 1 to entry: The com­bined safe­ty-relat­ed parts of a con­trol sys­tem start at the point where the safe­ty-relat­ed input sig­nals are ini­ti­at­ed (includ­ing, for exam­ple, the actu­at­ing cam and the roller of the posi­tion switch) and end at the out­put of the pow­er con­trol ele­ments (includ­ing, for exam­ple, the main con­tacts of a con­tac­tor)

Note 2 to entry: If mon­i­tor­ing sys­tems are used for diag­nos­tics, they are also con­sid­ered as SRP/CS.

3.1.5 dan­ger­ous fail­ure—fail­ure which has the poten­tial to put the SRP/CS in a haz­ardous or fail-to-func­tion state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, mod­i­fied.]

3.1.25 mean time to dan­ger­ous fail­ure (MTTFD)—expec­ta­tion of the mean time to dan­ger­ous fail­ure

Def­i­n­i­tion 3.1.5 is pret­ty help­ful, but def­i­n­i­tion 3.1.25 is, well, not much of a def­i­n­i­tion. Let’s look at this anoth­er way.

Failures and Faults

Since every­thing can and will even­tu­al­ly fail to per­form the way we expect it to, we know that every­thing has a fail­ure rate because every­thing takes some time to fail. Grant­ed that this time may be very short, like the first time the unit is turned on, or it may be very long, some­times hun­dreds of years. Remem­ber that because this is a rate, it is some­thing that occurs over time. It is also impor­tant to be clear that we are talk­ing about fail­ures and not faults. Read­ing from [1]:

3.1.3 fault—state of an item char­ac­ter­ized by the inabil­i­ty to per­form a required func­tion, exclud­ing the inabil­i­ty dur­ing pre­ven­tive main­te­nance or oth­er planned actions, or due to lack of exter­nal resources

Note 1 to entry: A fault is often the result of a fail­ure of the item itself, but may exist with­out pri­or fail­ure.

Note 2 to entry: In this part of ISO 13849, “fault” means ran­dom fault.
[SOURCE: IEC 60050–191:1990, 05–01.]

3.1.4 fail­ure— ter­mi­na­tion of the abil­i­ty of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Fail­ure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The con­cept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Fail­ures which only affect the avail­abil­i­ty of the process under con­trol are out­side of the scope of this part of ISO 13849.
[SOURCE: IEC 60050–191:1990, 04–01.]

3.1.4 Note 2 is the impor­tant one at this point in the dis­cus­sion.

Now, where we have mul­ti­ples of some­thing, like relays, valves, or safe­ty sys­tems, we now have a pop­u­la­tion of iden­ti­cal items, each of which will even­tu­al­ly fail at some point. We can count those fail­ures as they occur and tal­ly them up, and we can graph how many fail­ures we get in the pop­u­la­tion over time. If this is start­ing to sound sus­pi­cious­ly like sta­tis­tics to you, that is because it is.

OK, so let’s look at the kinds of fail­ures that occur in that pop­u­la­tion. Some fail­ures will result in a “safe” state, e.g., a relay fail­ing with all poles open, and some will fail in a poten­tial­ly “dan­ger­ous” state, like a nor­mal­ly closed valve devel­op­ing a sig­nif­i­cant leak. If we tal­ly up all the fail­ures that occur, and then tal­ly the num­ber of “safe” fail­ures and the num­ber of “dan­ger­ous” fail­ures in that pop­u­la­tion, we now have some very use­ful infor­ma­tion.

The dif­fer­ent kinds of fail­ures are sig­ni­fied using the low­er­case Greek let­ter \lambda (lamb­da). We can add some sub­scripts to help iden­ti­fy what kinds of fail­ures we are talk­ing about. The com­mon vari­able des­ig­na­tions used are [14]:

\lambda = fail­ures
\lambda_{(t)} = fail­ure rate
\lambda_s = “safe” fail­ures
\lambda_d = “dan­ger­ous” fail­ures
\lambda_{dd} = detectable “dan­ger­ous” fail­ures
\lambda_{du} = unde­tectable “dan­ger­ous” fail­ures

I will be dis­cussing some of these vari­ables in more detail in a lat­er part of the series when I delve into Diag­nos­tic Cov­er­age, so don’t wor­ry about them too much just yet.
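
To make the tallying idea concrete, here is a minimal Python sketch. It assumes a constant failure rate (the flat part of the bathtub curve), so that MTTFD is simply the reciprocal of the dangerous failure rate. The population size, observation period, and failure counts are invented for illustration and do not come from the standard.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h, a common reliability convention


def failure_rates(n_safe, n_dangerous, device_hours):
    """Estimate failure rates (per hour) from tallied field failures.

    device_hours is the total operating time accumulated by the whole
    population, e.g. 1000 units x 8760 h each.
    """
    lambda_s = n_safe / device_hours        # "safe" failure rate
    lambda_d = n_dangerous / device_hours   # "dangerous" failure rate
    return lambda_s, lambda_d


def mttfd_years_from_rate(lambda_d):
    """MTTFD in years, assuming a constant dangerous failure rate."""
    return 1.0 / (lambda_d * HOURS_PER_YEAR)


# Hypothetical tally: 1000 relays observed for one year each,
# with 12 safe failures and 8 dangerous failures.
lam_s, lam_d = failure_rates(12, 8, 1000 * HOURS_PER_YEAR)
print(f"lambda_s = {lam_s:.2e} /h, lambda_d = {lam_d:.2e} /h")
print(f"MTTFD = {mttfd_years_from_rate(lam_d):.0f} years")
```

With these made-up numbers, 8 dangerous failures in 1000 device-years works out to an MTTFD of about 125 years.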

Getting to MTTFD

Since we can now start to deal with the fail­ure rate data math­e­mat­i­cal­ly, we can start to do some cal­cu­la­tions about expect­ed life­time of a com­po­nent or a sys­tem. That expect­ed, or prob­a­ble, life­time is what def­i­n­i­tion 3.1.25 was on about, and is what we call MTTFD.

MTTFD is the time in years over which the prob­a­bil­i­ty of fail­ure is rel­a­tive­ly con­stant. If you look at a typ­i­cal fail­ure rate curve, called a “bath­tub curve” due to its resem­blance to the pro­file of a nice soak­er tub, the MTTFD is the flat­ter por­tion of the curve between the end of the infant mor­tal­i­ty peri­od and the wear-out peri­od at the end of life. This part of the curve is the por­tion assumed to be includ­ed in the “mis­sion time” for the prod­uct. ISO 13849–1 assumes the mis­sion time for all machin­ery is 20 years [1, 4.5.4] and [1, Cl. 10].

Figure 1 — Typical Bathtub Curve [15]

ISO 13849–1 provides us with guidance on how MTTFD relates to the determination of the PL in [1, Cl. 4.5.2]. MTTFD is further grouped into three bands as shown in [1, Table 4].

Table 4 in [1]: Mean time to dangerous failure of each channel (MTTFD)

The notes for this table are impor­tant as well. Since you can’t read the notes par­tic­u­lar­ly well in the table above, I’ve repro­duced them here:

NOTE 1 The choice of the MTTFD ranges of each chan­nel is based on fail­ure rates found in the field as state-of-the-art, form­ing a kind of log­a­rith­mic scale fit­ting to the log­a­rith­mic PL scale. An MTTFD val­ue of each chan­nel less than three years is not expect­ed to be found for real SRP/CS since this would mean that after one year about 30 % of all sys­tems on the mar­ket will fail and will need to be replaced. An MTTFD val­ue of each chan­nel greater than 100 years is not accept­able because SRP/CS for high risks should not depend on the reli­a­bil­i­ty of com­po­nents alone. To rein­force the SRP/CS against sys­tem­at­ic and ran­dom fail­ure, addi­tion­al means such as redun­dan­cy and test­ing should be required. To be prac­ti­ca­ble, the num­ber of ranges was restrict­ed to three. The lim­i­ta­tion of MTTFD of each chan­nel val­ues to a max­i­mum of 100 years refers to the sin­gle chan­nel of the SRP/CS which car­ries out the safe­ty func­tion. High­er MTTFD val­ues can be used for sin­gle com­po­nents (see Table D.1).

NOTE 2 The indi­cat­ed bor­ders of this table are assumed with­in an accu­ra­cy of 5%.
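
The banding can be sketched in a few lines of Python. The band limits used here (3, 10, 30 and 100 years) are my reading of [1, Table 4] and should be checked against the standard; the function names are hypothetical. The last function checks the "about 30 %" figure in NOTE 1, assuming a constant failure rate.

```python
import math


def mttfd_band(mttfd_years):
    """Classify a channel MTTFD (in years) into the Table 4 bands.

    Assumed limits: low 3 to <10, medium 10 to <30, high 30 to 100.
    """
    if mttfd_years < 3:
        return "not acceptable"   # below the lowest band
    if mttfd_years < 10:
        return "low"
    if mttfd_years < 30:
        return "medium"
    return "high"


def capped_channel_mttfd(mttfd_years, cap=100.0):
    """Limit a channel MTTFD to the 100-year maximum described in NOTE 1."""
    return min(mttfd_years, cap)


def fraction_failed(t_years, mttfd_years):
    """Fraction failed dangerously after t years, constant failure rate."""
    return 1.0 - math.exp(-t_years / mttfd_years)


print(mttfd_band(25))                    # medium
print(capped_channel_mttfd(237.0))       # 100.0
print(round(fraction_failed(1, 3), 2))   # 0.28
```

The last line shows where the note's figure comes from: with an MTTFD of three years, roughly 28 % of units would have failed dangerously after the first year.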

The stan­dard then tells us to select the MTTFD using a sim­ple hier­ar­chy [1, 4.5.2]:

For the estimation of MTTFD of a component, the hierarchical procedure for finding data shall be, in the order given:

a) use manufacturer’s data;
b) use meth­ods in Annex C and Annex D;
c) choose 10 years.

Why ten years? Ten years is half of the assumed mis­sion life­time of 20 years. More on mis­sion life­time in a lat­er post.
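
The hierarchy is simple enough to express as a fallback chain. This is only an illustrative sketch: the function and parameter names are hypothetical, and the Annex C/D estimate is assumed to have been worked out separately.

```python
from typing import Optional

DEFAULT_MTTFD_YEARS = 10.0  # step c) of the hierarchy


def select_mttfd(manufacturer_value: Optional[float] = None,
                 annex_estimate: Optional[float] = None) -> float:
    """Pick a component MTTFD (years) following the a), b), c) order."""
    if manufacturer_value is not None:   # a) manufacturer's data
        return manufacturer_value
    if annex_estimate is not None:       # b) Annex C / Annex D methods
        return annex_estimate
    return DEFAULT_MTTFD_YEARS           # c) choose 10 years


print(select_mttfd(manufacturer_value=150.0, annex_estimate=80.0))  # 150.0
print(select_mttfd(annex_estimate=80.0))                            # 80.0
print(select_mttfd())                                               # 10.0
```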

Look­ing at [1, Annex C.2], you will find the “Good Engi­neer­ing Prac­tices” method for esti­mat­ing MTTFD, pre­sum­ing the man­u­fac­tur­er has not pro­vid­ed you with that infor­ma­tion. ISO 13849–2 [2] has some ref­er­ence tables that pro­vide some gen­er­al MTTFD val­ues for some kinds of com­po­nents, but not every part that exists can be list­ed. How can we deal with parts not list­ed? [1, Annex C.4] pro­vides us with a cal­cu­la­tion method for esti­mat­ing MTTFD for pneu­mat­ic, mechan­i­cal and electro­mechan­i­cal com­po­nents.

Calculating MTTFD for pneumatic, mechanical and electromechanical components

I need to intro­duce you to a few more vari­ables before we look at how to cal­cu­late MTTFD for a com­po­nent.

Variables
Variable | Description
B10 | Number of cycles until 10% of the components fail (for pneumatic and electromechanical components)
B10D | Number of cycles until 10% of the components fail dangerously (for pneumatic and electromechanical components)
T | Lifetime of the component
T10D | Mean time until 10% of the components fail dangerously
hop | Mean operation time, in hours per day
dop | Mean operation time, in days per year
tcycle | Mean operation time between the beginning of two successive cycles of the component (e.g., switching of a valve), in seconds per cycle
s | Seconds
h | Hours
a | Years

Know­ing a few details we can cal­cu­late the MTTFD using [1, Eqn C.1]. We need to know the fol­low­ing para­me­ters for the appli­ca­tion:

  • B10D
  • hop
  • dop
  • tcycle

Calculating MTTFD — [1, Eqn. C.1]

In order to use [1, Eqn. C.1], we need to first calculate nop, using [1, Eqn. C.2]:

Calculating nop — [1, Eqn. C.2]

We may also need one more calculation, [1, Eqn. C.4]:

Calculating T10D — [1, Eqn. C.4]
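
Written out in LaTeX, the three relations are usually given in the following form. This is my transcription using the symbols from the variables table, where nop is the mean number of annual operations; please check it against [1, Annex C] before relying on it.

```latex
\begin{align}
MTTF_D  &= \frac{B_{10D}}{0.1 \times n_{op}} \tag{C.1} \\
n_{op}  &= \frac{d_{op} \times h_{op} \times 3600\ \mathrm{s/h}}{t_{cycle}} \tag{C.2} \\
T_{10D} &= \frac{B_{10D}}{n_{op}}
\end{align}
```

The factor of 0.1 follows from the constant failure rate assumption: if 10% of components have failed dangerously at T10D, then 1 - exp(-T10D / MTTFD) = 0.1, which gives MTTFD of roughly T10D / 0.1.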

Example Calculation [1, C.4.3]

For a pneu­mat­ic valve, a man­u­fac­tur­er deter­mines a mean val­ue of 60 mil­lion cycles as B10D. The valve is used for two shifts each day on 220 oper­a­tion days a year. The mean time between the begin­ning of two suc­ces­sive switch­ing of the valve is esti­mat­ed as 5 s. This yields the fol­low­ing val­ues:

  • dop of 220 days per year;
  • hop of 16 h per day;
  • tcycle of 5 s per cycle;
  • B10D of 60 mil­lion cycles.

Doing the math, we get:

Example C.4.3 calculations from ISO 13849–1
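
The same arithmetic can be reproduced with a short Python sketch. The function names are hypothetical; the input values are the ones listed above, and the formulas are the Annex C relations as I understand them.

```python
def n_op(d_op, h_op, t_cycle):
    """Mean number of annual operations (cycles per year)."""
    return d_op * h_op * 3600 / t_cycle   # 3600 s per hour


def t10d_years(b10d, nop):
    """Mean time until 10% of the components fail dangerously, in years."""
    return b10d / nop


def mttfd_from_b10d(b10d, nop):
    """Component MTTFD in years."""
    return b10d / (0.1 * nop)


# Pneumatic valve example: two shifts (16 h/day), 220 days/year,
# one switching cycle every 5 s, B10D of 60 million cycles.
nop = n_op(d_op=220, h_op=16, t_cycle=5)
t10d = t10d_years(60e6, nop)
mttfd = mttfd_from_b10d(60e6, nop)

print(f"nop   = {nop:,.0f} cycles/year")   # 2,534,400
print(f"T10D  = {t10d:.1f} years")         # 23.7
print(f"MTTFD = {mttfd:.0f} years")        # 237
```

A component MTTFD of about 237 years falls in the "high" band, although the value used for the channel is still limited as described in NOTE 1 of the table notes above. The T10D of about 23.7 years is longer than the assumed 20-year mission time.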

So there you have it, at least for a fair­ly sim­ple case. There are more exam­ples in ISO 13849–1, and I would encour­age you to work through them. You can also find a wealth of exam­ples in a report pro­duced by the BGIA in Ger­many, called the Func­tion­al safe­ty of machine con­trols (BGIA Report 2/2008e) [16]. The down­load for the report is linked from the ref­er­ence list at the end of this arti­cle. If you are a SISTEMA user, there are lots of exam­ples in the SISTEMA Cook­books, and there are exam­ple files avail­able so that you can see how to assem­ble the sys­tems in the soft­ware.

The next part of this series cov­ers Diag­nos­tic Cov­er­age (DC), and the aver­age DC for mul­ti­ple safe­ty func­tions in a sys­tem, DCavg.

In case you missed the first part of the series, you can read it here.

Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0]     B. Main, Risk Assess­ment: Basics and Bench­marks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simp­son, Safe­ty crit­i­cal sys­tems hand­book. Ams­ter­dam: Else­vier/But­ter­worth-Heine­mann, 2011.

[0.2]  Elec­tro­mag­net­ic Com­pat­i­bil­i­ty for Func­tion­al Safe­ty, 1st ed. Steve­nage, UK: The Insti­tu­tion of Engi­neer­ing and Tech­nol­o­gy, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[2]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 2: Val­i­da­tion. 2nd Edi­tion. ISO Stan­dard 13849–2. 2012.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. 7 parts. IEC Stan­dard 61508. Sec­ond Edi­tion. 2010.

[14]    Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems – Part 4: Def­i­n­i­tions and abbre­vi­a­tions. IEC Stan­dard 61508–4. Sec­ond Edi­tion. 2010.

[15]    “The bath­tub curve and prod­uct fail­ure behav­ior part 1 of 2”, Findchart.co, 2017. [Online]. Avail­able: http://findchart.co/download.php?aHR0cDovL3d3dy53ZWlidWxsLmNvbS9ob3R3aXJlL2lzc3VlMjEvaHQyMV8xLmdpZg. [Accessed: 03- Jan- 2017].

[16]   “Func­tion­al safe­ty of machine con­trols — Appli­ca­tion of EN ISO 13849 (BGIA Report 2/2008e)”, dguv.de, 2017. [Online]. Avail­able: http://www.dguv.de/ifa/publikationen/reports-download/bgia-reports-2007-bis-2008/bgia-report-2–2008/index-2.jsp. [Accessed: 2017-01-04].

Copyright secured by Digiprove © 2017
Acknowl­edge­ments: IEC, ISO and oth­ers as cit­ed
Some Rights Reserved

ISO 13849–1 Analysis — Part 3: Architectural Category Selection

This entry is part 3 of 9 in the series How to do a 13849–1 analy­sis

At this point, you have com­plet­ed the risk assess­ment, assigned required Per­for­mance Lev­els to each safe­ty func­tion, and devel­oped the Safe­ty Require­ment Spec­i­fi­ca­tion for each safe­ty func­tion. Next, you need to con­sid­er three aspects of the sys­tem design: Archi­tec­tur­al Cat­e­go­ry, Chan­nel Mean Time to Dan­ger­ous Fail­ure (MTTFD), and Diag­nos­tic Cov­er­age (DCavg). In this part of the series, I am going to dis­cuss select­ing the archi­tec­tur­al cat­e­go­ry for the sys­tem.

If you missed the sec­ond instal­ment in this series, you can read it here.

Understanding Performance Levels

To understand ISO 13849–1, it helps to know a little about where the standard originated. ISO 13849–1 is a simplified method for determining the reliability of safety-related controls for machinery. The basic ideas came from IEC 61508 [7], a seven-part standard originally published in 1998. IEC 61508 brought forward the concept of the Average Probability of Dangerous Failure per Hour, PFHD (1/h). Dangerous failures are those failures that can result in non-performance of the safety function, whether or not diagnostics detect them. Here’s the formal definition from [1]:

3.1.5

dan­ger­ous fail­ure
fail­ure which has the poten­tial to put the SRP/CS in a haz­ardous or fail-to-func­tion state

Note 1 to entry: Whether or not the poten­tial is realised can depend on the chan­nel archi­tec­ture of the sys­tem; in redun­dant sys­tems a dan­ger­ous hard­ware fail­ure is less like­ly to lead to the over­all dan­ger­ous or fail-to-func­tion state.

Note 2 to entry: [SOURCE: IEC 61508–4, 3.6.7, mod­i­fied.]

The Per­for­mance Lev­els are sim­ply bands of prob­a­bil­i­ties of Dan­ger­ous Fail­ures, as shown in [1, Table 2] below.

Table 2 from ISO 13849-1:2015 showing the five Performance Levels and the corresponding ranges of PFHD values.
Per­for­mance Lev­els as bands of PFHd ranges

The ranges shown in [1, Table 2] are approximate. If you need to see the specific limits of the bands for any reason, see [1, Annex K], which describes the full span of PFHD in table format.

There is anoth­er way to describe the same char­ac­ter­is­tics of a sys­tem, this one from IEC. Instead of using the PL sys­tem, IEC uses Safe­ty Integri­ty Lev­els (SILs). [1, Table 3] shows the cor­re­spon­dence between PLs and SILs. Note that the cor­re­spon­dence is not exact. Where the cal­cu­lat­ed PFHd is close to either end of one of the PL or SIL bands, use the table in [1, Annex K] or in [9] to deter­mine to which band(s) the per­for­mance should be assigned.

IEC pro­duced a Tech­ni­cal Report [10] that pro­vides guid­ance on how to use ISO 13849–1 or IEC 62061. The fol­low­ing table shows the rela­tion­ship between PLs, PFHd and SILs.

Table showing the correspondence between the PL, PFHd, and SIL.
IEC/TR 62061–1:2010, Table 1

IEC 61508 includes SIL 4, which is not shown in [10, Table 1] because this lev­el of per­for­mance exceeds the range of PFHD pos­si­ble using ISO 13849–1 tech­niques. Also, you may have noticed that PLb and PLc are both with­in SIL1. This was done to accom­mo­date the five archi­tec­tur­al cat­e­gories that came from EN 954–1 [12].
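
As a rough illustration of the banding, the following Python sketch maps a calculated PFHD to a PL and then to the corresponding SIL. The band limits are the commonly published values from [1, Table 2] as I recall them, so verify them against the standard (and use [1, Annex K] near the band edges). The names used are hypothetical.

```python
# (PL, lower limit inclusive, upper limit exclusive), PFHD in 1/h
PL_BANDS = [
    ("a", 1e-5, 1e-4),
    ("b", 3e-6, 1e-5),
    ("c", 1e-6, 3e-6),
    ("d", 1e-7, 1e-6),
    ("e", 1e-8, 1e-7),
]

# PL to SIL correspondence; PL a has no corresponding SIL,
# and PL b and PL c both fall within SIL 1.
PL_TO_SIL = {"a": None, "b": 1, "c": 1, "d": 2, "e": 3}


def pfhd_to_pl(pfhd):
    """Return the PL band for a PFHD value, or None if out of range."""
    for pl, lower, upper in PL_BANDS:
        if lower <= pfhd < upper:
            return pl
    return None


pl = pfhd_to_pl(3.2e-6)
print(pl, PL_TO_SIL[pl])   # b 1
```

Note how the value 3.2e-6 sits just inside the PL b band, close to the PL c boundary. That is exactly the kind of case where the approximate bands should not be trusted without checking the detailed tables.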

Why PL and not just PFHD? One of the odd things that humans do when we can calculate things is the development of what has been called “precision bias” [12]. Precision bias occurs when we can compute a number that appears very precise, e.g., 3.2 × 10⁻⁶, which then makes us feel like we have a very precise concept of the quantity. The problem, at least in this case, is that we are dealing with probabilities and minuscule probabilities at that. Using bands, like the PLs, forces us to “bin” these apparently precise numbers into larger groups, eliminating the effects of precision bias in the evaluation of the systems. Eliminating precision bias is the same reason that IEC 61508 uses SILs — binning the calculated values helps to reduce our tendency to develop a precision bias. The reality is that we just can’t predict the behaviour of these systems with as much precision as we would like to believe.

Getting to Performance Levels: MTTFD, Architectural Category and DC

Some aspects of the sys­tem design need to be con­sid­ered to arrive at a Per­for­mance Lev­el or make a pre­dic­tion about fail­ure rates in terms of PFHd.

First is the sys­tem archi­tec­ture: Fun­da­men­tal­ly, sin­gle chan­nel or two chan­nel. As a side note, if your sys­tem uses more than two chan­nels there are ways to han­dle this in ISO 13849–1 that are workarounds, or you can use IEC 62061 or IEC 61508, either of which will han­dle these more com­plex sys­tems more eas­i­ly. Remem­ber, ISO 13849–1 is intend­ed for rel­a­tive­ly sim­ple sys­tems.

When we get into the analy­sis in a lat­er arti­cle, we will be cal­cu­lat­ing or esti­mat­ing the Mean Time to Dan­ger­ous Fail­ure, MTTFD, of each chan­nel, and then of the entire sys­tem. MTTFD is expressed in years, unlike PFHd, which is expressed in frac­tion­al hours (1/h). I have yet to hear why this is the case as it seems rather con­fus­ing. How­ev­er, that is cur­rent prac­tice.

Architectural Categories

Once the required PL is known, the next step is the selection of the architectural category. The basic architectural categories were introduced initially in EN 954–1:1996 [12]. The Categories were carried forward unchanged into the first edition of ISO 13849–1 in 1999. The Categories were maintained and expanded to include additional requirements in the second and third editions in 2006 and 2015.

Since I have explored the details of the archi­tec­tures in a pre­vi­ous series, I am not going to repeat that here. Instead, I will refer you to that series. The archi­tec­tur­al Cat­e­gories come in five flavours:

Architecture Basics (for full requirements, see [1, Cl. 6])
Category | Structure | Basic Requirements | Safety Principle
B | Single channel | Basic circuit conditions are met (i.e., components are rated for the circuit voltage and current, etc.). Use of components that are designed and built to the relevant component standards. [1, 6.2.3] | Component selection
1 | Single channel | Category B plus the use of “well-tried components” and “well-tried safety principles” [1, 6.2.4] | Component selection
2 | Single channel | Category B plus the use of “well-tried safety principles” and periodic testing [1, 4.5.4] of the safety function by the machine control system. [1, 6.2.5] | System structure
3 | Dual channel | Category B plus the use of “well-tried safety principles” and no single fault shall lead to the loss of the safety function. Where practicable, single faults shall be detected. [1, 6.2.6] | System structure
4 | Dual channel | Category B plus the use of “well-tried safety principles” and no single fault shall lead to the loss of the safety function. Single faults are detected at or before the next demand on the safety system, but where this is not possible an accumulation of undetected faults will not lead to the loss of the safety function. [1, 6.2.7] | System structure

[1, Table 10] pro­vides a more detailed sum­ma­ry of the require­ments than the sum­ma­ry table above pro­vides.

Since the Categories cannot all achieve the same reliability, the PL and the Categories are linked as shown in [1, Fig. 5]. This diagram summarises the relationship of the three central parameters in ISO 13849–1 in one illustration.

Figure relating Architectural Category, DC avg, MTTFD and PL.
Rela­tion­ship between cat­e­gories, DCavg, MTTFD of each chan­nel and PL

Start­ing with the PLr from the Safe­ty Require­ment Spec­i­fi­ca­tion for the first safe­ty func­tion, you can use Fig. 5 to help you select the Cat­e­go­ry and oth­er para­me­ters nec­es­sary for the design. For exam­ple, sup­pose that the risk assess­ment indi­cates that an emer­gency stop sys­tem is need­ed. ISO 13850 requires that emer­gency stop func­tions pro­vide a min­i­mum of PLc, so using this as the basis you can look at the ver­ti­cal axis in the dia­gram to find PLc, and then read across the fig­ure. You will see that PLc can be achieved using Cat­e­go­ry 1, 2, or 3 archi­tec­ture, each with cor­re­spond­ing dif­fer­ences in MTTFD and DCavg. For exam­ple:

  • Cat. 1, MTTFD = high and DCavg = none, or
  • Cat. 2, MTTFD = Medi­um to High and DCavg = Low to Medi­um, or
  • Cat. 3, MTTFD = Low to High and DCavg = Low to Medi­um.

As you can see, the MTTFD in the chan­nels decreas­es as the diag­nos­tic cov­er­age increas­es. The design com­pen­sates for low­er reli­a­bil­i­ty in the com­po­nents by increas­ing the diag­nos­tic cov­er­age and adding redun­dan­cy. Using [1, Fig. 5] you can pin down any of the para­me­ters and then select the oth­ers as appro­pri­ate.
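
The PLc options listed above can be captured as a small lookup, sketched here in Python. The ranges are taken literally from the bullet list; not every combination inside a range necessarily reaches PLc, so treat this as a first filter and confirm the result against [1, Fig. 5]. The names are hypothetical.

```python
# Options for achieving PLc, as listed in the text above.
PLC_OPTIONS = [
    {"category": "1", "mttfd": {"high"}, "dc_avg": {"none"}},
    {"category": "2", "mttfd": {"medium", "high"}, "dc_avg": {"low", "medium"}},
    {"category": "3", "mttfd": {"low", "medium", "high"}, "dc_avg": {"low", "medium"}},
]


def candidate_categories_for_plc(mttfd, dc_avg):
    """Return the Categories whose listed ranges include the given
    channel MTTFD band and DCavg band."""
    return [
        option["category"]
        for option in PLC_OPTIONS
        if mttfd in option["mttfd"] and dc_avg in option["dc_avg"]
    ]


print(candidate_categories_for_plc("high", "none"))    # ['1']
print(candidate_categories_for_plc("medium", "low"))   # ['2', '3']
print(candidate_categories_for_plc("low", "medium"))   # ['3']
```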

One additional point regarding Categories 3 and 4: the difference between these Categories is increased Diagnostic Coverage. While Category 3 is single-fault tolerant, Category 4 has additional diagnostic capabilities so that additional faults cannot lead to the loss of the safety function. This is not the same as being multiple-fault tolerant: the system is still designed to operate in the presence of only a single fault, but with enhanced diagnostic capability.

It is worth not­ing that ISO 13849 only recog­nis­es struc­tures with sin­gle or dual chan­nel con­fig­u­ra­tions. If you need to devel­op a sys­tem with more than sin­gle redun­dan­cy (i.e., more than two chan­nels), you can analyse each pair of chan­nels as a dual chan­nel archi­tec­ture, or you can move to using IEC 62061 or IEC 61508, either of which per­mits any lev­el of redun­dan­cy.

The next step in this process is the eval­u­a­tion of the com­po­nent and chan­nel MTTFD, and then the deter­mi­na­tion of the com­plete sys­tem MTTFD. Part 4 of this series pub­lish­es on 13-Feb-17.

In case you missed the first part of the series, you can read it here.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. ISO Stan­dard 13849–1. 2015.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. IEC Stan­dard 61508. 2nd Edi­tion. Sev­en Parts. 2010.

[9]      Safe­ty of machin­ery — Func­tion­al safe­ty of safe­ty-relat­ed elec­tri­cal, elec­tron­ic and pro­gram­ma­ble elec­tron­ic con­trol sys­tems. IEC Stan­dard 62061. 2005.

[10]    Guid­ance on the appli­ca­tion of ISO 13849–1 and IEC 62061 in the design of safe­ty-relat­ed con­trol sys­tems for machin­ery. IEC Tech­ni­cal Report 62061–1. 2010.

[11]    D. S. G. Nix, Y. Chin­ni­ah, F. Dosio, M. Fessler, F. Eng, and F. Schr­ev­er, “Link­ing Risk and Reliability—Mapping the out­put of risk assess­ment tools to func­tion­al safe­ty require­ments for safe­ty relat­ed con­trol sys­tems,” 2015.

[12]    Safe­ty of machin­ery. Safe­ty relat­ed parts of con­trol sys­tems. Gen­er­al prin­ci­ples for design. CEN Stan­dard EN 954–1. 1996.

ISO 13849–1 Analysis — Part 2: Safety Requirement Specification

This entry is part 2 of 9 in the series How to do a 13849–1 analy­sis

Developing the Safety Requirement Specification

The Safety Requirement Specification sounds pretty heavy, but actually, it is just a big name for a way to organise the information you need to have to analyse and design the safety systems for your machinery. Note that I am assuming that you are doing this in the “right” order, meaning that you are planning the design beforehand, rather than trying to back-fill the documentation after completing the design. In either case, the process is the same, but getting the information you need can be much harder after the fact than before doing the design work. Doing some aspects in a review mode is impossible, especially if a third party to whom you have no access did the design work [8].

If you missed the first instal­ment in this series, you can read it here.

What goes into a Safety Requirements Specification?

For ref­er­ence, chap­ter 5 of ISO 13849–1 [1] cov­ers safe­ty require­ment spec­i­fi­ca­tions to some degree, but it needs some clar­i­fi­ca­tion I think. First of all, what is a safe­ty func­tion?

Safe­ty func­tions include any func­tion of the machine that has a direct pro­tec­tive effect for the work­er using the machin­ery. How­ev­er, using this def­i­n­i­tion, it is pos­si­ble to ignore some impor­tant func­tions. Com­ple­men­tary pro­tec­tive mea­sures, like emer­gency stop, can be missed because they are usu­al­ly “after the fact”, i.e., the injury occurs, and then the E-stop is pressed, so you can­not say that it has a “direct pro­tec­tive effect”. If we look at the def­i­n­i­tions in [1], we find:

3.1.20

safe­ty func­tion

func­tion of the machine whose fail­ure can result in an imme­di­ate increase of the risk(s)
[SOURCE: ISO 12100:2010, 3.30.]

Linking Risk to Functional Safety

Refer­ring to the risk assess­ment, any risk con­trol that pro­tects work­ers from some aspect of the machine oper­a­tion using a con­trol func­tion like an inter­locked gate, or by main­tain­ing a tem­per­a­ture below a crit­i­cal lev­el or speed at a safe lev­el, is a safe­ty func­tion. For exam­ple: if the tem­per­a­ture in a process ris­es too high, the process will explode; or if a shaft speed is too high (or too low) the tool may shat­ter and eject bro­ken pieces at high speed. There­fore, the tem­per­a­ture con­trol func­tion and the speed con­trol func­tion are safe­ty func­tions. These func­tions may also be process con­trol func­tions, but the poten­tial for an imme­di­ate increase in risk due to a fail­ure is what makes these func­tions safe­ty func­tions no mat­ter what else they may do.

[1, Table 8] gives you some examples of various kinds of safety functions found on machines. The table is not exhaustive, meaning there are many more safety functions out there than are listed in the table. Your job is to figure out which ones live in your machine. It is a bit like Pokemon: ya gotta catch ‘em all!

Basic Safety Requirement Specification

Each safe­ty func­tion must have a Per­for­mance Lev­el or a Safe­ty Integri­ty Lev­el assigned as part of the risk assess­ment. For each safe­ty func­tion, you need to devel­op the fol­low­ing infor­ma­tion:

Basic Safety Requirement Specification
Item | Description
Safety Function Identification | Name or other references, e.g. “Access Gate Interlock” or “Hazard Zone 2.”
Functional Characteristics | Intended use or foreseeable misuse of the machine relevant to the safety function; operating modes relevant to the safety function; cycle time of the machine; response time of the safety function
Emergency Operation | Is this an emergency operation function? If yes, what types of emergencies might be mitigated by this function?
Interactions | What operating modes require this function to be operational? Are there modes where this function requires deliberate bypass? These could include normal working modes (automatic, manual, set-up, changeover), and fault-finding or maintenance modes.
Behaviour | How you want the system to behave when the safety function is triggered, e.g., “Power is immediately removed from the MIG welder using an IEC 60204–1 Category 0 stop function, and robot motions are stopped using an IEC 60204–1 Category 1 stop function through the robot safety stop input,” or “All horizontal pneumatic motions stop in their current positions. Vertical motions return to the raised or retracted positions.” Also to be considered is a power loss condition. Should the system behave in the same way as if the safety function was triggered, not react at all, or do something else? Consider vertical axes that might require holding brakes or other mechanisms to prevent power loss causing unexpected motion.
Machine State after Triggering | What is the expected state of the machine after triggering the safety function? What is the recovery process?
Frequency of Operation | How often do you expect this safety function to be used? A reasonable estimate is needed. More on this below.
Priority of Operation | If simultaneous triggering of multiple safety functions is possible, which function(s) takes precedence? E.g., Emergency Stop always takes precedence over everything else. What happens if you have a safe speed function and a guard interlock that are associated because the interlock is part of a guarding function covering a shaft, and you need to troubleshoot the safe speed function, so you need access to the shaft where the encoders are mounted?
Required Performance Level | I suggest recording the S, F, and P values selected as well as the PLr value selected for later reference.
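
If you keep your SRS data in a structured form rather than a document, the table above maps naturally onto a record type. The following Python dataclass is only a sketch of that idea: the field names are my own, and the example values (including the S, F, P and PLr entries) are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SafetyRequirementSpec:
    """One record per safety function, mirroring the table items."""
    identification: str
    functional_characteristics: List[str]
    emergency_operation: bool
    interactions: str
    behaviour: str
    state_after_triggering: str
    frequency_of_operation: str
    priority_of_operation: str
    severity: str      # S parameter from the risk assessment
    frequency: str     # F parameter
    possibility: str   # P parameter
    plr: str           # required Performance Level

    def missing_items(self) -> List[str]:
        """Names of items that are still empty strings or empty lists."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in ("", [])]


# Hypothetical entry for an interlocked access gate.
srs = SafetyRequirementSpec(
    identification="Access Gate Interlock",
    functional_characteristics=["Automatic mode", "Response time 200 ms"],
    emergency_operation=False,
    interactions="Required in all modes except maintenance",
    behaviour="",   # not yet decided
    state_after_triggering="Motion stopped, manual reset required",
    frequency_of_operation="About 10 times per shift",
    priority_of_operation="Below emergency stop",
    severity="S2", frequency="F1", possibility="P2", plr="d",
)
print(srs.missing_items())   # ['behaviour']
```

A check like missing_items() is one simple way to spot safety functions whose specification is still incomplete before the design work starts.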

Here’s an exam­ple table in MS Word for­mat that you can use as a start­ing point for your SRS doc­u­ments. Note that SRS can be much more detailed than this. If you want more infor­ma­tion on this, read IEC 61508–1, 7.10.2.

So, that is the min­i­mum. You can add lots more infor­ma­tion to the min­i­mum require­ments, but this will get you start­ed. If you want more infor­ma­tion on devel­op­ing the SRS, you will need to get a copy of IEC 61508 [7].

What’s Next?

Next, you need to be able to make some design deci­sions about sys­tem archi­tec­ture and com­po­nents. Cir­cuit archi­tec­tures have been dis­cussed at some length on the MS101 blog in the past, so I am not going to go through them again in this series. Instead, I will show you how to choose an archi­tec­ture based on your design goals in the next instal­ment. In case you missed the first part of the series, you can read it here.

References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Includ­ed in the last post of the series is the com­plete ref­er­ence list.

[1]     Safe­ty of machin­ery — Safe­ty-relat­ed parts of con­trol sys­tems — Part 1: Gen­er­al prin­ci­ples for design. 3rd Edi­tion. ISO Stan­dard 13849–1. 2015.

[7]     Func­tion­al safe­ty of electrical/electronic/programmable elec­tron­ic safe­ty-relat­ed sys­tems. Sev­en parts. IEC Stan­dard 61508. Edi­tion 2. 2010.

[8]     S. Joce­lyn, J. Bau­doin, Y. Chin­ni­ah, and P. Char­p­en­tier, “Fea­si­bil­i­ty study and uncer­tain­ties in the val­i­da­tion of an exist­ing safe­ty-relat­ed con­trol cir­cuit with the ISO 13849–1:2006 design stan­dard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.