## ISO 13849-1 Analysis – Part 3: Architectural Category Selection

This entry is part 3 of 9 in the series How to do a 13849-1 analysis

At this point, you have com­pleted the risk assess­ment, assigned required Performance Levels to each safety func­tion, and developed the Safety Requirement Specification for each safety func­tion. Next, you need to con­sider three aspects of the sys­tem design: Architectural Category, Channel Mean Time to Dangerous Failure (MTTFD), and Diagnostic Coverage (DCavg). In this part of the series, I am going to dis­cuss select­ing the archi­tec­tur­al cat­egory for the sys­tem.

If you missed the second instal­ment in this series, you can read it here.

## Understanding Performance Levels

To understand ISO 13849-1, it helps to know a little about where the standard originated. ISO 13849-1 is a simplified method for determining the reliability of safety-related controls for machinery. The basic ideas came from IEC 61508 [7], a seven-part standard originally published in 1998. IEC 61508 brought forward the concept of the Average Probability of Dangerous Failure per Hour, PFHD (1/h). Dangerous failures are those failures that can result in non-performance of the safety function; the dangerous failures that also cannot be detected by diagnostics are the dangerous undetected failures. Here's the formal definition from [1]:

3.1.5 dangerous failure

failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realised can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508-4, 3.6.7, modified.]

The Performance Levels are simply bands of probabilities of dangerous failure per hour, as shown in [1, Table 2].

The ranges shown in [1, Table 2] are approximate. If you need the specific limits of the bands for any reason, [1, Annex K] gives the full span of PFHD in table format.
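To make the binning concrete, here is a minimal Python sketch that maps a PFHD value onto a PL. The band limits in `PL_BANDS` are my reading of [1, Table 2] (PL e from $10^{-8}$ to $10^{-7}$ 1/h, up to PL a from $10^{-5}$ to $10^{-4}$ 1/h). The names `PL_BANDS` and `pl_from_pfhd` are my own, so check the limits against your copy of the standard before relying on them.

```python
# Minimal sketch: bin a PFHD value (average probability of dangerous
# failure per hour, 1/h) into a Performance Level. The band limits are
# my reading of ISO 13849-1 Table 2; verify them against the standard.
PL_BANDS = [
    # (PL, lower limit inclusive, upper limit exclusive), in 1/h
    ("e", 1e-8, 1e-7),
    ("d", 1e-7, 1e-6),
    ("c", 1e-6, 3e-6),
    ("b", 3e-6, 1e-5),
    ("a", 1e-5, 1e-4),
]


def pl_from_pfhd(pfhd: float) -> str:
    """Return the PL ('a' to 'e') whose band contains pfhd."""
    for pl, lower, upper in PL_BANDS:
        if lower <= pfhd < upper:
            return pl
    raise ValueError(f"PFHD {pfhd:g} 1/h is outside the range covered by the bands")


print(pl_from_pfhd(3.2e-6))  # b
print(pl_from_pfhd(9.9e-6))  # b
```

Both example values land in PL b. Any value from $3 \times 10^{-6}$ up to just under $10^{-5}$ 1/h receives the same rating, which is the point of the bands.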

There is anoth­er way to describe the same char­ac­ter­ist­ics of a sys­tem, this one from IEC. Instead of using the PL sys­tem, IEC uses Safety Integrity Levels (SILs). [1, Table 3] shows the cor­res­pond­ence between PLs and SILs. Note that the cor­res­pond­ence is not exact. Where the cal­cu­lated PFHd is close to either end of one of the PL or SIL bands, use the table in [1, Annex K] or in [9] to determ­ine to which band(s) the per­form­ance should be assigned.

IEC produced a Technical Report [10] that provides guidance on how to use ISO 13849-1 or IEC 62061. [10, Table 1] shows the relationship between PLs, PFHD and SILs.

IEC 61508 includes SIL 4, which is not shown in [10, Table 1] because this level of performance exceeds the range of PFHD possible using ISO 13849-1 techniques. Also, you may have noticed that PLb and PLc both fall within SIL 1. This was done to accommodate the five architectural categories that came from EN 954-1 [12].
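The correspondence can be written as a simple lookup table. This is a minimal sketch based on the usual reading of [10, Table 1]: PL a has no SIL equivalent, PL b and PL c both map to SIL 1, PL d maps to SIL 2 and PL e maps to SIL 3. `PL_TO_SIL` and `sil_for_pl` are hypothetical names. The mapping is only indicative, so near the band edges check the PFHD value itself, as described earlier.

```python
# Minimal sketch of the PL-to-SIL correspondence, as I read [10, Table 1].
# PL a has no SIL equivalent, PL b and PL c both fall within SIL 1,
# and SIL 4 is absent because ISO 13849-1 techniques cannot reach it.
from typing import Optional

PL_TO_SIL = {"a": None, "b": 1, "c": 1, "d": 2, "e": 3}


def sil_for_pl(pl: str) -> Optional[int]:
    """Return the corresponding SIL, or None where there is no equivalent."""
    return PL_TO_SIL[pl.lower()]


for pl in "abcde":
    sil = sil_for_pl(pl)
    label = "no SIL equivalent" if sil is None else f"SIL {sil}"
    print(f"PL {pl} -> {label}")
```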

Why PL and not just PFHD? When we can calculate things, we humans tend to develop what has been called “precision bias” [12]. Precision bias occurs when we compute a number that appears very precise, e.g., $3.2 \times 10^{-6}$, which then makes us feel that we have a very precise understanding of the quantity. The problem, at least in this case, is that we are dealing with probabilities, and minuscule ones at that. Using bands like the PLs forces us to “bin” these apparently precise numbers into larger groups, reducing the effects of precision bias in the evaluation of the systems. Avoiding precision bias is also the reason IEC 61508 uses SILs. The reality is that we just can't predict the behaviour of these systems with as much precision as we would like to believe.

## Getting to Performance Levels: MTTFD, Architectural Category and DC

Some aspects of the sys­tem design need to be con­sidered to arrive at a Performance Level or make a pre­dic­tion about fail­ure rates in terms of PFHd.

First is the system architecture, which is fundamentally either single channel or two channel. As a side note, if your system uses more than two channels, ISO 13849-1 can handle it only through workarounds; alternatively, you can use IEC 62061 or IEC 61508, either of which handles these more complex systems more easily. Remember, ISO 13849-1 is intended for relatively simple systems.

When we get into the analysis in a later article, we will be calculating or estimating the Mean Time to Dangerous Failure, MTTFD, of each channel, and then of the entire system. MTTFD is expressed in years, unlike PFHD, which is expressed as a rate per hour (1/h). I have yet to hear why this is the case, and it seems rather confusing. However, that is current practice.
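If you want to move between the two units, the usual relationship for a constant failure rate (the flat part of the bathtub curve) is $\lambda_D = 1/MTTF_D$. The sketch below assumes 8 760 hours per year (365 days × 24 h), and the function names are my own. Note that this only converts a channel's dangerous failure rate. The PFHD of a safety function also depends on the architecture and diagnostic coverage, so in general you cannot read a PL directly off a single MTTFD.

```python
# Minimal sketch: convert between a channel MTTFD in years and a dangerous
# failure rate in 1/h, assuming a constant failure rate (the flat part of the
# bathtub curve) and 8760 hours per year (365 days x 24 h).
HOURS_PER_YEAR = 365 * 24  # 8760


def lambda_d_per_hour(mttfd_years: float) -> float:
    """Dangerous failure rate (1/h) for an MTTFD given in years."""
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


def mttfd_in_years(lambda_d: float) -> float:
    """MTTFD (years) for a dangerous failure rate given in 1/h."""
    return 1.0 / (lambda_d * HOURS_PER_YEAR)


print(f"{lambda_d_per_hour(100):.3e} 1/h")  # 1.142e-06 1/h
```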

### Architectural Categories

Once the required PL is known, the next step is the selection of the architectural category. The basic architectural categories were first introduced in EN 954-1:1996 [12]. The Categories were carried forward unchanged into the first edition of ISO 13849-1 in 1999. They were maintained and expanded to include additional requirements in the second and third editions, published in 2006 and 2015.

Since I have explored the details of the archi­tec­tures in a pre­vi­ous series, I am not going to repeat that here. Instead, I will refer you to that series. The archi­tec­tur­al Categories come in five fla­vours:

Architecture Basics

For full requirements, see [1, Cl. 6].

| Category | Structure | Basic Requirements | Safety Principle |
|---|---|---|---|
| B | Single channel | Basic circuit conditions are met (i.e., components are rated for the circuit voltage and current, etc.). Use of components that are designed and built to the relevant component standards. [1, 6.2.3] | Component selection |
| 1 | Single channel | Category B plus the use of “well-tried components” and “well-tried safety principles”. [1, 6.2.4] | Component selection |
| 2 | Single channel | Category B plus the use of “well-tried safety principles” and periodic testing [1, 4.5.4] of the safety function by the machine control system. [1, 6.2.5] | System structure |
| 3 | Dual channel | Category B plus the use of “well-tried safety principles”, and no single fault shall lead to the loss of the safety function. Where practicable, single faults shall be detected. [1, 6.2.6] | System structure |
| 4 | Dual channel | Category B plus the use of “well-tried safety principles”, and no single fault shall lead to the loss of the safety function. Single faults are detected at or before the next demand on the safety system, but where this is not possible, an accumulation of undetected faults will not lead to the loss of the safety function. [1, 6.2.7] | System structure |

[1, Table 10] gives a more detailed summary of the requirements than the table above.

Since the Categories cannot all achieve the same reliability, the PL and the Categories are linked as shown in [1, Fig. 5]. This diagram summarises the relationship of the three central parameters in ISO 13849-1 in one illustration.

Starting with the PLr from the Safety Requirement Specification for the first safety function, you can use Fig. 5 to help you select the Category and other parameters necessary for the design. For example, suppose that the risk assessment indicates that an emergency stop system is needed. ISO 13850 requires that emergency stop functions provide a minimum of PLc, so using this as the basis, you can find PLc on the vertical axis of the diagram and then read across the figure. You will see that PLc can be achieved using Category 1, 2, or 3 architecture, each with corresponding differences in MTTFD and DCavg:

• Cat. 1, MTTFD = High and DCavg = None, or
• Cat. 2, MTTFD = Medium to High and DCavg = Low to Medium, or
• Cat. 3, MTTFD = Low to High and DCavg = Low to Medium.

As you can see, the MTTFD required in each channel decreases as the diagnostic coverage increases. The design compensates for lower reliability in the components by increasing the diagnostic coverage and adding redundancy. Using [1, Fig. 5], you can pin down any of the parameters and then select the others as appropriate.
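The simplified relationship in [1, Fig. 5] can be written as a small lookup table. The sketch below encodes my reading of the figure. The table keys, the band names and the helpers `achieved_pl` and `meets` are hypothetical. It deliberately ignores the other conditions the standard attaches to each Category, such as the CCF measures and the requirements for systematic failures, so treat it as a way to explore design options rather than as a verification.

```python
# Hypothetical lookup of the PL achieved for a Category, a DCavg band and an
# MTTFD band, following my reading of ISO 13849-1 Fig. 5. None = not covered.
# It ignores CCF, systematic-failure and software requirements.
from typing import Optional

PL_TABLE = {
    # (Category, DCavg band): {MTTFD band of each channel: PL}
    ("B", "none"): {"low": "a", "medium": "b", "high": None},
    ("1", "none"): {"low": None, "medium": None, "high": "c"},
    ("2", "low"): {"low": "a", "medium": "b", "high": "c"},
    ("2", "medium"): {"low": "b", "medium": "c", "high": "d"},
    ("3", "low"): {"low": "b", "medium": "c", "high": "d"},
    ("3", "medium"): {"low": "c", "medium": "d", "high": "d"},
    ("4", "high"): {"low": None, "medium": None, "high": "e"},
}


def achieved_pl(category: str, dcavg: str, mttfd: str) -> Optional[str]:
    """Return the PL for the combination, or None if it is not covered."""
    row = PL_TABLE.get((str(category), dcavg))
    if row is None:
        return None  # e.g. Category 1 with DCavg "low" is not in the table
    return row[mttfd]


def meets(achieved: Optional[str], required: str) -> bool:
    """PL letters sort alphabetically from a (lowest) to e (highest)."""
    return achieved is not None and achieved >= required


# List every combination in the table that reaches at least PLc.
for (category, dcavg), row in PL_TABLE.items():
    for mttfd, pl in row.items():
        if meets(pl, "c"):
            print(f"Cat. {category}, DCavg {dcavg}, MTTFD {mttfd}: PL{pl}")
```

The listing includes combinations that exceed PLc, such as Category 4. Those meet the requirement too, but they are usually more than an emergency stop function needs.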

One additional point regarding Categories 3 and 4: the difference between these Categories is increased Diagnostic Coverage. While Category 3 is single fault tolerant, Category 4 has additional diagnostic capabilities so that an accumulation of faults cannot lead to the loss of the safety function. This is not the same as being multiple fault tolerant. The system is still designed to operate in the presence of only a single fault; it simply has enhanced diagnostic capability.

It is worth not­ing that ISO 13849 only recog­nises struc­tures with single or dual chan­nel con­fig­ur­a­tions. If you need to devel­op a sys­tem with more than single redund­ancy (i.e., more than two chan­nels), you can ana­lyse each pair of chan­nels as a dual chan­nel archi­tec­ture, or you can move to using IEC 62061 or IEC 61508, either of which per­mits any level of redund­ancy.

The next step in this pro­cess is the eval­u­ation of the com­pon­ent and chan­nel MTTFD, and then the determ­in­a­tion of the com­plete sys­tem MTTFD. Part 4 of this series pub­lishes on 13-​Feb-​17.

In case you missed the first part of the series, you can read it here.

## Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

## References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Included in the last post of the series is the com­plete ref­er­ence list.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. ISO Standard 13849-1. 2015.

[7]     Functional safety of electrical/​electronic/​programmable elec­tron­ic safety-​related sys­tems. IEC Standard 61508. 2nd Edition. Seven Parts. 2010.

Acknowledgements: IEC and ISO as cited.
Some Rights Reserved

## ISO 13849-1 Analysis – Part 4: MTTFD – Mean Time to Dangerous Failure

This entry is part 4 of 9 in the series How to do a 13849-1 analysis

Functional safety is all about the likelihood of a safety system failing to operate when you need it, so understanding Mean Time to Dangerous Failure, or MTTFD, is critical. If you have been reading about this topic at all, you may notice that I am abbreviating Mean Time to Dangerous Failure with all capital letters. The all-capitals form, MTTFD, is a change introduced in the third edition of ISO 13849-1, published in 2015. In the second edition (2006), the correct abbreviation was MTTFd. Onward!

If you missed the third instal­ment in this series, you can read it here.

## Defining MTTFD

Let’s start by hav­ing a look at some key defin­i­tions. Looking at [1, Cl. 3], you will find:

3.1.1 safety-related part of a control system (SRP/CS)—part of a control system that responds to safety-related input signals and generates safety-related output signals

Note 1 to entry: The com­bined safety-​related parts of a con­trol sys­tem start at the point where the safety-​related input sig­nals are ini­ti­ated (includ­ing, for example, the actu­at­ing cam and the roller of the pos­i­tion switch) and end at the out­put of the power con­trol ele­ments (includ­ing, for example, the main con­tacts of a con­tact­or)

Note 2 to entry: If mon­it­or­ing sys­tems are used for dia­gnostics, they are also con­sidered as SRP/​CS.

3.1.5 dan­ger­ous fail­ure—fail­ure which has the poten­tial to put the SRP/​CS in a haz­ard­ous or fail-​to-​function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508-4, 3.6.7, modified.]

3.1.25 mean time to dan­ger­ous fail­ure (MTTFD)—expect­a­tion of the mean time to dan­ger­ous fail­ure

Definition 3.1.5 is pretty help­ful, but defin­i­tion 3.1.25 is, well, not much of a defin­i­tion. Let’s look at this anoth­er way.

## Failures and Faults

Since everything can and will eventually fail to perform the way we expect it to, everything has a failure rate, because everything takes some time to fail. Granted, this time may be very short, like the first time the unit is turned on, or it may be very long, sometimes hundreds of years. Remember that because this is a rate, it describes something that occurs over time. It is also important to be clear that we are talking about failures and not faults. Reading from [1]:

3.1.3 fault—state of an item char­ac­ter­ized by the inab­il­ity to per­form a required func­tion, exclud­ing the inab­il­ity dur­ing pre­vent­ive main­ten­ance or oth­er planned actions, or due to lack of extern­al resources

Note 1 to entry: A fault is often the res­ult of a fail­ure of the item itself, but may exist without pri­or fail­ure.

Note 2 to entry: In this part of ISO 13849, “fault” means ran­dom fault.
[SOURCE: IEC 60050-191:1990, 05-01.]

3.1.4 fail­ure— ter­min­a­tion of the abil­ity of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Failure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Failures which only affect the avail­ab­il­ity of the pro­cess under con­trol are out­side of the scope of this part of ISO 13849.
[SOURCE: IEC 60050-191:1990, 04-01.]

3.1.4 Note 2 is the import­ant one at this point in the dis­cus­sion.

Where we have multiples of something, like relays, valves, or safety systems, we have a population of identical items, each of which will eventually fail at some point. We can count those failures as they occur, tally them up, and graph how many failures occur in the population over time. If this is starting to sound suspiciously like statistics to you, that is because it is.

OK, so let’s look at the kinds of fail­ures that occur in that pop­u­la­tion. Some fail­ures will res­ult in a “safe” state, e.g., a relay fail­ing with all poles open, and some will fail in a poten­tially “dan­ger­ous” state, like a nor­mally closed valve devel­op­ing a sig­ni­fic­ant leak. If we tally up all the fail­ures that occur, and then tally the num­ber of “safe” fail­ures and the num­ber of “dan­ger­ous” fail­ures in that pop­u­la­tion, we now have some very use­ful inform­a­tion.

The dif­fer­ent kinds of fail­ures are sig­ni­fied using the lower­case Greek let­ter $\lambda$ (lambda). We can add some sub­scripts to help identi­fy what kinds of fail­ures we are talk­ing about. The com­mon vari­able des­ig­na­tions used are [14]:

$\lambda$ = fail­ures
$\lambda_{(t)}$= fail­ure rate
$\lambda_s$ = “safe” fail­ures
$\lambda_d$ = “dan­ger­ous” fail­ures
$\lambda_{dd}$ = detect­able “dan­ger­ous” fail­ures
$\lambda_{du}$ = undetect­able “dan­ger­ous” fail­ures

I will be dis­cuss­ing some of these vari­ables in more detail in a later part of the series when I delve into Diagnostic Coverage, so don’t worry about them too much just yet.

## Getting to MTTFD

Since we can now start to deal with the fail­ure rate data math­em­at­ic­ally, we can start to do some cal­cu­la­tions about expec­ted life­time of a com­pon­ent or a sys­tem. That expec­ted, or prob­able, life­time is what defin­i­tion 3.1.25 was on about, and is what we call MTTFD.

MTTFD is only meaningful over the period in which the failure rate is relatively constant. If you look at a typical failure rate curve, called a “bathtub curve” due to its resemblance to the profile of a nice soaker tub, that period is the flatter portion of the curve between the end of the infant mortality period and the wear-out period at the end of life. Over this portion of the curve, MTTFD is the reciprocal of the constant dangerous failure rate, expressed in years. This part of the curve is the portion assumed to be included in the “mission time” for the product. ISO 13849-1 assumes a mission time of 20 years for machinery [1, 4.5.4], [1, Cl. 10].

ISO 13849-1 provides guidance on how MTTFD relates to the determination of the PL in [1, Cl. 4.5.2]. MTTFD is further grouped into three bands, as shown in [1, Table 4].

The notes to this table are important as well, so I've reproduced them here:

NOTE 1 The choice of the MTTFD ranges of each chan­nel is based on fail­ure rates found in the field as state-​of-​the-​art, form­ing a kind of log­ar­ithmic scale fit­ting to the log­ar­ithmic PL scale. An MTTFD value of each chan­nel less than three years is not expec­ted to be found for real SRP/​CS since this would mean that after one year about 30 % of all sys­tems on the mar­ket will fail and will need to be replaced. An MTTFD value of each chan­nel great­er than 100 years is not accept­able because SRP/​CS for high risks should not depend on the reli­ab­il­ity of com­pon­ents alone. To rein­force the SRP/​CS against sys­tem­at­ic and ran­dom fail­ure, addi­tion­al means such as redund­ancy and test­ing should be required. To be prac­tic­able, the num­ber of ranges was restric­ted to three. The lim­it­a­tion of MTTFD of each chan­nel val­ues to a max­im­um of 100 years refers to the single chan­nel of the SRP/​CS which car­ries out the safety func­tion. Higher MTTFD val­ues can be used for single com­pon­ents (see Table D.1).

NOTE 2 The indic­ated bor­ders of this table are assumed with­in an accur­acy of 5%.
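Table 4 and its notes reduce to a few comparisons. This sketch assumes the usual band limits: low from 3 to under 10 years, medium from 10 to under 30 years, and high from 30 to 100 years. Following my reading of Note 1, it treats any channel value above 100 years as 100 years, and it ignores the 5 % tolerance in Note 2. The function name `mttfd_band` is hypothetical.

```python
# Minimal sketch: classify a channel MTTFD (years) into the bands of
# ISO 13849-1 Table 4 as I read them. Values above 100 years are treated as
# 100 years (my reading of Note 1); the 5 % tolerance of Note 2 is ignored.
from typing import Tuple


def mttfd_band(mttfd_years: float) -> Tuple[str, float]:
    """Return (band, value used) for a channel MTTFD in years."""
    if mttfd_years < 3:
        raise ValueError("channel MTTFD below 3 years is outside Table 4")
    value = min(mttfd_years, 100.0)  # cap channel MTTFD at 100 years
    if value < 10:
        return "low", value
    if value < 30:
        return "medium", value
    return "high", value


print(mttfd_band(236.7))  # ('high', 100.0)
print(mttfd_band(10))     # ('medium', 10)
```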

The stand­ard then tells us to select the MTTFD using a simple hier­archy [1, 4.5.2]:

For the estimation of MTTFD of a component, the hierarchical procedure for finding data shall be, in the order given:

a) use manufacturer’s data;
b) use meth­ods in Annex C and Annex D;
c) choose 10 years.

Why ten years? Ten years is half of the assumed mis­sion life­time of 20 years. More on mis­sion life­time in a later post.

Looking at [1, Annex C.2], you will find the “Good Engineering Practices” method for estimating MTTFD, presuming the manufacturer has not provided you with that information. ISO 13849-2 [2] has some reference tables that provide some general MTTFD values for some kinds of components, but not every part that exists can be listed. How can we deal with parts not listed? [1, Annex C.4] provides us with a calculation method for estimating MTTFD for pneumatic, mechanical and electromechanical components.

### Calculating MTTFD for pneumatic, mechanical and electromechanical components

I need to intro­duce you to a few more vari­ables before we look at how to cal­cu­late MTTFD for a com­pon­ent.

Variables

| Variable | Description |
|---|---|
| B10 | Number of cycles until 10% of the components fail (for pneumatic and electromechanical components) |
| B10D | Number of cycles until 10% of the components fail dangerously (for pneumatic and electromechanical components) |
| T | Lifetime of the component |
| T10D | Mean time until 10% of the components fail dangerously |
| hop | Mean operation time, in hours per day |
| dop | Mean operation time, in days per year |
| tcycle | Mean operation time between the beginning of two successive cycles of the component (e.g., switching of a valve), in seconds per cycle |
| s | seconds |
| h | hours |
| a | years |

With a few details about the application, we can calculate the MTTFD using [1, Eqn. C.1]. We need to know the following parameters:

• B10D
• hop
• dop
• tcycle

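In order to use [1, Eqn. C.1], we first need the mean number of operations per year, nop, from [1, Eqn. C.2]:

$$n_{op}=\frac{d_{op}\times h_{op}\times 3600\ \mathrm{s/h}}{t_{cycle}}$$

With nop known, [1, Eqn. C.1] gives the component MTTFD:

$$MTTF_D=\frac{B_{10D}}{0.1\times n_{op}}$$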

We may also need one more cal­cu­la­tion, [1, Eqn. C.4]:

## Example Calculation [1, C.4.3]

For a pneu­mat­ic valve, a man­u­fac­turer determ­ines a mean value of 60 mil­lion cycles as B10D. The valve is used for two shifts each day on 220 oper­a­tion days a year. The mean time between the begin­ning of two suc­cess­ive switch­ing of the valve is estim­ated as 5 s. This yields the fol­low­ing val­ues:

• dop of 220 days per year;
• hop of 16 h per day;
• tcycle of 5 s per cycle;
• B10D of 60 mil­lion cycles.

Doing the math, we get:
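Substituting these values into Eqns. C.2 and C.1 gives the results below. This is my own arithmetic, so check it against the worked example in [1, C.4.3]. I have also included $T_{10D} = B_{10D}/n_{op}$, the time until 10 % of the valves are expected to fail dangerously.

$$
\begin{aligned}
n_{op} &= \frac{220\ \mathrm{d/a} \times 16\ \mathrm{h/d} \times 3600\ \mathrm{s/h}}{5\ \mathrm{s/cycle}} = 2\,534\,400\ \mathrm{cycles/a}\\
MTTF_D &= \frac{60 \times 10^{6}\ \mathrm{cycles}}{0.1 \times 2\,534\,400\ \mathrm{cycles/a}} \approx 236.7\ \mathrm{a} \approx 237\ \mathrm{a}\\
T_{10D} &= \frac{60 \times 10^{6}\ \mathrm{cycles}}{2\,534\,400\ \mathrm{cycles/a}} \approx 23.7\ \mathrm{a}
\end{aligned}
$$

The component MTTFD of roughly 237 years is well above the 100-year limit that Note 1 to [1, Table 4] sets for a channel. As I read that note, a channel built around this valve would therefore be counted at 100 years (MTTFD = high). The T10D of about 23.7 years is longer than the 20-year mission time.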

So there you have it, at least for a fairly simple case. There are more examples in ISO 13849-1, and I would encourage you to work through them. You can also find a wealth of examples in a report produced by the BGIA in Germany, called Functional safety of machine controls (BGIA Report 2/2008e) [16]. The download for the report is linked from the reference list at the end of this article. If you are a SISTEMA user, there are lots of examples in the SISTEMA Cookbooks, and there are example files available so that you can see how to assemble the systems in the software.
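If you prefer to script the arithmetic for the valve example, here is a minimal Python sketch of Eqns. C.2 and C.1. The function names are hypothetical. `t10d` implements $T_{10D} = B_{10D}/n_{op}$; I am assuming this is the additional calculation the text refers to as Eqn. C.4.

```python
# Minimal sketch of ISO 13849-1 Eqns. C.2 and C.1, plus T10D = B10D / nop,
# applied to the pneumatic valve example. Function names are hypothetical.
SECONDS_PER_HOUR = 3600


def n_op(d_op: float, h_op: float, t_cycle_s: float) -> float:
    """Mean number of operations per year [1, Eqn. C.2]."""
    return d_op * h_op * SECONDS_PER_HOUR / t_cycle_s


def mttfd_from_b10d(b10d: float, nop: float) -> float:
    """Component MTTFD in years [1, Eqn. C.1]."""
    return b10d / (0.1 * nop)


def t10d(b10d: float, nop: float) -> float:
    """Years until 10 % of the components are expected to fail dangerously."""
    return b10d / nop


nop = n_op(d_op=220, h_op=16, t_cycle_s=5)
print(f"nop   = {nop:,.0f} cycles/year")                 # nop   = 2,534,400 cycles/year
print(f"MTTFD = {mttfd_from_b10d(60e6, nop):.1f} years")  # MTTFD = 236.7 years
print(f"T10D  = {t10d(60e6, nop):.1f} years")             # T10D  = 23.7 years
```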

The next part of this series covers Diagnostic Coverage (DC), and the average DC across the parts that make up a safety function, DCavg.



## References


[15]    “The bathtub curve and product failure behavior part 1 of 2”, Findchart.co, 2017. [Online]. Available: http://findchart.co/download.php?aHR0cDovL3d3dy53ZWlidWxsLmNvbS9ob3R3aXJlL2lzc3VlMjEvaHQyMV8xLmdpZg. [Accessed: 03-Jan-2017].

[16]   “Functional safety of machine controls – Application of EN ISO 13849 (BGIA Report 2/2008e)”, dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/publikationen/reports-download/bgia-reports-2007-bis-2008/bgia-report-2-2008/index-2.jsp. [Accessed: 04-Jan-2017].

Acknowledgements: IEC, ISO and oth­ers as cited

## ISO 13849-1 Analysis – Part 5: Diagnostic Coverage (DC)

This entry is part 5 of 9 in the series How to do a 13849-1 analysis

## What is Diagnostic Coverage?

Understanding Diagnostic Coverage (DC) as it is used in ISO 13849-1 [1] is critical to analysing the design of any safety function assessed using this standard. In case you missed a previous part of the series, you can read it here.

In the last instalment of this series, discussing MTTFD, I brought up the fact that everything fails eventually, and so everything has a natural failure rate. The bathtub curve shown at the top of this post shows a typical failure rate curve for most products. Failure rates tell you how frequently components or systems fail, and from them you can work out how long, on average, they can be expected to operate before failing. Failure behaviour is expressed in many ways, MTTFD and PFHD being the ways relevant to this discussion of ISO 13849 analysis. MTTFD is given in years, and PFHD is given as a rate per hour (1/h). As a reminder, PFHD stands for “average Probability of dangerous Failure per Hour”.

Three of the standard architectures, Categories 2, 3 and 4, include automatic diagnostic functions. As soon as we add diagnostics to the system, we need to know which faults the diagnostics can detect and what fraction of the dangerous failures those faults represent. Diagnostic Coverage (DC) represents the ratio of the dangerous failures that can be detected to the total dangerous failures that could occur, expressed as a percentage. Failures that are not dangerous are excluded from DC because we don't need to worry about them: if they occur, the system will not fail into a dangerous state.

Here’s the form­al defin­i­tion from [1]:

3.1.26 dia­gnost­ic cov­er­age (DC)

meas­ure of the effect­ive­ness of dia­gnostics, which may be determ­ined as the ratio between the fail­ure rate of detec­ted dan­ger­ous fail­ures and the fail­ure rate of total dan­ger­ous fail­ures

Note 1 to entry: Diagnostic coverage can exist for the whole or parts of a safety-related system. For example, diagnostic coverage could exist for sensors and/or logic system and/or final elements. [SOURCE: IEC 61508-4:1998, 3.8.6, modified.]

That brings up two oth­er related defin­i­tions that need to be kept in mind [1]:

3.1.4 fail­ure

ter­min­a­tion of the abil­ity of an item to per­form a required func­tion

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Failure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Failures which only affect the availability of the process under control are outside of the scope of this part of ISO 13849. [SOURCE: IEC 60050-191:1990, 04-01.]

and the most import­ant one [1]:

3.1.5 dan­ger­ous fail­ure

fail­ure which has the poten­tial to put the SRP/​CS in a haz­ard­ous or fail-​to-​function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508-4, 3.6.7, modified.]

Just as a remind­er, SRP/​CS stands for “safety-​related parts of con­trol sys­tems”.

## Failure Math

### Failure Rate Data Sources

To do any calculations, we need data, and this is true for failure rates as well. ISO 13849-1 provides tables in the annexes that list some common types of components and their associated failure rates, and there are more failure rate tables in ISO 13849-2. A word of caution here: do not mix sources of failure rate data, because the conditions under which one source's data are valid won't match the conditions behind the data in another source, such as ISO 13849. There are a few good sources of failure rate data out there, for example, MIL-HDBK-217, Reliability Prediction of Electronic Equipment [15], as well as the database maintained by Exida. In any case, use a single source for your failure rate data.

### Failure Rate Variables

IEC 61508 [7] defines a num­ber of vari­ables related to fail­ure rates. The lower­case Greek let­ter lambda, $\lambda$, is used to denote fail­ures.

The com­mon vari­able des­ig­na­tions used are:

$\lambda$ = fail­ures
$\lambda_{(t)}$= fail­ure rate
$\lambda_s$ = “safe” fail­ures
$\lambda_d$ = “dan­ger­ous” fail­ures
$\lambda_{dd}$ = detect­able “dan­ger­ous” fail­ures
$\lambda_{du}$ = undetect­able “dan­ger­ous” fail­ures

### Calculating DC

Of these vari­ables, we only need to con­cern ourselves with $\lambda_d$, $\lambda_{dd}$ and $\lambda_{du}$. To under­stand how these vari­ables are used, we can express their rela­tion­ship as

$\lambda_d=\lambda_{dd}+\lambda_{du}$

Following on that idea, the Diagnostic Coverage can be expressed as a per­cent­age like this:

$DC\%=\frac{\lambda_{dd}}{\lambda_d}\times 100$
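The two relationships above translate directly into code. In this minimal sketch, the failure rates are hypothetical placeholder values in 1/h, not data for any real component, and the function name is my own.

```python
# Minimal sketch: DC% from the detected and undetected dangerous failure
# rates, using lambda_d = lambda_dd + lambda_du. Rates are hypothetical.
def diagnostic_coverage(lambda_dd: float, lambda_du: float) -> float:
    """Return DC in percent, given rates in 1/h."""
    lambda_d = lambda_dd + lambda_du  # total dangerous failure rate
    if lambda_d <= 0:
        raise ValueError("total dangerous failure rate must be positive")
    return lambda_dd / lambda_d * 100.0


# Hypothetical rates: 9.0e-7 1/h detected, 1.0e-7 1/h undetected.
print(f"DC = {diagnostic_coverage(9.0e-7, 1.0e-7):.1f} %")  # DC = 90.0 %
```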

## Determining DC%

If you want to actually calculate DC%, you have some work ahead of you. Rather than going into the details here, I am going to refer you hardcore types to IEC 61508-2, Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. This standard goes into some depth on how to determine failure rates and how to calculate the “Safe Failure Fraction,” a number which is related to DC but is not the same.

For everyone else, the good news is that you can use Table E.1 in [1, Annex E] to estimate the DC%. It's worth noting here that Annex E is “Informative.” In standards-speak, this means that the annex is not part of the “normative” text; it is simply information to help you use the normative part of the standard. The design must conform to the requirements in the normative text if you want to claim conformity to the standard. Because [1, Annex E] is informative, you have the option to calculate the DC% value rather than select it from Table E.1. Using the calculated value would not violate the requirements in the normative text.

If you are using IFA SISTEMA [16] to do the cal­cu­la­tions for you, you will find that the soft­ware lim­its you to select­ing a single DC meas­ure from Table E.1, and this prin­ciple applies if you are doing the cal­cu­la­tions by hand too. Only one item from Table E.1 can be selec­ted for a giv­en safety func­tion.

## Ranking DC

Once you have determined the DC for a safety function, you need to compare the DC value against [1, Table 5] to see if the DC is sufficient for the PLr you are trying to achieve. Table 5 bins the DC results into four ranges. Just as binning the PFHD values into five ranges helps to prevent precision bias in estimating the probability of failure of the complete system or safety function, the ranges in Table 5 help to prevent precision bias in the calculated or selected DC values.
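The four ranges of [1, Table 5] can be expressed as a simple binning function. The limits below are as I understand the table: none below 60 %, low from 60 % to under 90 %, medium from 90 % to under 99 %, and high from 99 % upward. The function name is hypothetical, so check the limits against your copy of the standard.

```python
# Minimal sketch: bin a DC value (percent) into the four ranges of
# ISO 13849-1 Table 5, using the limits as I understand them.
def dc_band(dc_percent: float) -> str:
    if not 0.0 <= dc_percent <= 100.0:
        raise ValueError("DC must be between 0 % and 100 %")
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


for dc in (45.0, 89.9, 90.0, 99.0):
    print(dc, dc_band(dc))
```

A calculated value of 89.9 % lands in "low" while 90.0 % lands in "medium". That is worth remembering when a calculated DC sits right at a band edge.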

If the DC value is high enough for the PLr, then you are done with this part of the work. If not, you will need to go back to your design and add diagnostic features so that you can either select a higher coverage from [1, Table E.1] or calculate a higher value using [14].

## Averaging DC within a safety function

A safety function is usually carried out by several parts of the SRP/CS, for example, an input device such as an interlocking switch, a logic unit, and an output device such as a contactor. Each part can have a different DC. These DC values need to be averaged to determine the overall DC for the safety function, DCavg. [1, Annex E] provides you with a method to do this in Equation E.1.

Plug in the values for MTTFD and DC for each part, and calculate the resulting DCavg value for the safety function.
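As I understand [1, Eqn. E.1], DCavg is a weighted average in which each DC value is weighted by the reciprocal of the corresponding MTTFD, so the least reliable items dominate the result:

$$DC_{avg}=\frac{\frac{DC_1}{MTTF_{D1}}+\frac{DC_2}{MTTF_{D2}}+\cdots+\frac{DC_N}{MTTF_{DN}}}{\frac{1}{MTTF_{D1}}+\frac{1}{MTTF_{D2}}+\cdots+\frac{1}{MTTF_{DN}}}$$

A minimal Python sketch follows. The (DC, MTTFD) pairs in the example are hypothetical, and the function name is my own. Items with no fault detection are entered with DC = 0, so they pull the average down.

```python
# Minimal sketch of a DCavg calculation in the form of ISO 13849-1 Eqn. E.1:
# each DC is weighted by 1/MTTFD. Items without fault detection use DC = 0.
from typing import Iterable, Tuple


def dc_avg(items: Iterable[Tuple[float, float]]) -> float:
    """items: (DC in percent, MTTFD in years) pairs; returns DCavg in percent."""
    items = list(items)
    if not items:
        raise ValueError("at least one item is required")
    numerator = sum(dc / mttfd for dc, mttfd in items)
    denominator = sum(1.0 / mttfd for _, mttfd in items)
    return numerator / denominator


# Hypothetical input, logic and output items: (DC %, MTTFD years)
print(f"DCavg = {dc_avg([(90, 50), (99, 100), (60, 20)]):.2f} %")
```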

That’s it for this art­icle. The next part will cov­er Common Cause Failures (CCF). Look for it on 20-​Mar-​17!



## References


[16]     “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].