## ISO 13849-1 Analysis, Part 4: MTTFD – Mean Time to Dangerous Failure

This entry is part 4 of 9 in the series “How to do a 13849-1 analysis”.

Functional safety is all about the likelihood of a safety system failing to operate when you need it, so understanding Mean Time to Dangerous Failure, or MTTFD, is critical. If you have been reading about this topic at all, you may notice that I am abbreviating Mean Time to Dangerous Failure with all capital letters. Using MTTFD is a recent change that occurred in the third edition of ISO 13849-1, published in 2015. In the second edition (2006), the abbreviation was MTTFd. Onward!


## Defining MTTFD

Let’s start by hav­ing a look at some key defin­i­tions. Looking at [1, Cl. 3], you will find:

3.1.1 safety-related part of a control system (SRP/CS): part of a control system that responds to safety-related input signals and generates safety-related output signals

Note 1 to entry: The com­bined safety-​related parts of a con­trol sys­tem start at the point where the safety-​related input sig­nals are ini­ti­ated (includ­ing, for example, the actu­at­ing cam and the roller of the pos­i­tion switch) and end at the out­put of the power con­trol ele­ments (includ­ing, for example, the main con­tacts of a con­tact­or)

Note 2 to entry: If mon­it­or­ing sys­tems are used for dia­gnostics, they are also con­sidered as SRP/​CS.

3.1.5 dangerous failure: failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508 – 4, 3.6.7, mod­i­fied.]

3.1.25 mean time to dangerous failure (MTTFD): expectation of the mean time to dangerous failure

Definition 3.1.5 is pretty help­ful, but defin­i­tion 3.1.25 is, well, not much of a defin­i­tion. Let’s look at this anoth­er way.

## Failures and Faults

Everything can and will eventually fail to perform the way we expect it to, so everything has a failure rate. The time to failure may be very short, like the first time the unit is turned on, or very long, sometimes hundreds of years. Because this is a rate, it describes something that occurs over time. It is also important to be clear that we are talking about failures and not faults. Reading from [1]:

3.1.3 fault: state of an item characterized by the inability to perform a required function, excluding the inability during preventive maintenance or other planned actions, or due to lack of external resources

Note 1 to entry: A fault is often the res­ult of a fail­ure of the item itself, but may exist without pri­or fail­ure.

Note 2 to entry: In this part of ISO 13849, “fault” means random fault.
[SOURCE: IEC 60050-191:1990, 05-01.]

3.1.4 failure: termination of the ability of an item to perform a required function

Note 1 to entry: After a fail­ure, the item has a fault.

Note 2 to entry: “Failure” is an event, as dis­tin­guished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items con­sist­ing of soft­ware only.

Note 4 to entry: Failures which only affect the avail­ab­il­ity of the pro­cess under con­trol are out­side of the scope of this part of ISO 13849.
[SOURCE: IEC 60050-191:1990, 04-01.]

3.1.4 Note 2 is the import­ant one at this point in the dis­cus­sion.

Now, where we have multiples of something, like relays, valves, or safety systems, we have a population of identical items, each of which will eventually fail at some point. We can count those failures as they occur and tally them up, and we can graph how many failures we get in the population over time. If this is starting to sound suspiciously like statistics to you, that is because it is.

OK, so let’s look at the kinds of fail­ures that occur in that pop­u­la­tion. Some fail­ures will res­ult in a “safe” state, e.g., a relay fail­ing with all poles open, and some will fail in a poten­tially “dan­ger­ous” state, like a nor­mally closed valve devel­op­ing a sig­ni­fic­ant leak. If we tally up all the fail­ures that occur, and then tally the num­ber of “safe” fail­ures and the num­ber of “dan­ger­ous” fail­ures in that pop­u­la­tion, we now have some very use­ful inform­a­tion.
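The tallying described above can be sketched in a few lines. This is my own illustration, not a method from [1]: the valve data are invented, the names `FailureRecord` and `estimate_rates` are hypothetical, and the estimate assumes a constant failure rate, so each rate is simply the number of failures of that kind divided by the total operating hours accumulated by the population.

```python
from dataclasses import dataclass


@dataclass
class FailureRecord:
    unit_id: int
    hours_at_failure: float
    dangerous: bool  # True for a "dangerous" failure mode, e.g. a valve leaking through


def estimate_rates(records, population, observed_hours):
    """Point estimates of lambda, lambda_s and lambda_d in failures per hour.

    Assumes a constant failure rate, that each failed unit is removed from
    service when it fails, and that every surviving unit ran for the full
    observation period.
    """
    failed_hours = sum(r.hours_at_failure for r in records)
    survivor_hours = (population - len(records)) * observed_hours
    total_hours = failed_hours + survivor_hours
    n_dangerous = sum(1 for r in records if r.dangerous)
    n_safe = len(records) - n_dangerous
    return {
        "lambda": len(records) / total_hours,
        "lambda_s": n_safe / total_hours,
        "lambda_d": n_dangerous / total_hours,
    }


# Hypothetical tally: 1,000 identical valves, each observed for 10,000 h.
records = [
    FailureRecord(17, 2_300.0, dangerous=False),
    FailureRecord(204, 5_150.0, dangerous=True),
    FailureRecord(381, 7_800.0, dangerous=False),
    FailureRecord(655, 9_100.0, dangerous=True),
]
rates = estimate_rates(records, population=1_000, observed_hours=10_000.0)
for name, value in rates.items():
    print(f"{name:8s} = {value:.2e} per hour")
```

With these invented numbers, half of the recorded failures are dangerous, so the dangerous rate is half of the total rate. A real estimate needs far more failures than four to mean much.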

Failure rates are denoted by the lowercase Greek letter $\lambda$ (lambda), and we can add subscripts to identify what kinds of failures we are talking about. The common variable designations used are [14]:

| Symbol | Meaning |
|---|---|
| $\lambda$ | failures |
| $\lambda(t)$ | failure rate |
| $\lambda_s$ | “safe” failures |
| $\lambda_d$ | “dangerous” failures |
| $\lambda_{dd}$ | detected “dangerous” failures |
| $\lambda_{du}$ | undetected “dangerous” failures |

I will be dis­cuss­ing some of these vari­ables in more detail in a later part of the series when I delve into Diagnostic Coverage, so don’t worry about them too much just yet.

## Getting to MTTFD

Since we can now start to deal with the fail­ure rate data math­em­at­ic­ally, we can start to do some cal­cu­la­tions about expec­ted life­time of a com­pon­ent or a sys­tem. That expec­ted, or prob­able, life­time is what defin­i­tion 3.1.25 was on about, and is what we call MTTFD.

MTTFD is expressed in years, and it is calculated on the assumption that the dangerous failure rate is relatively constant. If you look at a typical failure rate curve, called a “bathtub curve” due to its resemblance to the profile of a nice soaker tub, that assumption holds in the flatter portion of the curve, between the end of the infant mortality period and the wear-out period at the end of life. This part of the curve is the portion assumed to be included in the “mission time” for the product. ISO 13849-1 assumes a mission time of 20 years for all machinery [1, 4.5.4], [1, Cl. 10].
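To see what a constant failure rate implies, assume that dangerous failures occur at a constant rate of 1/MTTFD. The time to dangerous failure is then exponentially distributed, and the probability that a single item has failed dangerously by time t is 1 − exp(−t/MTTFD). The sketch below is my own illustration of that relationship, not a calculation taken from [1]; `prob_dangerous_failure` is a hypothetical helper.

```python
import math

MISSION_TIME_YEARS = 20.0  # mission time assumed by ISO 13849-1


def prob_dangerous_failure(t_years: float, mttfd_years: float) -> float:
    """Probability that one item has failed dangerously by t_years,
    assuming a constant dangerous failure rate of 1 / mttfd_years."""
    return 1.0 - math.exp(-t_years / mttfd_years)


for mttfd in (10.0, 30.0, 100.0):
    p = prob_dangerous_failure(MISSION_TIME_YEARS, mttfd)
    print(f"MTTFD = {mttfd:5.0f} a -> P(dangerous failure within 20 a) = {p:.0%}")
```

Even with an MTTFD of 100 years, about 18 % of items would be expected to fail dangerously within the 20-year mission time, and at t = MTTFD the probability is about 63 %. MTTFD is a mean, not a guaranteed failure-free period.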

ISO 13849-1 provides us with guidance on how MTTFD relates to the determination of the PL in [1, Cl. 4.5.2]. MTTFD is further grouped into three bands, Low, Medium and High, as shown in [1, Table 4].

The notes for this table are important as well, so I’ve reproduced them here:

NOTE 1 The choice of the MTTFD ranges of each chan­nel is based on fail­ure rates found in the field as state-​of-​the-​art, form­ing a kind of log­ar­ithmic scale fit­ting to the log­ar­ithmic PL scale. An MTTFD value of each chan­nel less than three years is not expec­ted to be found for real SRP/​CS since this would mean that after one year about 30 % of all sys­tems on the mar­ket will fail and will need to be replaced. An MTTFD value of each chan­nel great­er than 100 years is not accept­able because SRP/​CS for high risks should not depend on the reli­ab­il­ity of com­pon­ents alone. To rein­force the SRP/​CS against sys­tem­at­ic and ran­dom fail­ure, addi­tion­al means such as redund­ancy and test­ing should be required. To be prac­tic­able, the num­ber of ranges was restric­ted to three. The lim­it­a­tion of MTTFD of each chan­nel val­ues to a max­im­um of 100 years refers to the single chan­nel of the SRP/​CS which car­ries out the safety func­tion. Higher MTTFD val­ues can be used for single com­pon­ents (see Table D.1).

NOTE 2 The indic­ated bor­ders of this table are assumed with­in an accur­acy of 5%.
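The banding in [1, Table 4] is easy to mechanise. This sketch assumes the band limits as I read them in the 2015 edition (Low: 3 ≤ MTTFD < 10 years; Medium: 10 ≤ MTTFD < 30 years; High: 30 ≤ MTTFD ≤ 100 years), applies the 100-year channel limit from Note 1, and ignores the 5 % tolerance of Note 2. It also reproduces the “about 30 %” arithmetic in Note 1 for a 3-year MTTFD. `mttfd_band` is a hypothetical name; check the limits against your own copy of the standard.

```python
import math


def mttfd_band(channel_mttfd_years: float) -> str:
    """Classify a channel MTTFD (in years) into the bands of ISO 13849-1 Table 4.

    Assumed limits: Low [3, 10), Medium [10, 30), High [30, 100].
    """
    if channel_mttfd_years < 3.0:
        raise ValueError("MTTFD below 3 years is outside the range of Table 4")
    capped = min(channel_mttfd_years, 100.0)  # Note 1: channel values limited to 100 years
    if capped < 10.0:
        return "Low"
    if capped < 30.0:
        return "Medium"
    return "High"


# Note 1's arithmetic: with MTTFD = 3 years and a constant failure rate,
# the fraction of systems failed after one year is 1 - exp(-1/3).
fraction_failed_after_one_year = 1.0 - math.exp(-1.0 / 3.0)

print(mttfd_band(237.0))  # capped to 100 years -> "High"
print(mttfd_band(12.0))   # "Medium"
print(f"{fraction_failed_after_one_year:.0%}")  # roughly 28 %, i.e. "about 30 %"
```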

The stand­ard then tells us to select the MTTFD using a simple hier­archy [1, 4.5.2]:

For the estimation of MTTFD of a component, the hierarchical procedure for finding data shall be, in the order given:

a) use manufacturer’s data;
b) use meth­ods in Annex C and Annex D;
c) choose 10 years.

Why ten years? Ten years is half of the assumed mis­sion life­time of 20 years. More on mis­sion life­time in a later post.
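The order of preference in [1, 4.5.2] is a simple fallback chain: take the first source that has data. A minimal sketch with hypothetical names; the 10-year default is the value given in option c).

```python
from typing import Optional, Tuple

DEFAULT_MTTFD_YEARS = 10.0  # option c): half of the 20-year mission time


def select_component_mttfd(manufacturer_years: Optional[float] = None,
                           annex_estimate_years: Optional[float] = None) -> Tuple[float, str]:
    """Return (MTTFD in years, source), following the order a) -> b) -> c)."""
    if manufacturer_years is not None:
        return manufacturer_years, "a) manufacturer's data"
    if annex_estimate_years is not None:
        return annex_estimate_years, "b) Annex C / Annex D methods"
    return DEFAULT_MTTFD_YEARS, "c) default of 10 years"


print(select_component_mttfd(manufacturer_years=150.0))
print(select_component_mttfd(annex_estimate_years=40.0))
print(select_component_mttfd())
```

Recording the source alongside the value makes it easy to show later which rung of the hierarchy each component's MTTFD came from.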

Looking at [1, Annex C.2], you will find the “Good Engineering Practices” method for estimating MTTFD, presuming the manufacturer has not provided you with that information. ISO 13849-2 [2] has some reference tables that provide some general MTTFD values for some kinds of components, but not every part that exists can be listed. How can we deal with parts not listed? [1, Annex C.4] provides us with a calculation method for estimating MTTFD for pneumatic, mechanical and electromechanical components.

### Calculating MTTFD for pneumatic, mechanical and electromechanical components

I need to intro­duce you to a few more vari­ables before we look at how to cal­cu­late MTTFD for a com­pon­ent.

Variables

| Variable | Description |
|---|---|
| $B_{10}$ | Number of cycles until 10 % of the components fail (for pneumatic and electromechanical components) |
| $B_{10D}$ | Number of cycles until 10 % of the components fail dangerously (for pneumatic and electromechanical components) |
| $T$ | Lifetime of the component |
| $T_{10D}$ | Mean time until 10 % of the components fail dangerously |
| $h_{op}$ | Mean operation time, in hours per day |
| $d_{op}$ | Mean operation time, in days per year |
| $t_{cycle}$ | Mean time between the beginning of two successive cycles of the component (e.g., switching of a valve), in seconds per cycle |
| $n_{op}$ | Mean number of annual operations, in cycles per year |
| s | seconds |
| h | hours |
| a | years |

With a few details about the application, we can calculate the MTTFD using [1, Eqn. C.1]. We need to know the following parameters:

• B10D
• hop
• dop
• tcycle

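To use [1, Eqn. C.1], we first need to calculate $n_{op}$, the mean number of annual operations, using [1, Eqn. C.2]:

$$n_{op} = \frac{d_{op} \times h_{op} \times 3600\ \mathrm{s/h}}{t_{cycle}}$$

[1, Eqn. C.1] then gives the MTTFD of the component:

$$\mathrm{MTTF_D} = \frac{B_{10D}}{0.1 \times n_{op}}$$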

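We may also need one more calculation, [1, Eqn. C.4], which gives $T_{10D}$, the mean time until 10 % of the components fail dangerously:

$$T_{10D} = \frac{B_{10D}}{n_{op}}$$

The operating time of the component is limited to $T_{10D}$.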

## Example Calculation [1, C.4.3]

For a pneu­mat­ic valve, a man­u­fac­turer determ­ines a mean value of 60 mil­lion cycles as B10D. The valve is used for two shifts each day on 220 oper­a­tion days a year. The mean time between the begin­ning of two suc­cess­ive switch­ing of the valve is estim­ated as 5 s. This yields the fol­low­ing val­ues:

• dop of 220 days per year;
• hop of 16 h per day;
• tcycle of 5 s per cycle;
• B10D of 60 mil­lion cycles.

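Doing the math, we get:

$$n_{op} = \frac{220\ \mathrm{d/a} \times 16\ \mathrm{h/d} \times 3600\ \mathrm{s/h}}{5\ \mathrm{s/cycle}} \approx 2.53 \times 10^{6}\ \mathrm{cycles/a}$$

$$\mathrm{MTTF_D} = \frac{60 \times 10^{6}\ \mathrm{cycles}}{0.1 \times 2.53 \times 10^{6}\ \mathrm{cycles/a}} \approx 237\ \mathrm{a}$$

$$T_{10D} = \frac{60 \times 10^{6}\ \mathrm{cycles}}{2.53 \times 10^{6}\ \mathrm{cycles/a}} \approx 23.7\ \mathrm{a}$$

$T_{10D}$ is longer than the 20-year mission time.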
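The worked example from [1, C.4.3] can be checked with a short script. It uses the equations named in the text as I understand them ($n_{op} = d_{op} h_{op} \cdot 3600 / t_{cycle}$, $\mathrm{MTTF_D} = B_{10D}/(0.1\,n_{op})$, $T_{10D} = B_{10D}/n_{op}$); the function names are my own.

```python
SECONDS_PER_HOUR = 3600.0


def annual_operations(d_op: float, h_op: float, t_cycle_s: float) -> float:
    """[1, Eqn. C.2]: mean number of annual operations n_op."""
    return d_op * h_op * SECONDS_PER_HOUR / t_cycle_s


def mttfd_from_b10d(b10d_cycles: float, n_op: float) -> float:
    """[1, Eqn. C.1]: MTTFD in years from B10D and n_op."""
    return b10d_cycles / (0.1 * n_op)


def t10d(b10d_cycles: float, n_op: float) -> float:
    """[1, Eqn. C.4]: mean time in years until 10 % of components fail dangerously."""
    return b10d_cycles / n_op


# Pneumatic valve example: 2 shifts x 8 h, 220 days/year, 5 s per cycle, B10D = 60 million
n_op = annual_operations(d_op=220, h_op=16, t_cycle_s=5)
mttfd = mttfd_from_b10d(60e6, n_op)
life = t10d(60e6, n_op)
print(f"n_op  = {n_op:,.0f} cycles per year")  # 2,534,400
print(f"MTTFD = {mttfd:.0f} years")             # 237
print(f"T10D  = {life:.1f} years")              # 23.7
```

An MTTFD of 237 years is acceptable for a single component; the 100-year limit in Note 1 applies when the MTTFD of a whole channel is determined.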

So there you have it, at least for a fairly simple case. There are more examples in ISO 13849-1, and I would encourage you to work through them. You can also find a wealth of examples in a report produced by the BGIA in Germany, called “Functional safety of machine controls” (BGIA Report 2/2008e) [16]. The download for the report is linked from the reference list at the end of this article. If you are a SISTEMA user, there are lots of examples in the SISTEMA Cookbooks, and there are example files available so that you can see how to assemble the systems in the software.

The next part of this series covers Diagnostic Coverage (DC), and the average DC across the parts of a safety function, DCavg.


## Book List

Here are some books that I think you may find help­ful on this jour­ney:

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

## References

Note: This ref­er­ence list starts in Part 1 of the series, so “miss­ing” ref­er­ences may show in oth­er parts of the series. Included in the last post of the series is the com­plete ref­er­ence list.

[15] “The bathtub curve and product failure behavior part 1 of 2”, Findchart.co, 2017. [Online]. Available: http://findchart.co/download.php?aHR0cDovL3d3dy53ZWlidWxsLmNvbS9ob3R3aXJlL2lzc3VlMjEvaHQyMV8xLmdpZg. [Accessed: 03-Jan-2017].

[16] “Functional safety of machine controls – Application of EN ISO 13849 (BGIA Report 2/2008e)”, dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/publikationen/reports-download/bgia-reports-2007-bis-2008/bgia-report-2-2008/index-2.jsp. [Accessed: 04-Jan-2017].

Acknowledgements: IEC, ISO and oth­ers as cited
Some Rights Reserved

## ISO 13849-1 Analysis, Part 3: Architectural Category Selection

This entry is part 3 of 9 in the series “How to do a 13849-1 analysis”.

At this point, you have com­pleted the risk assess­ment, assigned required Performance Levels to each safety func­tion, and developed the Safety Requirement Specification for each safety func­tion. Next, you need to con­sider three aspects of the sys­tem design: Architectural Category, Channel Mean Time to Dangerous Failure (MTTFD), and Diagnostic Coverage (DCavg). In this part of the series, I am going to dis­cuss select­ing the archi­tec­tur­al cat­egory for the sys­tem.


## Understanding Performance Levels

To understand ISO 13849-1, it helps to know a little about where the standard originated. ISO 13849-1 is a simplified method for determining the reliability of safety-related controls for machinery. The basic ideas came from IEC 61508 [7], a seven-part standard originally published in 1998. IEC 61508 brought forward the concept of the Average Probability of Dangerous Failure per Hour, PFHD (1/h). Dangerous failures are those failures that can result in non-performance of the safety function; the ones that diagnostics cannot detect are called dangerous undetected failures. Here’s the formal definition from [1]:

3.1.5 dangerous failure: failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the poten­tial is real­ised can depend on the chan­nel archi­tec­ture of the sys­tem; in redund­ant sys­tems a dan­ger­ous hard­ware fail­ure is less likely to lead to the over­all dan­ger­ous or fail-​to-​function state.

Note 2 to entry: [SOURCE: IEC 61508 – 4, 3.6.7, mod­i­fied.]

The Performance Levels are simply bands of the average probability of dangerous failure per hour (PFHD), as shown in [1, Table 2].

The ranges shown in [1, Table 2] are approximate. If you need the specific limits of the bands for any reason, [1, Annex K] describes the full span of PFHD in table format.
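Binning a calculated PFHD into a PL is a simple range lookup. The sketch below uses the band limits of [1, Table 2] as I read them; verify them against your copy of the standard, and, as noted above, use [1, Annex K] when a value sits close to a band edge. `performance_level` is a hypothetical name.

```python
# Assumed PL bands for PFHD in 1/h, lower bound inclusive, upper bound exclusive.
PL_BANDS = [
    ("e", 1e-8, 1e-7),
    ("d", 1e-7, 1e-6),
    ("c", 1e-6, 3e-6),
    ("b", 3e-6, 1e-5),
    ("a", 1e-5, 1e-4),
]


def performance_level(pfhd_per_hour: float) -> str:
    """Bin a calculated PFHD (1/h) into a Performance Level."""
    for pl, low, high in PL_BANDS:
        if low <= pfhd_per_hour < high:
            return pl
    # Values outside 1e-8 ... 1e-4 per hour are not covered by the table.
    raise ValueError(f"PFHD {pfhd_per_hour:g}/h is outside the PL a to PL e range")


print(performance_level(3.2e-6))  # "b"
print(performance_level(2.5e-7))  # "d"
```

This is exactly the “binning” that keeps an apparently precise number from being over-interpreted.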

There is anoth­er way to describe the same char­ac­ter­ist­ics of a sys­tem, this one from IEC. Instead of using the PL sys­tem, IEC uses Safety Integrity Levels (SILs). [1, Table 3] shows the cor­res­pond­ence between PLs and SILs. Note that the cor­res­pond­ence is not exact. Where the cal­cu­lated PFHd is close to either end of one of the PL or SIL bands, use the table in [1, Annex K] or in [9] to determ­ine to which band(s) the per­form­ance should be assigned.

IEC produced a Technical Report [10] that provides guidance on how to use ISO 13849-1 or IEC 62061. [10, Table 1] shows the relationship between PLs, PFHD and SILs.

IEC 61508 includes SIL 4, which is not shown in [10, Table 1] because this level of performance exceeds the range of PFHD possible using ISO 13849-1 techniques. Also, you may have noticed that PLb and PLc are both within SIL 1. This was done to accommodate the five architectural categories that came from EN 954-1 [12].
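The PL/SIL correspondence can be written down as a small mapping. The values are my reading of [1, Table 3] and are approximate, as the text notes: PL a has no SIL equivalent, PL b and PL c both map to SIL 1, and SIL 4 has no PL equivalent. The function names are hypothetical.

```python
from typing import List, Optional

# Approximate PL -> SIL correspondence; None means "no correspondence".
PL_TO_SIL = {"a": None, "b": 1, "c": 1, "d": 2, "e": 3}


def sil_for_pl(pl: str) -> Optional[int]:
    return PL_TO_SIL[pl.lower()]


def pls_for_sil(sil: int) -> List[str]:
    return [pl for pl, s in PL_TO_SIL.items() if s == sil]


print(sil_for_pl("d"))  # 2
print(pls_for_sil(1))   # ['b', 'c']
print(pls_for_sil(4))   # []
```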

Why PL and not just PFHD? One of the odd things that humans do when we can calculate things is the development of what has been called “precision bias” [12]. Precision bias occurs when we can compute a number that appears very precise, e.g., $3.2 \times 10^{-6}$, which then makes us feel like we have a very precise concept of the quantity. The problem, at least in this case, is that we are dealing with probabilities, and minuscule probabilities at that. Using bands, like the PLs, forces us to “bin” these apparently precise numbers into larger groups, eliminating the effects of precision bias in the evaluation of the systems. Eliminating precision bias is the same reason that IEC 61508 uses SILs: binning the calculated values helps to reduce our tendency to develop a precision bias. The reality is that we just can’t predict the behaviour of these systems with as much precision as we would like to believe.

## Getting to Performance Levels: MTTFD, Architectural Category and DC

Some aspects of the sys­tem design need to be con­sidered to arrive at a Performance Level or make a pre­dic­tion about fail­ure rates in terms of PFHd.

First is the system architecture: fundamentally, single channel or two channel. As a side note, if your system uses more than two channels, there are workarounds for this in ISO 13849-1, or you can use IEC 62061 or IEC 61508, either of which handles these more complex systems more easily. Remember, ISO 13849-1 is intended for relatively simple systems.

When we get into the analysis in a later article, we will be calculating or estimating the Mean Time to Dangerous Failure, MTTFD, of each channel, and then of the entire system. MTTFD is expressed in years, unlike PFHD, which is expressed as a probability per hour (1/h). I have yet to hear why this is the case, as it seems rather confusing. However, that is current practice.
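One way to relate the two units: for a single item with a constant dangerous failure rate, $\lambda_D = 1/\mathrm{MTTF_D}$, so converting MTTFD from years to hours (8,760 h per year) gives a rate in 1/h. This is only a unit-conversion sketch under that assumption, with hypothetical function names; it is not how [1] derives PFHD for a complete system, which also depends on the Category and DCavg.

```python
HOURS_PER_YEAR = 8760.0  # 365 days x 24 h


def dangerous_failure_rate_per_hour(mttfd_years: float) -> float:
    """lambda_D in 1/h for a constant failure rate: 1 / (MTTFD in hours)."""
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


def mttfd_years_from_rate(lambda_d_per_hour: float) -> float:
    """Inverse conversion: MTTFD in years from lambda_D in 1/h."""
    return 1.0 / (lambda_d_per_hour * HOURS_PER_YEAR)


print(f"{dangerous_failure_rate_per_hour(100):.2e} per hour")  # 1.14e-06
print(f"{mttfd_years_from_rate(1e-6):.0f} years")              # 114
```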

### Architectural Categories

Once the required PL is known, the next step is the selection of the architectural category. The basic architectural categories were introduced initially in EN 954-1:1996 [12]. The Categories were carried forward unchanged into the first edition of ISO 13849-1 in 1999. The Categories were maintained and expanded to include additional requirements in the second and third editions, published in 2006 and 2015.

Since I have explored the details of the archi­tec­tures in a pre­vi­ous series, I am not going to repeat that here. Instead, I will refer you to that series. The archi­tec­tur­al Categories come in five fla­vours:

Architecture Basics

| Category | Structure | Basic Requirements | Safety Principle |
|---|---|---|---|
| B | Single channel | Basic circuit conditions are met (i.e., components are rated for the circuit voltage and current, etc.). Use of components that are designed and built to the relevant component standards. [1, 6.2.3] | Component selection |
| 1 | Single channel | Category B plus the use of “well-tried components” and “well-tried safety principles”. [1, 6.2.4] | Component selection |
| 2 | Single channel | Category B plus the use of “well-tried safety principles” and periodic testing [1, 4.5.4] of the safety function by the machine control system. [1, 6.2.5] | System structure |
| 3 | Dual channel | Category B plus the use of “well-tried safety principles”, and no single fault shall lead to the loss of the safety function. Where practicable, single faults shall be detected. [1, 6.2.6] | System structure |
| 4 | Dual channel | Category B plus the use of “well-tried safety principles”, and no single fault shall lead to the loss of the safety function. Single faults are detected at or before the next demand on the safety system, but where this is not possible, an accumulation of undetected faults will not lead to the loss of the safety function. [1, 6.2.7] | System structure |

For full requirements, see [1, Cl. 6].

[1, Table 10] provides a more detailed sum­mary of the require­ments than the sum­mary table above provides.

Since the Categories cannot all achieve the same reliability, the PL and the Categories are linked as shown in [1, Fig. 5]. This diagram summarises the relationship of the three central parameters in ISO 13849-1 in one illustration.

Starting with the PLr from the Safety Requirement Specification for the first safety func­tion, you can use Fig. 5 to help you select the Category and oth­er para­met­ers neces­sary for the design. For example, sup­pose that the risk assess­ment indic­ates that an emer­gency stop sys­tem is needed. ISO 13850 requires that emer­gency stop func­tions provide a min­im­um of PLc, so using this as the basis you can look at the ver­tic­al axis in the dia­gram to find PLc, and then read across the fig­ure. You will see that PLc can be achieved using Category 1, 2, or 3 archi­tec­ture, each with cor­res­pond­ing dif­fer­ences in MTTFD and DCavg. For example:

• Cat. 1, MTTFD = high and DCavg = none, or
• Cat. 2, MTTFD = Medium to High and DCavg = Low to Medium, or
• Cat. 3, MTTFD = Low to High and DCavg = Low to Medium.

As you can see, the MTTFD required in the channels decreases as the diagnostic coverage increases. The design compensates for lower reliability in the components by increasing the diagnostic coverage and adding redundancy. Using [1, Fig. 5], you can pin down any of the parameters and then select the others as appropriate.
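The combinations in [1, Fig. 5] can also be written as a lookup table. The values below are my transcription of the figure (the standard also presents it in tabular form for its simplified procedure), so treat them as an assumption to verify against the standard. The lookup ignores the other conditions the simplified procedure relies on, such as adequate measures against common cause failure (CCF) for Categories 2, 3 and 4. `SIMPLIFIED_PL` and `options_for` are hypothetical names.

```python
# (Category, DCavg) -> {MTTFD band of each channel: PL achieved}; None = not covered.
SIMPLIFIED_PL = {
    ("B", "none"):   {"Low": "a", "Medium": "b", "High": None},
    ("1", "none"):   {"Low": None, "Medium": None, "High": "c"},
    ("2", "low"):    {"Low": "a", "Medium": "b", "High": "c"},
    ("2", "medium"): {"Low": "b", "Medium": "c", "High": "d"},
    ("3", "low"):    {"Low": "b", "Medium": "c", "High": "d"},
    ("3", "medium"): {"Low": "c", "Medium": "d", "High": "d"},
    ("4", "high"):   {"Low": None, "Medium": None, "High": "e"},
}
PL_ORDER = "abcde"


def options_for(plr: str):
    """List (Category, DCavg, MTTFD band, PL) combinations that reach at least plr."""
    target = PL_ORDER.index(plr)
    return [
        (cat, dc, band, pl)
        for (cat, dc), row in SIMPLIFIED_PL.items()
        for band, pl in row.items()
        if pl is not None and PL_ORDER.index(pl) >= target
    ]


for option in options_for("c"):
    print(option)
```

For PLr = c, the output includes Category 1 with High MTTFD and no DC, Category 2 with Medium or High MTTFD, and Category 3 with Low, Medium or High MTTFD, which lines up with the emergency stop example in the text.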

One additional point regarding Categories 3 and 4: the difference between these Categories is increased Diagnostic Coverage. While Category 3 is single fault tolerant, Category 4 has additional diagnostic capabilities so that an accumulation of undetected faults cannot lead to the loss of the safety function. This is not the same as being multiple fault tolerant; the system is still designed to operate in the presence of only a single fault, with enhanced diagnostic capability.

It is worth not­ing that ISO 13849 only recog­nises struc­tures with single or dual chan­nel con­fig­ur­a­tions. If you need to devel­op a sys­tem with more than single redund­ancy (i.e., more than two chan­nels), you can ana­lyse each pair of chan­nels as a dual chan­nel archi­tec­ture, or you can move to using IEC 62061 or IEC 61508, either of which per­mits any level of redund­ancy.

The next step in this pro­cess is the eval­u­ation of the com­pon­ent and chan­nel MTTFD, and then the determ­in­a­tion of the com­plete sys­tem MTTFD. Part 4 of this series pub­lishes on 13-​Feb-​17.



## References


[1] Safety of machinery – Safety-related parts of control systems – Part 1: General principles for design. ISO Standard 13849-1. 2015.

[7] Functional safety of electrical/electronic/programmable electronic safety-related systems. IEC Standard 61508. 2nd Edition. Seven Parts. 2010.


## ISO 13849-1 Analysis, Part 2: Safety Requirement Specification

This entry is part 2 of 9 in the series “How to do a 13849-1 analysis”.

## Developing the Safety Requirement Specification

The Safety Requirement Specification sounds pretty heavy, but it is really just a big name for a way to organise the information you need to analyse and design the safety systems for your machinery. Note that I am assuming that you are doing this in the “right” order, meaning that you are planning the design beforehand, rather than trying to back-fill the documentation after completing the design. In either case, the process is the same, but getting the information you need can be much harder after the fact than before doing the design work. Some aspects are impossible to do in review mode, especially if the design work was done by a third party to whom you have no access [8].


## What goes into a Safety Requirements Specification?

For reference, Clause 5 of ISO 13849-1 [1] covers safety requirement specifications to some degree, but I think it needs some clarification. First of all, what is a safety function?

Safety func­tions include any func­tion of the machine that has a dir­ect pro­tect­ive effect for the work­er using the machinery. However, using this defin­i­tion, it is pos­sible to ignore some import­ant func­tions. Complementary pro­tect­ive meas­ures, like emer­gency stop, can be missed because they are usu­ally “after the fact”, i.e., the injury occurs, and then the E-​stop is pressed, so you can­not say that it has a “dir­ect pro­tect­ive effect”. If we look at the defin­i­tions in [1], we find:

3.1.20 safety function: function of the machine whose failure can result in an immediate increase of the risk(s)
[SOURCE: ISO 12100:2010, 3.30.]

## Linking Risk to Functional Safety

Referring to the risk assess­ment, any risk con­trol that pro­tects work­ers from some aspect of the machine oper­a­tion using a con­trol func­tion like an inter­locked gate, or by main­tain­ing a tem­per­at­ure below a crit­ic­al level or speed at a safe level, is a safety func­tion. For example: if the tem­per­at­ure in a pro­cess rises too high, the pro­cess will explode; or if a shaft speed is too high (or too low) the tool may shat­ter and eject broken pieces at high speed. Therefore, the tem­per­at­ure con­trol func­tion and the speed con­trol func­tion are safety func­tions. These func­tions may also be pro­cess con­trol func­tions, but the poten­tial for an imme­di­ate increase in risk due to a fail­ure is what makes these func­tions safety func­tions no mat­ter what else they may do.

[1, Table 8] gives you some examples of various kinds of safety functions found on machines. The table is not exhaustive: there are many more safety functions out there than are listed in the table. Your job is to figure out which ones live in your machine. It is a bit like Pokémon: ya gotta catch ’em all!

## Basic Safety Requirement Specification

Each safety func­tion must have a Performance Level or a Safety Integrity Level assigned as part of the risk assess­ment. For each safety func­tion, you need to devel­op the fol­low­ing inform­a­tion:

Basic Safety Requirement Specification

| Item | Description |
|---|---|
| Safety Function Identification | Name or other references, e.g., “Access Gate Interlock” or “Hazard Zone 2.” |
| Functional Characteristics | Intended use or foreseeable misuse of the machine relevant to the safety function; operating modes relevant to the safety function; cycle time of the machine; response time of the safety function. |
| Emergency Operation | Is this an emergency operation function? If yes, what types of emergencies might be mitigated by this function? |
| Interactions | What operating modes require this function to be operational? Are there modes where this function requires deliberate bypass? These could include normal working modes (automatic, manual, set-up, changeover), and fault-finding or maintenance modes. |
| Behaviour | How you want the system to behave when the safety function is triggered, e.g., “Power is immediately removed from the MIG welder using an IEC 60204-1 Category 0 stop function, and robot motions are stopped using an IEC 60204-1 Category 1 stop function through the robot safety stop input,” or “All horizontal pneumatic motions stop in their current positions. Vertical motions return to the raised or retracted positions.” Also consider a power loss condition. Should the system behave in the same way as if the safety function was triggered, not react at all, or do something else? Consider vertical axes that might require holding brakes or other mechanisms to prevent power loss causing unexpected motion. |
| Machine State after Triggering | What is the expected state of the machine after triggering the safety function? What is the recovery process? |
| Frequency of Operation | How often do you expect this safety function to be used? A reasonable estimate is needed. |
| Priority of Operation | If simultaneous triggering of multiple safety functions is possible, which function takes precedence? E.g., Emergency Stop always takes precedence over everything else. What happens if a safe speed function and a guard interlock are associated, because the interlock is part of a guarding function covering a shaft, and you need to troubleshoot the safe speed function, which requires access to the shaft where the encoders are mounted? |
| Required Performance Level | I suggest recording the S, F, and P values selected as well as the PLr value selected for later reference. |

Note that an SRS can be much more detailed than this. If you want more information on this, read IEC 61508-1, 7.10.2.

So, that is the min­im­um. You can add lots more inform­a­tion to the min­im­um require­ments, but this will get you star­ted. If you want more inform­a­tion on devel­op­ing the SRS, you will need to get a copy of IEC 61508 [7].
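If you keep your SRS data in a script or tool rather than a document table, each safety function can be a simple record. The sketch below is hypothetical: the class, field names and example values are my own, and the S, F and P lookup follows the risk graph in [1, Annex A] as I understand it, so check it against your copy of the standard before relying on it.

```python
from dataclasses import dataclass
from typing import List

# Risk graph of ISO 13849-1 Annex A, as I read it: (S, F, P) -> PLr.
# S1/S2 = severity of injury, F1/F2 = frequency and/or duration of exposure,
# P1/P2 = possibility of avoiding the hazard or limiting the harm.
RISK_GRAPH = {
    (1, 1, 1): "a", (1, 1, 2): "b", (1, 2, 1): "b", (1, 2, 2): "c",
    (2, 1, 1): "c", (2, 1, 2): "d", (2, 2, 1): "d", (2, 2, 2): "e",
}


@dataclass
class SafetyFunctionSpec:
    """Hypothetical record holding the basic SRS items for one safety function."""
    name: str
    functional_characteristics: List[str]  # intended use/misuse, modes, cycle and response times
    emergency_operation: bool
    interactions: str
    behaviour: str
    state_after_triggering: str
    frequency_of_operation: str
    priority: str
    s: int  # 1 or 2
    f: int  # 1 or 2
    p: int  # 1 or 2

    @property
    def plr(self) -> str:
        """Required Performance Level derived from the recorded S, F and P values."""
        return RISK_GRAPH[(self.s, self.f, self.p)]


# Illustrative values only.
gate = SafetyFunctionSpec(
    name="Access Gate Interlock",
    functional_characteristics=["Automatic and set-up modes", "Response time: to be confirmed"],
    emergency_operation=False,
    interactions="Active in all modes except a dedicated maintenance mode",
    behaviour="All horizontal pneumatic motions stop in their current positions",
    state_after_triggering="Stopped; manual reset required before restart",
    frequency_of_operation="Estimated 20 openings per shift",
    priority="Below Emergency Stop",
    s=2, f=1, p=2,
)
print(gate.name, "PLr =", gate.plr)
```

Storing S, F and P with the record, rather than only the PLr, keeps the reasoning behind the required Performance Level visible for later review.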

## What’s Next?

Next, you need to be able to make some design decisions about system architecture and components. Circuit architectures have been discussed at some length on the MS101 blog in the past, so I am not going to go through them again in this series. Instead, I will show you how to choose an architecture based on your design goals in the next instalment.

