ISO 13849-1 Analysis — Part 4: MTTFD – Mean Time to Dangerous Failure

This entry is part 4 of 6 in the series How to do a 13849-1 analysis

Functional safety is all about the likelihood of a safety system failing to operate when you need it, so understanding Mean Time to Dangerous Failure, or MTTFD, is critical. If you have been reading about this topic at all, you may notice that I write the abbreviation for Mean Time to Dangerous Failure with a capital D. Using MTTFD is a recent change introduced in the third edition of ISO 13849-1, published in 2015. In the first and second editions, the correct abbreviation was MTTFd. Onward!


Defining MTTFD

Let’s start by having a look at some key definitions. Looking at [1, Cl. 3], you will find:

3.1.1 safety-related part of a control system (SRP/CS)—part of a control system that responds to safety-related input signals and generates safety-related output signals

Note 1 to entry: The combined safety-related parts of a control system start at the point where the safety-related input signals are initiated (including, for example, the actuating cam and the roller of the position switch) and end at the output of the power control elements (including, for example, the main contacts of a contactor)

Note 2 to entry: If monitoring systems are used for diagnostics, they are also considered as SRP/CS.

3.1.5 dangerous failure—failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508-4, 3.6.7, modified.]

3.1.25 mean time to dangerous failure (MTTFD)—expectation of the mean time to dangerous failure

Definition 3.1.5 is pretty helpful, but definition 3.1.25 is, well, not much of a definition. Let’s look at this another way.

Failures and Faults

Since everything can and will eventually fail to perform the way we expect it to, we know that everything has a failure rate, because everything takes some time to fail. Granted, this time may be very short, like the first time the unit is turned on, or it may be very long, sometimes hundreds of years. Remember that because this is a rate, it is something that occurs over time. It is also important to be clear that we are talking about failures and not faults. Reading from [1]:

3.1.3 fault—state of an item characterized by the inability to perform a required function, excluding the inability during preventive maintenance or other planned actions, or due to lack of external resources

Note 1 to entry: A fault is often the result of a failure of the item itself, but may exist without prior failure.

Note 2 to entry: In this part of ISO 13849, “fault” means random fault.
[SOURCE: IEC 60050-191:1990, 05-01.]

3.1.4 failure— termination of the ability of an item to perform a required function

Note 1 to entry: After a failure, the item has a fault.

Note 2 to entry: “Failure” is an event, as distinguished from “fault”, which is a state.

Note 3 to entry: The concept as defined does not apply to items consisting of software only.

Note 4 to entry: Failures which only affect the availability of the process under control are outside of the scope of this part of ISO 13849.
[SOURCE: IEC 60050-191:1990, 04-01.]

Note 2 to 3.1.4 is the important one at this point in the discussion: a failure is an event, while a fault is a state.

Now, where we have many of the same thing, like relays, valves, or safety systems, we have a population of identical items, each of which will eventually fail at some point. We can count those failures as they occur and tally them up, and we can graph how many failures we get in the population over time. If this is starting to sound suspiciously like statistics to you, that is because it is.

OK, so let’s look at the kinds of failures that occur in that population. Some failures will result in a “safe” state, e.g., a relay failing with all poles open, and some will fail in a potentially “dangerous” state, like a normally closed valve developing a significant leak. If we tally up all the failures that occur, and then tally the number of “safe” failures and the number of “dangerous” failures in that population, we now have some very useful information.

The different kinds of failures are signified using the lowercase Greek letter \lambda (lambda). We can add some subscripts to help identify what kinds of failures we are talking about. The common variable designations used are [14]:

\lambda = failures
\lambda_{(t)} = failure rate
\lambda_s = “safe” failures
\lambda_d = “dangerous” failures
\lambda_{dd} = detected “dangerous” failures
\lambda_{du} = undetected “dangerous” failures
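To make the tally concrete, here is a minimal Python sketch that turns failure counts from a population into rates. It assumes a constant failure rate and that failed units are replaced, so the exposure is simply units × hours observed; the function names and the relay numbers are hypothetical, not taken from the standard. It also uses the usual relation for a constant rate, MTTF = 1/λ, with 8 760 hours per year.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h


def failure_rates(n_units, hours_observed, n_safe, n_dd, n_du):
    """Point estimates of failure rates (per hour) from a population tally.

    Assumes a constant failure rate and that failed units are replaced,
    so the total exposure is n_units * hours_observed unit-hours.
    """
    exposure = n_units * hours_observed
    lam_s = n_safe / exposure   # "safe" failures
    lam_dd = n_dd / exposure    # dangerous, detected
    lam_du = n_du / exposure    # dangerous, undetected
    lam_d = lam_dd + lam_du     # all dangerous failures
    return {
        "lambda": lam_s + lam_d,
        "lambda_s": lam_s,
        "lambda_d": lam_d,
        "lambda_dd": lam_dd,
        "lambda_du": lam_du,
    }


def mttfd_years(lambda_d_per_hour):
    """MTTF_D = 1 / lambda_d for a constant rate, converted to years."""
    return 1.0 / (lambda_d_per_hour * HOURS_PER_YEAR)


# Hypothetical tally: 1 000 relays observed for one year
rates = failure_rates(1000, HOURS_PER_YEAR, n_safe=6, n_dd=3, n_du=1)
print(f"lambda_d = {rates['lambda_d']:.3e} per hour")
print(f"MTTF_D = {mttfd_years(rates['lambda_d']):.0f} years")
```

Notice that only the dangerous failures feed the MTTFD; the six “safe” failures change the overall λ but not the MTTFD.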

I will be discussing some of these variables in more detail in a later part of the series when I delve into Diagnostic Coverage, so don’t worry about them too much just yet.

Getting to MTTFD

Since we can now start to deal with the failure rate data mathematically, we can start to do some calculations about expected lifetime of a component or a system. That expected, or probable, lifetime is what definition 3.1.25 was on about, and is what we call MTTFD.

MTTFD is expressed in years, and the way it is used in ISO 13849-1 assumes that the probability of failure is relatively constant over time. If you look at a typical failure rate curve, called a “bathtub curve” due to its resemblance to the profile of a nice soaker tub, that assumption holds in the flatter portion of the curve between the end of the infant mortality period and the wear-out period at the end of life. This part of the curve is the portion assumed to be included in the “mission time” for the product. ISO 13849-1 assumes the mission time for all machinery is 20 years [1, 4.5.4], [1, Cl. 10].

Figure 1 – Typical Bathtub Curve [15]
ISO 13849-1 provides us with guidance on how MTTFD relates to the determination of the PL in [1, Cl. 4.5.2]. MTTFD is further grouped into three bands as shown in [1, Table 4].
[1, Table 4] – Bands of mean time to dangerous failure of each channel (MTTFD)

The notes for this table are important as well. Since you can’t read the notes particularly well in the table above, I’ve reproduced them here:

NOTE 1 The choice of the MTTFD ranges of each channel is based on failure rates found in the field as state-of-the-art, forming a kind of logarithmic scale fitting to the logarithmic PL scale. An MTTFD value of each channel less than three years is not expected to be found for real SRP/CS since this would mean that after one year about 30 % of all systems on the market will fail and will need to be replaced. An MTTFD value of each channel greater than 100 years is not acceptable because SRP/CS for high risks should not depend on the reliability of components alone. To reinforce the SRP/CS against systematic and random failure, additional means such as redundancy and testing should be required. To be practicable, the number of ranges was restricted to three. The limitation of MTTFD of each channel values to a maximum of 100 years refers to the single channel of the SRP/CS which carries out the safety function. Higher MTTFD values can be used for single components (see Table D.1).

NOTE 2 The indicated borders of this table are assumed within an accuracy of 5%.
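Here is a minimal Python sketch of the banding, assuming the Table 4 limits as I read them from the 2015 edition: Low is 3 years ≤ MTTFD < 10 years, Medium is 10 years ≤ MTTFD < 30 years, and High is 30 years ≤ MTTFD ≤ 100 years. The function name is hypothetical, the cap at 100 years is my reading of NOTE 1 (category-specific exceptions are not handled), and the 5 % tolerance in NOTE 2 is not modelled.

```python
def mttfd_band(channel_mttfd_years: float) -> str:
    """Classify a channel MTTF_D (in years) into the bands of [1, Table 4].

    Assumed band limits: Low 3 <= MTTF_D < 10, Medium 10 <= MTTF_D < 30,
    High 30 <= MTTF_D <= 100. Values above 100 years are capped at 100,
    following NOTE 1; the 5 % border tolerance of NOTE 2 is ignored here.
    """
    if channel_mttfd_years < 3:
        raise ValueError("a channel MTTF_D below 3 years is outside Table 4")
    capped = min(channel_mttfd_years, 100.0)
    if capped < 10:
        return "Low"
    if capped < 30:
        return "Medium"
    return "High"


for years in (5, 12, 45, 237):
    print(years, "years ->", mttfd_band(years))
```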

The standard then tells us to select the MTTFD using a simple hierarchy [1, 4.5.2]:

For the estimation of MTTFD of a component, the hierarchical procedure for finding data shall be, in the order given:

a) use manufacturer’s data;
b) use methods in Annex C and Annex D;
c) choose 10 years.

Why ten years? Ten years is half of the assumed mission lifetime of 20 years. More on mission lifetime in a later post.
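The hierarchy is simple enough to express in a few lines. This is a hypothetical Python helper, not something defined in the standard: it returns the first value available in the order a), b), c), along with a note of which source was used, which is handy to record in your analysis.

```python
from typing import Optional, Tuple

DEFAULT_COMPONENT_MTTFD_YEARS = 10.0  # option c) in [1, 4.5.2]


def component_mttfd(manufacturer_value: Optional[float] = None,
                    annex_estimate: Optional[float] = None) -> Tuple[float, str]:
    """Pick a component MTTF_D (years) in the order a) -> b) -> c)."""
    if manufacturer_value is not None:
        return manufacturer_value, "a) manufacturer's data"
    if annex_estimate is not None:
        return annex_estimate, "b) Annex C / Annex D method"
    return DEFAULT_COMPONENT_MTTFD_YEARS, "c) default of 10 years"


print(component_mttfd(annex_estimate=237.0))
print(component_mttfd())
```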

Looking at [1, Annex C.2], you will find the “Good Engineering Practices” method for estimating MTTFD, presuming the manufacturer has not provided you with that information. ISO 13849-2 [2] has some reference tables that provide some general MTTFD values for some kinds of components, but not every part that exists can be listed. How can we deal with parts not listed? [1, Annex C.4] provides us with a calculation method for estimating MTTFD for pneumatic, mechanical and electromechanical components.

Calculating MTTFD for pneumatic, mechanical and electromechanical components

I need to introduce you to a few more variables before we look at how to calculate MTTFD for a component.

Variables
Variable – Description
B10 – number of cycles until 10% of the components fail (for pneumatic and electromechanical components)
B10D – number of cycles until 10% of the components fail dangerously (for pneumatic and electromechanical components)
T – lifetime of the component
T10D – mean time until 10% of the components fail dangerously
hop – mean operation time, in hours per day
dop – mean operation time, in days per year
nop – mean number of annual operations, in cycles per year
tcycle – mean time between the beginning of two successive cycles of the component (e.g., switching of a valve), in seconds per cycle
s – seconds
h – hours
a – years

Knowing a few details, we can calculate the MTTFD using [1, Eqn. C.1]. We need to know the following parameters for the application:

  • B10D
  • hop
  • dop
  • tcycle

$$\mathrm{MTTF_D} = \frac{B_{10D}}{0.1 \times n_{op}}$$
Calculating MTTFD – [1, Eqn. C.1]
In order to use [1, Eqn. C.1], we need to first calculate nop, using [1, Eqn. C.2]:

$$n_{op} = \frac{d_{op} \times h_{op} \times 3600\ \mathrm{s/h}}{t_{cycle}}$$
Calculating nop – [1, Eqn. C.2]
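We may also need one more calculation, [1, Eqn. C.4], which gives T10D. T10D limits how long a wear-dependent component can stay in service: if T10D is shorter than the mission time, the component needs to be replaced before T10D is reached.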

$$T_{10D} = \frac{B_{10D}}{n_{op}}$$
Calculating T10D – [1, Eqn. C.4]
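The three equations translate directly into code. This is a minimal Python sketch with hypothetical function names; units follow the Variables table (days per year, hours per day, seconds per cycle, cycles, years), and the valve figures in the usage lines are made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def n_op(d_op: float, h_op: float, t_cycle_s: float) -> float:
    """Mean number of annual operations, cycles per year (Eqn. C.2)."""
    return d_op * h_op * SECONDS_PER_HOUR / t_cycle_s


def mttfd_from_b10d(b10d: float, nop: float) -> float:
    """MTTF_D in years from B10D (cycles) and n_op (cycles per year), Eqn. C.1."""
    return b10d / (0.1 * nop)


def t10d(b10d: float, nop: float) -> float:
    """Mean time in years until 10 % of the components fail dangerously."""
    return b10d / nop


# Hypothetical valve: 250 days/year, 8 h/day, one cycle every 30 s,
# B10D of 20 million cycles
ops = n_op(250, 8, 30)
print(f"n_op = {ops:,.0f} cycles/year")
print(f"MTTF_D = {mttfd_from_b10d(20e6, ops):.0f} years")
print(f"T10D = {t10d(20e6, ops):.0f} years")
```

With these equations, MTTFD always works out to ten times T10D, because dividing by 0.1 × nop is the same as multiplying B10D/nop by ten.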

Example Calculation [1, C.4.3]

For a pneumatic valve, a manufacturer determines a mean value of 60 million cycles as B10D. The valve is used for two shifts each day on 220 operation days a year. The mean time between the beginning of two successive switching of the valve is estimated as 5 s. This yields the following values:

  • dop of 220 days per year;
  • hop of 16 h per day;
  • tcycle of 5 s per cycle;
  • B10D of 60 million cycles.

Doing the math, we get:

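$$n_{op} = \frac{220\ \mathrm{d/a} \times 16\ \mathrm{h/d} \times 3600\ \mathrm{s/h}}{5\ \mathrm{s/cycle}} = 2\,534\,400\ \mathrm{cycles/a}$$
$$\mathrm{MTTF_D} = \frac{60 \times 10^{6}\ \mathrm{cycles}}{0.1 \times 2\,534\,400\ \mathrm{cycles/a}} \approx 237\ \mathrm{a}$$
$$T_{10D} = \frac{60 \times 10^{6}\ \mathrm{cycles}}{2\,534\,400\ \mathrm{cycles/a}} \approx 23.7\ \mathrm{a}$$
Example C.4.3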
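If you prefer to check the arithmetic in code, here is the same example as a short Python script. The comparison of T10D against the 20-year mission time reflects my reading of why T10D matters; treat that check as an assumption rather than a quote from the standard.

```python
MISSION_TIME_YEARS = 20  # mission time assumed by ISO 13849-1

# Inputs from example C.4.3
d_op = 220          # operating days per year
h_op = 16           # hours per day (two shifts)
t_cycle = 5         # seconds between successive valve switchings
b10d = 60_000_000   # cycles until 10 % of valves fail dangerously

nop = d_op * h_op * 3600 / t_cycle   # Eqn. C.2, cycles per year
mttfd = b10d / (0.1 * nop)           # Eqn. C.1, years
t10d_years = b10d / nop              # years

print(f"n_op   = {nop:,.0f} cycles/year")   # n_op   = 2,534,400 cycles/year
print(f"MTTF_D = {mttfd:.1f} years")        # MTTF_D = 236.7 years
print(f"T10D   = {t10d_years:.1f} years")   # T10D   = 23.7 years
if t10d_years < MISSION_TIME_YEARS:
    print("T10D is shorter than the mission time: plan replacement")
else:
    print("T10D exceeds the mission time")
```

An MTTFD of 237 years is above the 100-year limit in Table 4, but NOTE 1 to that table allows higher values for single components; the limit applies to the channel.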

So there you have it, at least for a fairly simple case. There are more examples in ISO 13849-1, and I would encourage you to work through them. You can also find a wealth of examples in a report produced by the BGIA in Germany, called the Functional safety of machine controls (BGIA Report 2/2008e) [16]. The download for the report is linked from the reference list at the end of this article. If you are a SISTEMA user, there are lots of examples in the SISTEMA Cookbooks, and there are example files available so that you can see how to assemble the systems in the software.

The next part of this series covers Diagnostic Coverage (DC), and the average DC across all the parts of a safety-related control system, DCavg.


Book List

Here are some books that I think you may find helpful on this journey:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. Included in the last post of the series is the complete reference list.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[2]     Safety of machinery — Safety-related parts of control systems — Part 2: Validation. 2nd Edition. ISO Standard 13849-2. 2012.

[7]     Functional safety of electrical/electronic/programmable electronic safety-related systems. Seven parts. IEC Standard 61508. 2nd Edition. 2010.

[14]    Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 4: Definitions and abbreviations. IEC Standard 61508-4. 2nd Edition. 2010.

[15]    “The bathtub curve and product failure behavior part 1 of 2”, Findchart.co, 2017. [Online]. Available: http://findchart.co/download.php?aHR0cDovL3d3dy53ZWlidWxsLmNvbS9ob3R3aXJlL2lzc3VlMjEvaHQyMV8xLmdpZg. [Accessed: 2017-01-03].

[16]   “Functional safety of machine controls – Application of EN ISO 13849 (BGIA Report 2/2008e)”, dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/publikationen/reports-download/bgia-reports-2007-bis-2008/bgia-report-2-2008/index-2.jsp. [Accessed: 2017-01-04].

Copyright secured by Digiprove © 2017
Acknowledgements: IEC, ISO and others as cited
Some Rights Reserved

ISO 13849 Analysis — Part 3: Architectural Category Selection

This entry is part 3 of 6 in the series How to do a 13849-1 analysis

At this point, you have completed the risk assessment, assigned required Performance Levels to each safety function, and developed the Safety Requirement Specification for each safety function. Next, you need to consider three aspects of the system design: Architectural Category, Channel Mean Time to Dangerous Failure (MTTFD), and Diagnostic Coverage (DCavg). In this part of the series, I am going to discuss selecting the architectural category for the system.


Understanding Performance Levels

To understand ISO 13849-1, it helps to know a little about where the standard originated. ISO 13849-1 is a simplified method for determining the reliability of safety-related controls for machinery. The basic ideas came from IEC 61508 [7], a seven-part standard originally published in 1998. IEC 61508 brought forward the concept of the Average Probability of Dangerous Failure per Hour, PFHD (1/h). Dangerous failures are those failures that can result in non-performance of the safety function; the most troublesome of these are the ones that diagnostics cannot detect. Here’s the formal definition from [1]:

3.1.5

dangerous failure
failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

Note 1 to entry: Whether or not the potential is realised can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.

Note 2 to entry: [SOURCE: IEC 61508-4, 3.6.7, modified.]

The Performance Levels are simply bands of the average probability of dangerous failure per hour (PFHD), as shown in [1, Table 2] below.

[1, Table 2] – Performance Levels as bands of PFHD ranges

The ranges shown in [1, Table 2] are approximate. If you need the specific limits of the bands for any reason, [1, Annex K] describes the full span of PFHD in table format.
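Binning a calculated PFHD into a PL is a simple lookup. This Python sketch assumes the band limits as I read them from [1, Table 2]: PL a is 10⁻⁵ ≤ PFHD < 10⁻⁴, PL b is 3 × 10⁻⁶ ≤ PFHD < 10⁻⁵, PL c is 10⁻⁶ ≤ PFHD < 3 × 10⁻⁶, PL d is 10⁻⁷ ≤ PFHD < 10⁻⁶, and PL e is 10⁻⁸ ≤ PFHD < 10⁻⁷ (all per hour). The names are hypothetical, and the hard borders ignore the approximation mentioned above, so check Annex K for values close to a border.

```python
# (PL, lower bound inclusive, upper bound exclusive), per hour
PL_BANDS = [
    ("e", 1e-8, 1e-7),
    ("d", 1e-7, 1e-6),
    ("c", 1e-6, 3e-6),
    ("b", 3e-6, 1e-5),
    ("a", 1e-5, 1e-4),
]


def pl_from_pfhd(pfhd_per_hour: float) -> str:
    """Return the PL whose band contains the given PFH_D (1/h)."""
    for pl, lower, upper in PL_BANDS:
        if lower <= pfhd_per_hour < upper:
            return pl
    raise ValueError(f"PFH_D {pfhd_per_hour:g}/h is outside the PL a to PL e range")


print(pl_from_pfhd(3.2e-6))  # b
```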

There is another way to describe the same characteristics of a system, this one from IEC. Instead of using the PL system, IEC uses Safety Integrity Levels (SILs). [1, Table 3] shows the correspondence between PLs and SILs. Note that the correspondence is not exact. Where the calculated PFHd is close to either end of one of the PL or SIL bands, use the table in [1, Annex K] or in [9] to determine to which band(s) the performance should be assigned.

IEC produced a Technical Report [10] that provides guidance on how to use ISO 13849-1 or IEC 62061. The following table shows the relationship between PLs, PFHd and SILs.

Correspondence between PL, PFHD and SIL – IEC/TR 62061-1:2010, Table 1

IEC 61508 includes SIL 4, which is not shown in [10, Table 1] because this level of performance exceeds the range of PFHD possible using ISO 13849-1 techniques. Also, you may have noticed that PLb and PLc are both within SIL1. This was done to accommodate the five architectural categories that came from EN 954-1 [12].
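The correspondence itself fits in a small lookup table. This Python sketch assumes the mapping as I read it from [10, Table 1]: PL a has no SIL equivalent, PL b and PL c both fall within SIL 1, PL d corresponds to SIL 2, and PL e to SIL 3. The names are hypothetical.

```python
from typing import Optional

# PL -> SIL, None where there is no corresponding SIL
PL_TO_SIL = {"a": None, "b": 1, "c": 1, "d": 2, "e": 3}


def sil_for_pl(pl: str) -> Optional[int]:
    """Return the SIL corresponding to a PL, or None for PL a."""
    return PL_TO_SIL[pl.lower()]


for pl in "abcde":
    sil = sil_for_pl(pl)
    label = "no SIL" if sil is None else f"SIL {sil}"
    print(f"PL {pl} -> {label}")
```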

Why PL and not just PFHD? One of the odd things that humans do when we can calculate things is develop what has been called “precision bias” [12]. Precision bias occurs when we can compute a number that appears very precise, e.g., 3.2 × 10⁻⁶, which then makes us feel like we have a very precise concept of the quantity. The problem, at least in this case, is that we are dealing with probabilities, and minuscule probabilities at that. Using bands, like the PLs, forces us to “bin” these apparently precise numbers into larger groups, eliminating the effects of precision bias in the evaluation of the systems. Eliminating precision bias is the same reason that IEC 61508 uses SILs: binning the calculated values helps to reduce our tendency to develop a precision bias. The reality is that we just can’t predict the behaviour of these systems with as much precision as we would like to believe.

Getting to Performance Levels: MTTFD, Architectural Category and DC

Some aspects of the system design need to be considered to arrive at a Performance Level or make a prediction about failure rates in terms of PFHd.

First is the system architecture: fundamentally, single channel or two channel. As a side note, if your system uses more than two channels, there are workarounds in ISO 13849-1, or you can use IEC 62061 or IEC 61508, either of which handles these more complex systems more easily. Remember, ISO 13849-1 is intended for relatively simple systems.

When we get into the analysis in a later article, we will be calculating or estimating the Mean Time to Dangerous Failure, MTTFD, of each channel, and then of the entire system. MTTFD is expressed in years, unlike PFHD, which is expressed as a rate per hour (1/h). I have yet to hear why this is the case, and it seems rather confusing; however, that is current practice.
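Converting between the two units is simple arithmetic if you assume a constant failure rate (λD = 1/MTTFD) and 8 760 hours per year. The helpers below are hypothetical, and keep in mind that the result is a single channel’s dangerous failure rate, not the PFHD of the safety function, which also depends on the Category and DCavg.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h


def lambda_d_per_hour(mttfd_years: float) -> float:
    """Constant dangerous failure rate (1/h) equivalent to an MTTF_D in years."""
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


def mttfd_years_from_lambda(lambda_d: float) -> float:
    """Inverse conversion: MTTF_D in years from a rate per hour."""
    return 1.0 / (lambda_d * HOURS_PER_YEAR)


print(f"{lambda_d_per_hour(100):.2e} per hour")  # 1.14e-06 per hour
```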

Architectural Categories

Once the required PL is known, the next step is the selection of the architectural category. The basic architectural categories were introduced initially in EN 954-1:1996 [12]. The Categories were carried forward unchanged into the first edition of ISO 13849-1 in 1999, and were maintained and expanded to include additional requirements in the second and third editions in 2006 and 2015.

Since I have explored the details of the architectures in a previous series, I am not going to repeat that here. Instead, I will refer you to that series. The architectural Categories come in five flavours:

Architecture Basics (for full requirements, see [1, Cl. 6])

  • Category B – Structure: single channel. Basic requirements: basic circuit conditions are met (i.e., components are rated for the circuit voltage and current, etc.), using components that are designed and built to the relevant component standards [1, 6.2.3]. Safety principle: component selection.
  • Category 1 – Structure: single channel. Basic requirements: Category B plus the use of “well-tried components” and “well-tried safety principles” [1, 6.2.4]. Safety principle: component selection.
  • Category 2 – Structure: single channel. Basic requirements: Category B plus the use of “well-tried safety principles” and periodic testing [1, 4.5.4] of the safety function by the machine control system [1, 6.2.5]. Safety principle: system structure.
  • Category 3 – Structure: dual channel. Basic requirements: Category B plus the use of “well-tried safety principles”, and no single fault shall lead to the loss of the safety function. Where practicable, single faults shall be detected [1, 6.2.6]. Safety principle: system structure.
  • Category 4 – Structure: dual channel. Basic requirements: Category B plus the use of “well-tried safety principles”, and no single fault shall lead to the loss of the safety function. Single faults are detected at or before the next demand on the safety system, but where this is not possible, an accumulation of undetected faults will not lead to the loss of the safety function [1, 6.2.7]. Safety principle: system structure.

[1, Table 10] provides a more detailed summary of the requirements than the summary table above provides.

Since the Categories cannot all achieve the same reliability, the PL and the Categories are linked as shown in [1, Fig. 5]. This diagram summarises the relationship of the three central parameters in ISO 13849-1 in one illustration.

Relationship between categories, DCavg, MTTFD of each channel and PL – [1, Fig. 5]

Starting with the PLr from the Safety Requirement Specification for the first safety function, you can use Fig. 5 to help you select the Category and other parameters necessary for the design. For example, suppose that the risk assessment indicates that an emergency stop system is needed. ISO 13850 requires that emergency stop functions provide a minimum of PLc, so using this as the basis you can look at the vertical axis in the diagram to find PLc, and then read across the figure. You will see that PLc can be achieved using Category 1, 2, or 3 architecture, each with corresponding differences in MTTFD and DCavg. For example:

  • Cat. 1, MTTFD = High and DCavg = None, or
  • Cat. 2, MTTFD = Medium to High and DCavg = Low to Medium, or
  • Cat. 3, MTTFD = Low to High and DCavg = Low to Medium.

As you can see, the MTTFD required in the channels decreases as the diagnostic coverage increases: the design compensates for lower reliability in the components by increasing the diagnostic coverage and adding redundancy. Using [1, Fig. 5], you can pin down any of the parameters and then select the others as appropriate.

One additional point regarding Categories 3 and 4: the difference between them is increased Diagnostic Coverage. While Category 3 is single fault tolerant, Category 4 has additional diagnostic capability so that an accumulation of undetected faults cannot lead to the loss of the safety function. This is not the same as being multiple fault tolerant: the system is still designed to operate in the presence of only a single fault, with enhanced diagnostic capability.

It is worth noting that ISO 13849 only recognises structures with single or dual channel configurations. If you need to develop a system with more than single redundancy (i.e., more than two channels), you can analyse each pair of channels as a dual channel architecture, or you can move to using IEC 62061 or IEC 61508, either of which permits any level of redundancy.

The next step in this process is the evaluation of the component and channel MTTFD, and then the determination of the complete system MTTFD. Part 4 of this series publishes on 13-Feb-17.


References


[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[7]     Functional safety of electrical/electronic/programmable electronic safety-related systems. Seven parts. IEC Standard 61508. 2nd Edition. 2010.

[9]      Safety of machinery – Functional safety of safety-related electrical, electronic and programmable electronic control systems. IEC Standard 62061. 2005.

[10]    Guidance on the application of ISO 13849-1 and IEC 62061 in the design of safety-related control systems for machinery. IEC Technical Report 62061-1. 2010.

[11]    D. S. G. Nix, Y. Chinniah, F. Dosio, M. Fessler, F. Eng, and F. Schrever, “Linking Risk and Reliability—Mapping the output of risk assessment tools to functional safety requirements for safety related control systems,” 2015.

[12]    Safety of machinery. Safety related parts of control systems. General principles for design. CEN Standard EN 954-1. 1996.

ISO 13849 Analysis — Part 2: Safety Requirement Specification

This entry is part 2 of 6 in the series How to do a 13849-1 analysis

Developing the Safety Requirement Specification

The Safety Requirement Specification sounds pretty heavy, but it is just a big name for a way to organise the information you need to analyse and design the safety systems for your machinery. Note that I am assuming that you are doing this in the “right” order, meaning that you are planning the design beforehand, rather than trying to back-fill the documentation after completing the design. In either case, the process is the same, but getting the information you need can be much harder after the fact than before doing the design work. Doing some aspects in a review mode is impossible, especially if a third party to whom you have no access did the design work [8].


What goes into a Safety Requirements Specification?

For reference, Clause 5 of ISO 13849-1 [1] covers safety requirement specifications to some degree, but I think it needs some clarification. First of all, what is a safety function?

Safety functions include any function of the machine that has a direct protective effect for the worker using the machinery. However, using this definition, it is possible to ignore some important functions. Complementary protective measures, like emergency stop, can be missed because they are usually “after the fact”, i.e., the injury occurs, and then the E-stop is pressed, so you cannot say that it has a “direct protective effect”. If we look at the definitions in [1], we find:

3.1.20

safety function

function of the machine whose failure can result in an immediate increase of the risk(s)
[SOURCE: ISO 12100:2010, 3.30.]

Linking Risk to Functional Safety

Referring to the risk assessment, any risk control that protects workers from some aspect of the machine operation using a control function like an interlocked gate, or by maintaining a temperature below a critical level or speed at a safe level, is a safety function. For example: if the temperature in a process rises too high, the process will explode; or if a shaft speed is too high (or too low) the tool may shatter and eject broken pieces at high speed. Therefore, the temperature control function and the speed control function are safety functions. These functions may also be process control functions, but the potential for an immediate increase in risk due to a failure is what makes these functions safety functions no matter what else they may do.

[1, Table 8] gives you some examples of various kinds of safety functions found on machines. The table is not exhaustive, meaning there are many more safety functions out there than are listed in the table. Your job is to figure out which ones live in your machine. It is a bit like Pokémon: ya gotta catch ’em all!

Basic Safety Requirement Specification

Each safety function must have a Performance Level or a Safety Integrity Level assigned as part of the risk assessment. For each safety function, you need to develop the following information:

Basic Safety Requirement Specification

Safety Function Identification: Name or other reference, e.g., “Access Gate Interlock” or “Hazard Zone 2”.

Functional Characteristics:
  • Intended use or foreseeable misuse of the machine relevant to the safety function
  • Operating modes relevant to the safety function
  • Cycle time of the machine
  • Response time of the safety function

Emergency Operation: Is this an emergency operation function? If yes, what types of emergencies might be mitigated by this function?

Interactions: What operating modes require this function to be operational? Are there modes where this function requires deliberate bypass? These could include normal working modes (automatic, manual, set-up, changeover) and fault-finding or maintenance modes.

Behaviour: How you want the system to behave when the safety function is triggered, e.g., power is immediately removed from the MIG welder using an IEC 60204-1 Category 0 stop function, and robot motions are stopped using an IEC 60204-1 Category 1 stop function through the robot safety stop input.

or

All horizontal pneumatic motions stop in their current positions. Vertical motions return to the raised or retracted positions.

Also to be considered is a power loss condition. Should the system behave in the same way as if the safety function was triggered, not react at all, or do something else? Consider vertical axes that might require holding brakes or other mechanisms to prevent power loss causing unexpected motion.

Machine State after Triggering: What is the expected state of the machine after triggering the safety function? What is the recovery process?

Frequency of Operation: How often do you expect this safety function to be used? A reasonable estimate is needed. More on this below.

Priority of Operation: If simultaneous triggering of multiple safety functions is possible, which function takes precedence? E.g., Emergency Stop always takes precedence over everything else. What happens if you have a safe speed function and a guard interlock that are associated because the interlock is part of a guarding function covering a shaft, and you need to troubleshoot the safe speed function, so you need access to the shaft where the encoders are mounted?

Required Performance Level: I suggest recording the S, F, and P values selected as well as the PLr value selected for later reference.
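If you keep your SRS records in software rather than a document, each item above maps naturally onto a field. This Python sketch is hypothetical: the class, the field names and the gate example values are mine, and the (S, F, P) to PLr lookup assumes the risk graph in ISO 13849-1 Annex A as I read it. Recording S, F and P lets the PLr be re-derived later, as suggested above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed risk graph from ISO 13849-1 Annex A: (S, F, P) -> PLr
RISK_GRAPH: Dict[Tuple[str, str, str], str] = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}


@dataclass
class SafetyFunctionSpec:
    """Hypothetical record holding the basic SRS items for one safety function."""
    identification: str
    functional_characteristics: List[str]
    emergency_operation: bool
    interactions: str
    behaviour: str
    machine_state_after_trigger: str
    frequency_of_operation: str
    priority: str
    s: str  # severity of injury: "S1" or "S2"
    f: str  # frequency/duration of exposure: "F1" or "F2"
    p: str  # possibility of avoidance: "P1" or "P2"

    @property
    def plr(self) -> str:
        """Required PL read from the risk graph for the recorded S, F, P."""
        return RISK_GRAPH[(self.s, self.f, self.p)]


# Hypothetical example values for an interlocked access gate
gate = SafetyFunctionSpec(
    identification="Access Gate Interlock",
    functional_characteristics=["Automatic mode", "Response time: to be defined"],
    emergency_operation=False,
    interactions="Active in all modes; no bypass",
    behaviour="Category 0 stop of the MIG welder; Category 1 stop of the robot",
    machine_state_after_trigger="All motion stopped; manual reset required",
    frequency_of_operation="About 10 times per shift (estimate)",
    priority="Below emergency stop",
    s="S2", f="F2", p="P1",
)
print(gate.identification, "PLr =", gate.plr)  # Access Gate Interlock PLr = d
```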

Here’s an example table in MS Word format that you can use as a starting point for your SRS documents. Note that SRS can be much more detailed than this. If you want more information on this, read IEC 61508-1, 7.10.2.

So, that is the minimum. You can add lots more information to the minimum requirements, but this will get you started. If you want more information on developing the SRS, you will need to get a copy of IEC 61508 [7].

What’s Next?

Next, you need to be able to make some design decisions about system architecture and components. Circuit architectures have been discussed at some length on the MS101 blog in the past, so I am not going to go through them again in this series. Instead, I will show you how to choose an architecture based on your design goals in the next instalment.


References


[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[7]     Functional safety of electrical/electronic/programmable electronic safety-related systems. Seven parts. IEC Standard 61508. 2nd Edition. 2010.

[8]     S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncertainties in the validation of an existing safety-related control circuit with the ISO 13849-1:2006 design standard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.