- Interlock Architectures – Pt. 1: What do those categories really mean?
- Interlock Architectures – Pt. 2: Category 1
- Interlock Architectures – Pt. 3: Category 2
- Interlock Architectures – Pt. 4: Category 3 – Control Reliable
- Interlock Architectures – Pt. 5: Category 4 – Control Reliable
- Interlock Architectures Pt. 6 – Comparing North American and International Systems
- Inconsistencies in ISO 13849 – 1:2006
- 31-Dec-2011 – Are YOU ready?
The post has been updated since it was first written in 2010.
If you are new to functional safety, new to design of control systems for machinery, or both, this post and the subsequent posts covering the five architectural categories provided in ISO 13849 – 1 are a good place to start. These categories are similar to those in EN 954 – 1:1996 but have been expanded to include some additional criteria. This post explores the categories to give you an introduction to the concepts used in ISO 13849 – 1.
Note that when this post was first written, ISO 13849 – 1:2006 was current. Since then, a new edition was published in 2015, and yet another is expected to be published by May-2021. The definitions discussed in this post are still valid.
What do those categories really mean?
The architectures used as the basis of interlock design and analysis have a long history. Two basic forms existed in the early days: the ANSI categories and the CSA variant, and the CEN forms.
The ANSI/CSA architectures were called SIMPLE, SINGLE CHANNEL, SINGLE CHANNEL-MONITORED, and CONTROL RELIABLE. The basic system arose in the ANSI/RIA R15.06 1992 standard and was used until 2014. The CSA variant used the same names as the ANSI version but made a small differentiation in the CONTROL RELIABLE category. This differentiation was very subtle and was often completely misunderstood by readers. This system was introduced in Canada in CSA Z434-1994 and was discontinued in 2016. This system of safety-related control system architecture categories is no longer used in any jurisdiction.
And then there was EN 954 – 1
In 1996 CEN published an important standard for machine builders – EN 954 – 1, “Safety of Machinery – Safety Related Parts of Control Systems – Part 1: General Principles for Design”. This standard set the stage for defining control reliability in machinery safeguarding systems, introducing the Reliability categories that have become ubiquitous. So what do these categories mean, and how are they applied under the latest machinery functional safety standard, ISO 13849 – 1?
The categories are used to describe system architectures for safety-related control systems. Each architecture carries with it a range of reliable performance that can be related to the degree of risk reduction you are expecting to achieve with the system. These architectures can be applied equally to electrical, electronic, pneumatic, hydraulic or mechanical control systems.
Early electrical ‘master-control-relay’ circuits used a simple architecture with a single contactor, or sometimes two, and a single channel style of architecture to maintain the contactor coil circuit once the START or POWER ON button (PB2 in Fig. 1) had been pressed. Power to the output elements of the machine controls was supplied via contacts on the contactor, which is why it was called the Master Control Relay or ‘MCR’. The POWER OFF button (PB1 in Fig. 1) could be labelled that way, or you could make the same circuit into an Emergency Stop by simply replacing the operator with a red mushroom-head push button. These devices were usually spring-return, so to restore power, all that was needed was to push the POWER ON button again (Fig.1).
Typically, the components used in these circuits were specified to meet the circuit conditions, but not more. Control manufacturers brought out over-dimensioned versions, such as Allen-Bradley’s Bulletin 700-PK contactor which had 20 A rated contacts instead of the standard Bulletin 700’s 10 A contacts.
When interlocked guards began to show up, they were integrated into the original MCR circuit by adding a basic control relay (CR1 in Fig. 2) whose coil was controlled by the interlock switch(es) (LS1 in Fig. 2), and whose output contacts were in series with the coil circuit of the MCR contactor. Opening the guard interlock would open the MCR coil circuit and drop power to the machine controls. Very simple.
‘Ice-cube’ style plug-in relays were often chosen for CR1. These devices did not have ‘force-guided’ contacts in them, so it was possible to have one contact in the relay fail while the other continued to operate properly.
LS1 could be any kind of switch. Frequently a ‘micro-switch’ style of limit switch was chosen. These snap-action switches could fail shorted internally, or weld closed and the actuator would continue to work normally even though the switch itself had failed. These switches are also ridiculously easy to bypass. All that is required is a piece of tape or an elastic band and the switch is no longer doing its job.
The problem with these circuits is that they can fail in a number of ways that aren’t obvious to the user, with the result being that the interlock might not work as expected, or the Emergency Stop might fail just when you need it most.
These original circuits are the basis for what became known as ‘Category B’ (‘B’ for ‘Basic’) circuits. Here’s the definition from the standard. Note that I am taking this excerpt from ISO 13849 – 1:2006 (Edition 2). “SRP/CS” stands for “Safety Related Parts of Control Systems”:
6.2.3 Category B
The SRP/CS shall, as a minimum, be designed, constructed, selected, assembled and combined in accordance with the relevant standards and using basic safety principles for the specific application to withstand
- the expected operating stresses, e.g. the reliability with respect to breaking capacity and frequency,
- the influence of the processed material, e.g. detergents in a washing machine, and
- other relevant external influences, e.g. mechanical vibration, electromagnetic interference, power supply interruptions or disturbances.
There is no diagnostic coverage (DCavg = none) within category B systems and the MTTFd of each channel can be low to medium. In such structures (normally single-channel systems), the consideration of CCF is not relevant.
The maximum PL achievable with category B is PL = b.
NOTE When a fault occurs it can lead to the loss of the safety function.
Specific requirements for electromagnetic compatibility are found in the relevant product standards, e.g. IEC 61800 – 3 for power drive systems. For functional safety of SRP/CS in particular, the immunity requirements are relevant. If no product standard exists, at least the immunity requirements of IEC 61000 – 6‑2 should be followed. 
The standard also provides us with a nice logic block diagram of what a single-channel system might look like:
If you look at this block diagram and the Start/Stop Circuit with Guard Relay above, you can see how this basic circuit translates into a single channel architecture, since from the control inputs to the controlled load you have a single channel. Even the guard loop is a single channel. A failure in any component in the channel can result in loss of control of the load.
Let’s look at each part of this requirement in more detail, since each of the subsequent Categories builds upon these BASIC requirements.
The SRP/CS shall, as a minimum, be designed, constructed, selected, assembled and combined in accordance with the relevant standards and using basic safety principles for the specific application…
Basic Safety Principles
We have to go to ISO 13849 – 2 to get a definition of what Basic Safety Principles might include. Looking at Annex A.2 of the standard we find:
|Basic Safety Principles||Remarks|
|Use of suitable materials and adequate manufacturing||Selection of material, manufacturing methods and treatment in relation to, e. g., stress, durability, elasticity, friction, wear, corrosion, temperature.|
|Correct dimensioning and shaping||Consider e. g. stress, strain, fatigue, surface roughness, tolerances, sticking, manufacturing.|
|Proper selection, combination, arrangements, assembly and installation of components / systems.||Apply manufacturer’s application notes, e. g. catalogue sheets, installation instructions, specifications, and use of good engineering practice in similar components/systems.|
|Use of de-energisation principle||The safe state is obtained by release of energy. See primary action for stopping in EN 292 – 2:1991 (ISO/TR 12100 – 2:1992), 3.7.1. Energy is supplied for starting the movement of a mechanism. See primary action for starting in EN 292 – 2:1991 (ISO/TR 12100 – 2:1992), 3.7.1. Consider different modes, e. g. operation mode, maintenance mode. This principle shall not be used in special applications, e. g. to keep energy for clamping devices.|
|Proper fastening||For the application of screw locking consider manufacturer’s application notes. Overloading can be avoided by applying adequate torque loading technology.|
|Limitation of the generation and/or transmission of force and similar parameters||Examples are break pin, break plate, torque limiting clutch.|
|Limitation of range of environmental parameters||Examples of parameters are temperature, humidity, pollution at the installation place. See clause 8 and consider manufacturer’s application notes.|
|Limitation of speed and similar parameters||Consider e. g., the speed, acceleration, deceleration required by the application|
|Proper reaction time||Consider e. g. spring tiredness, friction, lubrication, temperature, inertia during acceleration and deceleration, combination of tolerances.|
|Protection against unexpected start-up||Consider unexpected start-up caused by stored energy and after power “supply” restoration for different modes as operation mode, maintenance mode etc. Special equipment for release of stored energy may be necessary. Special applications, e. g., to keep energy for clamping devices or ensure a position, need to be considered separately.|
|Simplification||Reduce the number of components in the safety-related system.|
|Separation||Separation of safety-related functions from other functions.|
|Proper prevention of the ingress of fluids and dust||Consider IP rating [see EN 60529 (IEC 60529)]|
As you can see, the basic safety principles are pretty basic – select components appropriately for the application, consider the operating conditions for the components, follow manufacturer’s data, and use de-energization to create the stop function. That way, a loss of power results in the system failing into a safe state, as does an open relay coil or set of burnt contacts.
“…the expected operating stresses, e.g. the reliability with respect to breaking capacity and frequency,”
Specify your components correctly with regard to voltage, current, breaking capacity, temperature, humidity, dust,…
“…other relevant external influences, e.g. mechanical vibration, electromagnetic interference, power supply interruptions or disturbances.”
“Specific requirements for electromagnetic compatibility are found in the relevant product standards, e.g. IEC 61800 – 3 for power drive systems. For functional safety of SRP/CS in particular, the immunity requirements are relevant. If no product standard exists, at least the immunity requirements of IEC 61000 – 6‑2 should be followed.”
Probably the biggest ‘gotcha’ in this point is “electromagnetic interference”. This is important enough that the standard devotes a paragraph to it specifically. Note the reference to ‘functional safety’ in that paragraph: it is the immunity requirements that matter here. You can find other information in other posts on this blog on that topic. If your product is destined for the European Union (EU), then you will almost certainly be doing some EMC testing, unless your product is a ‘fixed installation’. If it’s going to almost any other market, you probably are not undertaking this testing. So how do you know if your design meets these criteria? Unless you test, you don’t. You can make some educated guesses based on using sound engineering practices, but after that you can only hope.
“…There is no diagnostic coverage (DCavg = none) within category B systems…”
Category B systems are fundamentally single-channel. A single fault in the system will lead to the loss of the safety function. This sentence refers to the concept of “diagnostic coverage” that was introduced in ISO 13849 – 1:2006, but what this means in practice is that there is no monitoring or feedback from any critical elements. Remember our basic MCR circuit? If the MCR contactor welded closed, the only diagnostic was the failure of the machine to stop when the emergency stop button was pressed.
Component Failure Rates
“…the MTTFd of each channel can be low to medium.”
This part of the statement is referring to another new concept from ISO 13849 – 1:2006, “MTTFd”. Standing for “Mean Time to Dangerous Failure”, this concept looks at the expected time before a component fails in a dangerous way, which the standard expresses in years. Calculating MTTFd is a significant part of implementing the new standard. From the perspective of understanding Category B, what this means is that you do not need to use high-reliability components in these systems.
Common Cause Failures
“In such structures (normally single-channel systems), the consideration of CCF is not relevant.”
CCF is another new concept from ISO 13849 – 1:2006, and stands for “Common Cause Failure”. I’m not going to get into this in any detail here, but suffice it to say that channel separation (impossible in a single channel architecture) and other design techniques are used to reduce the likelihood of CCF in higher reliability systems.
Performance Levels – PL
“The maximum PL achievable with category B is PL = b.”
PL stands for “Performance Level.” Five Performance Levels have been defined from ‘a’ to ‘e’. The Performance Levels represent bands or groups of failure rates expressed as the fractional probability of failure per hour.
For example, PLa, the band with the highest probability of failure per hour, covers an average probability of dangerous failure per hour of ≥ 10⁻⁵ to < 10⁻⁴. This figure is referred to as the Probability of Dangerous Failure per Hour (PFHd). To turn PFHd into something a bit easier to understand, you can convert it to years-to-failure using the following calculations. I’m going to assume that the control system is operating 24/7/365, which is 8760 hours per year, but by adjusting the number of hours in the year for other operating periods you can adjust the result.
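As a worked step, assuming continuous operation and taking the upper limit of the PLa band, the expected number of dangerous failures per year is:

$$
10^{-4}\ \frac{\text{failures}}{\text{h}} \times 8760\ \frac{\text{h}}{\text{year}} = 0.876\ \frac{\text{failures}}{\text{year}}
$$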
Now that we know how many failures per year we’re dealing with, we need to convert to the number of years to failure.
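Taking the reciprocal of the yearly figure for the upper PLa limit (again assuming 8760 operating hours per year) gives:

$$
\frac{1}{10^{-4} \times 8760\ \text{failures/year}} = \frac{1}{0.876}\ \text{years} \approx 1.142\ \text{years}
$$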
What this means is that, at the upper limit of the PLa band, the mean time to a dangerous failure can be as little as 1.142 years. We can convert years-to-failure to hours-to-failure by multiplying the years by 8760.
Let’s calculate the other limit for the PLa band.
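With the same assumption of 8760 operating hours per year, the lower limit of the PLa band works out as:

$$
10^{-5}\ \frac{\text{failures}}{\text{h}} \times 8760\ \frac{\text{h}}{\text{year}} = 0.0876\ \frac{\text{failures}}{\text{year}}
$$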
Since we moved one order of magnitude smaller (10⁻⁴ to 10⁻⁵), it makes sense that the failure rate got smaller by that same factor. Calculating the years-to-failure we get:
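For the lower PLa limit, assuming 8760 operating hours per year:

$$
\frac{1}{10^{-5} \times 8760\ \text{failures/year}} = \frac{1}{0.0876}\ \text{years} \approx 11.42\ \text{years}
$$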
PLb is equal to ≥ 3 × 10⁻⁶ to < 10⁻⁵ failures per hour. Calculating the lower limit we get:
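Assuming continuous operation (8760 hours per year), both steps for the lower PLb limit are:

$$
3 \times 10^{-6}\ \frac{\text{failures}}{\text{h}} \times 8760\ \frac{\text{h}}{\text{year}} = 0.02628\ \frac{\text{failures}}{\text{year}},
\qquad
\frac{1}{0.02628}\ \text{years} \approx 38.05\ \text{years}
$$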
The upper limit of the PLb band is the same as the lower limit of the PLa band, so I won’t calculate that again.
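The band structure can be sketched as a simple lookup. This is only an illustration: the function name `performance_level` and the table layout are made up for this example, bands a and b are the ones quoted in this post, and the limits for bands c, d and e are taken from the PL table in ISO 13849 – 1 and should be checked against the edition you are working to.

```python
# PL bands as (letter, lower limit inclusive, upper limit exclusive), in
# dangerous failures per hour (PFHd). Bands a and b are quoted in the post;
# c, d and e are assumed from the PL table in ISO 13849-1.
PL_BANDS = [
    ("a", 1e-5, 1e-4),
    ("b", 3e-6, 1e-5),
    ("c", 1e-6, 3e-6),
    ("d", 1e-7, 1e-6),
    ("e", 1e-8, 1e-7),
]


def performance_level(pfhd):
    """Return the PL letter whose band contains pfhd, or None if no band does."""
    for letter, lower, upper in PL_BANDS:
        if lower <= pfhd < upper:
            return letter
    return None


if __name__ == "__main__":
    for value in (5e-5, 1e-5, 3e-6, 2e-7):
        print(value, performance_level(value))
```

Because each band includes its lower limit and excludes its upper limit, a PFHd of exactly 10⁻⁵ falls in PLa rather than PLb, which is why the two bands share that boundary value.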
While 38 years to failure sounds like a lot, it’s important to bear in mind that this is a mean value, not a guaranteed service life. You can have a failure occur the first time you use the safety function, or not have it fail until 38 years from the first time the function is used. Some machines may run considerably longer than that before a failure occurs. To get an idea about why that can happen, have a look at the bathtub curve and what it means for product life. When dealing with the probability of a safety function failing, these numbers represent some pretty high failure rates.
If you consider an operation running a single shift in Canada where the normal working year is 50 weeks and the normal workday is 7.5 hours, a working year is
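Assuming a five-day working week (not stated in the text, but consistent with the figures that follow):

$$
50\ \frac{\text{weeks}}{\text{year}} \times 5\ \frac{\text{days}}{\text{week}} \times 7.5\ \frac{\text{h}}{\text{day}} = 1875\ \frac{\text{h}}{\text{year}}
$$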
Taking the failure rates per hour above, yields:
PLa = one failure in 5.3 years of operation to one failure in 53.3 years of operation
PLb = one failure in 53.3 years of operation to one failure in 177.8 years of operation.
If we go to an operation running three shifts in Canada, a working year is:
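Assuming three 7.5-hour shifts per day on a five-day week (the five-day week is an assumption, chosen because it reproduces the figures that follow):

$$
50\ \frac{\text{weeks}}{\text{year}} \times 5\ \frac{\text{days}}{\text{week}} \times 3 \times 7.5\ \frac{\text{h}}{\text{day}} = 5625\ \frac{\text{h}}{\text{year}}
$$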
Taking the failure rates per hour above and recalculating, this yields:
PLa = one failure in 1.8 years of operation to one failure in 17.8 years of operation
PLb = one failure in 17.8 years of operation to one failure in 59.25 years of operation
Except for the least hazardous machines, I can’t imagine too many employers that would be happy with a safety function on a machine that failed within two years from new!
Now you should be starting to get an idea about where this is going. It’s important to remember that probabilities are just that – the failure could happen in the first hour of operation or at any time after that, or never. These figures give you some way to gauge the relative reliability of the design and ARE NOT any sort of guarantee.
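The conversions used in this section can be collected into a short script. This is a minimal sketch, not a calculation method taken from the standard: the names `years_to_failure` and `band_table` are made up for this example, and the one-shift and three-shift hours assume a five-day working week.

```python
# Operating hours per year for the three cases discussed in the post.
HOURS_CONTINUOUS = 24 * 365               # 8760 h, running 24/7/365
HOURS_ONE_SHIFT = 50 * 5 * 7.5            # 1875 h, assuming a 5-day week
HOURS_THREE_SHIFTS = 3 * HOURS_ONE_SHIFT  # 5625 h

# Band limits in dangerous failures per hour, worst case first.
BAND_LIMITS = {"PLa": (1e-4, 1e-5), "PLb": (1e-5, 3e-6)}


def years_to_failure(pfhd, hours_per_year=HOURS_CONTINUOUS):
    """Reciprocal of the expected number of dangerous failures per year."""
    failures_per_year = pfhd * hours_per_year
    return 1.0 / failures_per_year


def band_table(hours_per_year):
    """Years-to-failure at both limits of each band for a given working year."""
    return {
        name: tuple(years_to_failure(limit, hours_per_year) for limit in limits)
        for name, limits in BAND_LIMITS.items()
    }


if __name__ == "__main__":
    cases = [
        ("continuous", HOURS_CONTINUOUS),
        ("one shift", HOURS_ONE_SHIFT),
        ("three shifts", HOURS_THREE_SHIFTS),
    ]
    for label, hours in cases:
        for name, (worst, best) in band_table(hours).items():
            print(f"{label:>12} {name}: {worst:.2f} to {best:.2f} years")
```

Changing the hours-per-year argument is all that is needed to model a different operating pattern, for example a two-shift operation or a shorter working year.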
Watch for the next post in this series where I will look at Category 1 requirements!
 Safety of Machinery – Safety Related Parts of Control Systems – Part 1: General Principles for Design. CEN Standard EN 954 – 1. 1996.
 Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. ISO Standard 13849 – 1. 2006.
 Safety of machinery — Safety-related parts of control systems — Part 2: Validation, ISO Standard 13849 – 2. 2003.
 Safety of machinery — Safety-related parts of control systems — Part 100: Guidelines for the use and application of ISO 13849 – 1. ISO Technical Report TR 100. 2000.
 Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. CEN Standard EN ISO 13849 – 1. 2008.