Interlock Architectures – Pt. 1: What do those categories really mean?

This entry is part 1 of 8 in the series Circuit Architectures Explored

In 1996 CEN published an important standard for machine builders – EN 954-1, Safety of Machinery – Safety Related Parts of Control Systems – Part 1: General Principles for Design. This standard set the stage for defining control reliability in machinery safeguarding systems, introducing the Reliability categories that have become ubiquitous. So what do these categories mean, and how are they applied under the latest machinery standard, ISO 13849-1?


It all started with EN 954-1

In 1996 CEN published an important standard for machine builders – EN 954-1, “Safety of Machinery – Safety Related Parts of Control Systems – Part 1: General Principles for Design” [1]. This standard set the stage for defining control reliability in machinery safeguarding systems, introducing the Reliability categories that have become ubiquitous. So what do these categories mean, and how are they applied under the latest machinery functional safety standard, ISO 13849-1 [2]?


Circuit Categories

The categories are used to describe system architectures for safety related control systems. Each architecture carries with it a range of reliability performance that can be related to the degree of risk reduction you are expecting to achieve with the system. These architectures can be applied equally to electrical, electronic, pneumatic, hydraulic or mechanical control systems.

Historical Circuits

Early electrical ‘master-control-relay’ circuits used a simple architecture with a single contactor, or sometimes two, and a single channel style of architecture to maintain the contactor coil circuit once the START or POWER ON button (PB2 in Fig. 1) had been pressed. Power to the output elements of the machine controls was supplied via contacts on the contactor, which is why it was called the Master Control Relay or ‘MCR’. The POWER OFF button (PB1 in Fig. 1) could be labeled that way, or you could make the same circuit into an Emergency Stop by simply replacing the operator with a red mushroom-head push button. These devices were usually spring-return, so to restore power, all that was needed was to push the POWER ON button again (Fig. 1).

Figure 1 – Basic Stop/Start Circuit

Typically, the components used in these circuits were specified to meet the circuit conditions, but not more. Controls manufacturers brought out over-dimensioned versions, such as Allen-Bradley’s Bulletin 700-PK contactor which had 20 A rated contacts instead of the standard Bulletin 700’s 10 A contacts.

When interlocked guards began to show up, they were integrated into the original MCR circuit by adding a basic control relay (CR1 in Fig. 2) whose coil was controlled by the interlock switch(es) (LS1 in Fig. 2), and whose output contacts were in series with the coil circuit of the MCR contactor. Opening the guard interlock would open the MCR coil circuit and drop power to the machine controls. Very simple.

Figure 2 – Old-School Start/Stop Circuit with Guard Relay

‘Ice-cube’ style plug-in relays were often chosen for CR1. These devices did not have ‘force-guided’ contacts in them, so it was possible to have one contact in the relay fail while the other continued to operate properly.

LS1 could be any kind of switch. Frequently a ‘micro-switch’ style of limit switch was chosen. These snap-action switches could fail shorted internally, or weld closed, and the actuator would continue to work normally even though the switch itself had failed. These switches are also ridiculously easy to bypass. All that is required is a piece of tape or an elastic band and the switch is no longer doing its job.

Photo 1 – Micro-Switch style limit switch used as a cover interlock switch in a piece of industrial laundry equipment

The problem with these circuits is that they can fail in a number of ways that aren’t obvious to the user, with the result being that the interlock might not work as expected, or the Emergency Stop might fail just when you need it most.
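To make that failure behaviour concrete, here is a minimal logic sketch of the guard relay circuit described above. It is only an illustration: the rung layout is my reading of the text (stop button in series with the start button and its seal-in contact, in series with the CR1 contact), and the class name, method name and the `cr1_welded` fault flag are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class GuardRelayCircuit:
    """Simplified logic model of the start/stop circuit with a guard relay."""

    mcr: bool = False         # MCR contactor state (True = energized)
    cr1_welded: bool = False  # hypothetical fault: CR1 output contact stuck closed

    def scan(self, stop_pressed: bool, start_pressed: bool, guard_closed: bool) -> bool:
        # LS1 energizes the CR1 coil while the guard is closed.
        cr1_coil = guard_closed
        # A welded contact stays closed even after the coil drops out.
        cr1_contact = cr1_coil or self.cr1_welded
        # MCR coil rung: stop (normally closed) in series with start (normally
        # open) paralleled by the MCR seal-in contact, in series with CR1.
        self.mcr = (not stop_pressed) and (start_pressed or self.mcr) and cr1_contact
        return self.mcr


# Healthy circuit: press START, release it, then open the guard.
healthy = GuardRelayCircuit()
healthy.scan(stop_pressed=False, start_pressed=True, guard_closed=True)
healthy.scan(stop_pressed=False, start_pressed=False, guard_closed=True)
healthy_after_guard_open = healthy.scan(False, False, guard_closed=False)

# Same sequence with a welded CR1 contact.
faulty = GuardRelayCircuit(cr1_welded=True)
faulty.scan(stop_pressed=False, start_pressed=True, guard_closed=True)
faulty_after_guard_open = faulty.scan(False, False, guard_closed=False)

print("Healthy circuit powered with guard open:", healthy_after_guard_open)
print("Faulty circuit powered with guard open:", faulty_after_guard_open)
```

In the healthy case the MCR drops out when the guard opens. With the welded contact the MCR stays energized, and nothing in the circuit tells the user that the interlock has stopped working.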

Modern Circuits

Category B

These original circuits are the basis for what became known as ‘Category B’ (‘B’ for ‘Basic’) circuits. Here’s the definition from the standard. Note that I am taking this excerpt from ISO 13849-1:2006 (Edition 2). “SRP/CS” stands for “Safety Related Parts of Control Systems”:

6.2.3 Category B
The SRP/CS shall, as a minimum, be designed, constructed, selected, assembled and combined in accordance with the relevant standards and using basic safety principles for the specific application to withstand

  • the expected operating stresses, e.g. the reliability with respect to breaking capacity and frequency,
  • the influence of the processed material, e.g. detergents in a washing machine, and
  • other relevant external influences, e.g. mechanical vibration, electromagnetic interference, power supply interruptions or disturbances.

There is no diagnostic coverage (DCavg = none) within category B systems and the MTTFd of each channel can be low to medium. In such structures (normally single-channel systems), the consideration of CCF is not relevant.

The maximum PL achievable with category B is PL = b.

NOTE When a fault occurs it can lead to the loss of the safety function.

Specific requirements for electromagnetic compatibility are found in the relevant product standards, e.g. IEC 61800-3 for power drive systems. For functional safety of SRP/CS in particular, the immunity requirements are relevant. If no product standard exists, at least the immunity requirements of IEC 61000-6-2 should be followed.

The standard also provides us with a nice block diagram of what a single-channel system might look like:

ISO 13849-1 Category B Designated Architecture

If you look at this block diagram and the Start/Stop Circuit with Guard Relay above, you can see how this basic circuit translates into a single channel architecture, since from the control inputs to the controlled load you have a single channel. Even the guard loop is a single channel. A failure in any component in the channel can result in loss of control of the load.

Let’s look at each part of this requirement in more detail, since each of the subsequent Categories builds upon these BASIC requirements.

The SRP/CS shall, as a minimum, be designed, constructed, selected, assembled and combined in accordance with the relevant standards and using basic safety principles for the specific application…

Basic Safety Principles

We have to go to ISO 13849-2 to get a definition of what Basic Safety Principles might include. Looking at Annex A.2 of the standard we find:

Table A.1 — Basic Safety Principles

Basic Safety Principle | Remarks
Use of suitable materials and adequate manufacturing | Selection of material, manufacturing methods and treatment in relation to, e.g. stress, durability, elasticity, friction, wear, corrosion, temperature.
Correct dimensioning and shaping | Consider e.g. stress, strain, fatigue, surface roughness, tolerances, sticking, manufacturing.
Proper selection, combination, arrangements, assembly and installation of components/systems | Apply manufacturer’s application notes, e.g. catalogue sheets, installation instructions, specifications, and use of good engineering practice in similar components/systems.
Use of de-energisation principle | The safe state is obtained by release of energy. See primary action for stopping in EN 292-2:1991 (ISO/TR 12100-2:1992), 3.7.1. Energy is supplied for starting the movement of a mechanism. See primary action for starting in EN 292-2:1991 (ISO/TR 12100-2:1992), 3.7.1. Consider different modes, e.g. operation mode, maintenance mode. This principle shall not be used in special applications, e.g. to keep energy for clamping devices.
Proper fastening | For the application of screw locking consider manufacturer’s application notes. Overloading can be avoided by applying adequate torque loading technology.
Limitation of the generation and/or transmission of force and similar parameters | Examples are break pin, break plate, torque limiting clutch.
Limitation of range of environmental parameters | Examples of parameters are temperature, humidity, pollution at the installation place. See clause 8 and consider manufacturer’s application notes.
Limitation of speed and similar parameters | Consider e.g. the speed, acceleration, deceleration required by the application.
Proper reaction time | Consider e.g. spring tiredness, friction, lubrication, temperature, inertia during acceleration and deceleration, combination of tolerances.
Protection against unexpected start-up | Consider unexpected start-up caused by stored energy and after power “supply” restoration for different modes as operation mode, maintenance mode etc. Special equipment for release of stored energy may be necessary. Special applications, e.g. to keep energy for clamping devices or ensure a position, need to be considered separately.
Simplification | Reduce the number of components in the safety-related system.
Separation | Separation of safety-related functions from other functions.
Proper lubrication | (no remarks)
Proper prevention of the ingress of fluids and dust | Consider IP rating [see EN 60529 (IEC 60529)].

As you can see, the basic safety principles are pretty basic – select components appropriately for the application, consider the operating conditions for the components, follow manufacturer’s data, and use de-energization to create the stop function. That way, a loss of power results in the system failing into a safe state, as does an open relay coil or set of burnt contacts.

“…the expected operating stresses, e.g. the reliability with respect to breaking capacity and frequency,”

Specify your components correctly with regard to voltage, current, breaking capacity, temperature, humidity, dust,…

“…other relevant external influences, e.g. mechanical vibration, electromagnetic interference, power supply interruptions or disturbances.”

“Specific requirements for electromagnetic compatibility are found in the relevant product standards, e.g. IEC 61800-3 for power drive systems. For functional safety of SRP/CS in particular, the immunity requirements are relevant. If no product standard exists, at least the immunity requirements of IEC 61000-6-2 should be followed.”

Probably the biggest ‘gotcha’ in this point is “electromagnetic interference”. This is important enough that the standard devotes a paragraph to it specifically. I added the bold text to highlight the idea of ‘functional safety’. You can find more information on that topic in other posts on this blog. If your product is destined for the European Union (EU), then you will almost certainly be doing some EMC testing, unless your product is a ‘fixed installation’. If it’s going to almost any other market, you probably are not undertaking this testing. So how do you know if your design meets these criteria? Unless you test, you don’t. You can make some educated guesses based on sound engineering practices, but after that you can only hope.

Diagnostic Coverage

“…There is no diagnostic coverage (DCavg = none) within category B systems…”

Category B systems are fundamentally single channel. A single fault in the system will lead to the loss of the safety function. This sentence refers to the concept of “diagnostic coverage” that was introduced in ISO 13849-1:2006, but what this means in practice is that there is no monitoring or feedback from any critical elements. Remember our basic MCR circuit? If the MCR contactor welded closed, the only diagnostic was the failure of the machine to stop when the emergency stop button was pressed.

Component Failure Rates

“…the MTTFd of each channel can be low to medium.”

This part of the statement refers to another new concept from ISO 13849-1:2006, “MTTFd”. Standing for “Mean Time to Dangerous Failure”, this concept looks at the expected time before a component fails in a dangerous way, expressed in years. Calculating MTTFd is a significant part of implementing the new standard. From the perspective of understanding Category B, what this means is that you do not need to use high-reliability components in these systems.
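To give a feel for what calculating MTTFd involves, here is a rough sketch of one simple approach, the ‘parts count’ estimate, where the reciprocal of the channel MTTFd is the sum of the reciprocals of the component MTTFd values. Treat it as an illustration only: the component names and values are hypothetical, and the function name is mine.

```python
def channel_mttfd(component_mttfd_years):
    """Estimate the MTTFd (in years) of a single channel made of series components."""
    # Any one component failing dangerously defeats a single channel, so the
    # dangerous failure rates (1 / MTTFd) add up.
    return 1.0 / sum(1.0 / mttfd for mttfd in component_mttfd_years)


# Hypothetical component values, in years, for illustration only.
example_channel = {
    "interlock switch": 100.0,
    "guard relay": 50.0,
    "contactor": 50.0,
}

channel_years = channel_mttfd(example_channel.values())
print(f"Estimated channel MTTFd: {channel_years:.1f} years")
```

Notice that the channel comes out worse than its weakest component, which is why every part of a single channel matters.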

Common Cause Failures

“In such structures (normally single-channel systems), the consideration of CCF is not relevant.”

CCF is another new concept from ISO 13849-1:2006, and stands for “Common Cause Failure”. I’m not going to get into this in any detail here, but suffice it to say that channel separation (impossible in a single channel architecture) and other design techniques are used to reduce the likelihood of CCF in higher reliability systems.

Performance Levels

“The maximum PL achievable with category B is PL = b.”

PL stands for “Performance Level”, divided into five levels from ‘a’ to ‘e’. PLa is equal to an average probability of dangerous failure per hour of ≥ 10⁻⁵ to < 10⁻⁴, or one failure in 10,000 to 100,000 hours of operation. PLb is equal to ≥ 3 × 10⁻⁶ to < 10⁻⁵ failures per hour, or one failure in 100,000 to roughly 333,000 hours of operation. This sounds like a lot, but when dealing with probabilities, these numbers are actually pretty low.

If you consider an operation running a single shift in Canada where the normal working year is 50 weeks and the normal workday is 7.5 hours, a working year is

7.5 h/d x 5 d/w x 50 w/a = 1875 hours/a

Taking the failure rates per hour above, yields:

PLa = one failure in 5.3 years of operation to one failure in 53.3 years

PLb = one failure in 53.3 years of operation to one failure in 177.8 years

If we go to an operation running three shifts in Canada, a working year is:

7.5 h/shift x 3 shifts x 5 d/w x 50 w/a = 5625 hours/a

Taking the failure rates per hour above, yields:

PLa = one failure in 1.8 years of operation to one failure in 17.8 years

PLb = one failure in 17.8 years of operation to one failure in 59.3 years
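If you want to run this arithmetic for your own shift pattern, the conversion can be sketched in a few lines. This is a rough illustration, not a tool for validation: it takes the reciprocal of each failure-rate limit as an average interval between dangerous failures, the band limits are the ones given in ISO 13849-1 for the five Performance Levels, and the function and variable names are my own.

```python
# Average probability of dangerous failure per hour for each PL:
# (lower limit, inclusive; upper limit, exclusive).
PL_BANDS = {
    "a": (1e-5, 1e-4),
    "b": (3e-6, 1e-5),
    "c": (1e-6, 3e-6),
    "d": (1e-7, 1e-6),
    "e": (1e-8, 1e-7),
}


def years_between_failures(pl, hours_per_year):
    """Return (shortest, longest) average years of operation per dangerous failure."""
    low_rate, high_rate = PL_BANDS[pl]
    # The highest failure rate in the band gives the shortest interval.
    shortest = 1.0 / high_rate / hours_per_year
    longest = 1.0 / low_rate / hours_per_year
    return shortest, longest


single_shift = 7.5 * 5 * 50      # hours per year, one shift
three_shift = 7.5 * 3 * 5 * 50   # hours per year, three shifts

for pl in ("a", "b"):
    for label, hours in (("single shift", single_shift), ("three shifts", three_shift)):
        shortest, longest = years_between_failures(pl, hours)
        print(f"PL{pl}, {label}: {shortest:.1f} to {longest:.1f} years")
```

Changing the hours per year is all it takes to model a different working calendar.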

Now you should be starting to get an idea about where this is going. It’s important to remember that probabilities are just that – the failure could happen in the first hour of operation or at any time after that, or never. These figures give you some way to gauge the relative reliability of the design, and ARE NOT any sort of guarantee.

Watch for the next post in this series where I will look at Category 1 requirements!

References

[1] Safety of Machinery – Safety Related Parts of Control Systems – Part 1: General Principles for Design. CEN Standard EN 954-1. 1996.

[2] Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. ISO Standard 13849-1. 2006.

[3] Safety of machinery — Safety-related parts of control systems — Part 2: Validation, ISO Standard 13849-2. 2003.

[4] Safety of machinery — Safety-related parts of control systems — Part 100: Guidelines for the use and application of ISO 13849-1. ISO Technical Report ISO/TR 13849-100. 2000.

[5] Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. CEN Standard EN ISO 13849-1. 2008.


Interlock Architectures – Pt. 2: Category 1

This entry is part 2 of 8 in the series Circuit Architectures Explored

This article expands on the first in the series “Interlock Architectures – Pt. 1: What do those categories really mean?”. Learn about the basic circuit architectures that underlie all safety interlock systems under ISO 13849-1, and CSA Z432 and ANSI RIA R15.06.


In Part 1 of this series we explored Category B, the Basic Category that underpins all the other Categories. This post builds on Part 1 by taking a look at Category 1. Let’s start by exploring the difference as defined in ISO 13849-1. When you are reading, remember that “SRP/CS” stands for “Safety Related Parts of Control Systems”.

SRP/CS of Category 1 shall be designed and constructed using well-tried components and well-tried safety principles (see ISO 13849-2).

Well-Tried Components

So what, exactly, is a “Well-Tried Component”? Let’s go back to the standard for that:

A “well-tried component” for a safety-related application is a component which has been either

a) widely used in the past with successful results in similar applications, or
b) made and verified using principles which demonstrate its suitability and reliability for safety-related applications.

Newly developed components and safety principles may be considered as equivalent to “well-tried” if they fulfil the conditions of b).

The decision to accept a particular component as being “well-tried” depends on the application.

NOTE 1 Complex electronic components (e.g. PLC, microprocessor, application-specific integrated circuit) cannot be considered as equivalent to “well tried”.

[1, 6.2.4]

Let’s look at what this all means by referring to ISO 13849-2:

Table 1 — Well-Tried Components [2]
Well-Tried Component | Conditions for “well-tried” | Standard or specification
Screw | All factors influencing the screw connection and the application are to be considered. See Table A.2 “List of well-tried safety principles”. | Mechanical jointing such as screws, nuts, washers, rivets, pins, bolts etc. are standardised.
Spring | See Table A.2 “Use of a well-tried spring”. | Technical specifications for spring steels and other special applications are given in ISO 4960.
Cam | All factors influencing the cam arrangement (e.g. part of an interlocking device) are to be considered. See Table A.2 “List of well-tried safety principles”. | See EN 1088 (ISO 14119) (Interlocking devices).
Break-pin | All factors influencing the application are to be considered. See Table A.2 “List of well-tried safety principles”. | (none given)

Now we have a few ideas about what might constitute a ‘well-tried component’. Unfortunately, you will notice that ‘contactor’ or ‘relay’ or ‘limit switch’ appear nowhere on the list. This is a challenge, but one that can be overcome. The key to dealing with this is to look at how the components that you are choosing to use are constructed. If they use these components and techniques, you are on your way to considering them to be well-tried.

Another approach is to let the component manufacturer worry about the details of the construction of the device, and simply ensure that components selected for use in the SRP/CS are ‘safety rated’ by the manufacturer. This can work in 80-90% of cases, with a small percentage of components, such as large motor starters, some servo and stepper drives and other similar components unavailable with a safety rating. It’s worth noting that many drive manufacturers are starting to produce drives with built-in safety components that are intended to be integrated into your SRP/CS.

Exclusion of Complex Electronics

Note 1 from the first part of the definition is very important. So important that I’m going to repeat it here:

NOTE 1 Complex electronic components (e.g. PLC, microprocessor, application-specific integrated circuit) cannot be considered as equivalent to “well tried”.

I added the bold text to emphasize the importance of this statement. While this is included in a Note and is therefore considered to be explanatory text and not part of the normative body of the standard, it illuminates a key concept. This little note is what prevents a standard PLC from being used in Category 1 systems. It’s also important to realize that this definition is only considering the hardware – no mention of software is made here, and software is not dealt with until later in the standard.

Well-Tried Safety Principles

Let’s have a look at what ‘Well-Tried Safety Principles’ might be.

Table 2 — Well-Tried Safety Principles [2, A.2]
Well-Tried Safety Principle | Remarks
Use of carefully selected materials and manufacturing | Selection of suitable material, adequate manufacturing methods and treatments related to the application.
Use of components with oriented failure mode | The predominant failure mode of a component is known in advance and always the same, see EN 292-2:1991 (ISO/TR 12100-2:1992), 3.7.4.
Over-dimensioning/safety factor | The safety factors are given in standards or by good experience in safety-related applications.
Safe position | The moving part of the component is held in one of the possible positions by mechanical means (friction only is not enough). Force is needed for changing the position.
Increased OFF force | A safe position/state is obtained by an increased OFF force in relation to ON force.
Careful selection, combination, arrangement, assembly and installation of components/system related to the application | (no remarks)
Careful selection of fastening related to the application | Avoid relying only on friction.
Positive mechanical action | Dependent operation (e.g. parallel operation) between parts is obtained by positive mechanical link(s). Springs and similar “flexible” elements should not be part of the link(s) [see EN 292-2:1991 (ISO/TR 12100-2:1992), 3.5].
Multiple parts | Reducing the effect of faults by multiplying parts, e.g. where a fault of one spring (of many springs) does not lead to a dangerous condition.
Use of well-tried spring (see also Table A.3) | A well-tried spring requires: use of carefully selected materials, manufacturing methods (e.g. presetting and cycling before use) and treatments (e.g. rolling and shot-peening); sufficient guidance of the spring; and sufficient safety factor for fatigue stress (i.e. with high probability a fracture will not occur). Well-tried pressure coil springs may also be designed by: use of carefully selected materials, manufacturing methods (e.g. presetting and cycling before use) and treatments (e.g. rolling and shot-peening); sufficient guidance of the spring; clearance between the turns less than the wire diameter when unloaded; and sufficient force after a fracture(s) is maintained (i.e. a fracture(s) will not lead to a dangerous condition).
Limited range of force and similar parameters | Decide the necessary limitation in relation to the experience and application. Examples for limitations are break pin, break plate, torque limiting clutch.
Limited range of speed and similar parameters | Decide the necessary limitation in relation to the experience and application. Examples for limitations are centrifugal governor; safe monitoring of speed or limited displacement.
Limited range of environmental parameters | Decide the necessary limitations. Examples on parameters are temperature, humidity, pollution at the installation. See clause 8 and consider manufacturer’s application notes.
Limited range of reaction time, limited hysteresis | Decide the necessary limitations. Consider e.g. spring tiredness, friction, lubrication, temperature, inertia during acceleration and deceleration, combination of tolerances.

Use of Positive-Mode Operation

The use of these principles in the components, as well as in the overall design of the safeguards is important. In developing a system that uses ‘positive mode operation’, the mechanical linkage that operates the electrical contacts or the fluid-power valve that controls the prime-mover(s) (i.e. motors, cylinders, etc.), must act to directly drive the control element (contacts or valve spool) to the safe state. Springs can be used to return the system to the run state or dangerous state, since a failure of the spring will result in the interlock device staying in the safe state (fail-safe or fail-to-safety).

CSA Z432 [3] provides us with a nice diagram that illustrates the idea of “positive-action” or “positive-mode” operation:

Figure 1 – Positive Mode Operation [3, B.10]

In Fig. 1, opening the guard door forces the roller to follow the cam attached to the door, driving the switch contacts apart and opening the interlock. Even if the contacts were to weld, they would still be driven apart, since the mechanical advantage provided by the width of the door and the cam is more than enough to force the contacts apart.

Here’s an example of a ‘negative mode’ operation:

Figure 2 – Negative Mode operation [3, B.11]

In Fig. 2, the interlock switch relies on a spring to enter the safe state when the door is opened. If the spring in the interlock device fails, the system fails-to-danger. Also note that this design is very easy to defeat. A ‘zip-tie’ or some tape is all that would be required to keep the interlock in the ‘RUN’ condition.

You should have a better idea of what is meant when you read about positive and negative-modes of operation now. We’ll talk about defeat resistance in another article.
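The difference between the two modes can be boiled down to a very small truth-table style sketch. This is a deliberately simplified model based only on the behaviour described above, not on any particular switch; the function name, the mode strings and the fault flags are hypothetical.

```python
def contacts_open(mode, guard_open, spring_ok=True, contacts_welded=False):
    """Return True if the interlock contacts are open (the safe state)."""
    if mode == "positive":
        # The cam drives the contacts apart directly when the guard opens,
        # so neither a broken spring nor a light weld prevents opening.
        # (The spring only re-closes the contacts, which is not modelled here.)
        return guard_open
    if mode == "negative":
        # Only the spring opens the contacts once the guard releases the
        # actuator, so a broken spring or welded contacts defeat the interlock.
        return guard_open and spring_ok and not contacts_welded
    raise ValueError(f"unknown mode: {mode}")


for mode in ("positive", "negative"):
    result = contacts_open(mode, guard_open=True, spring_ok=False)
    print(f"{mode} mode, guard open, spring broken -> contacts open: {result}")
```

With the guard open and a broken spring, the positive-mode model still reports open contacts, while the negative-mode model reports closed contacts, which is the fail-to-danger case.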

Reliability

Combining what you’ve learned so far, you can see that correctly specified components, combined with over-dimensioning and implementation of design limits along with the use of well-tried safety principles will go a long way to improving the reliability of the control system. The next part of the definition of Category 1 speaks to some additional requirements:

The MTTFd of each channel shall be high.

The maximum PL achievable with category 1 is PL = c.

NOTE 2 There is no diagnostic coverage (DCavg = none) within category 1 systems. In such structures (single-channel systems) the consideration of CCF is not relevant.

NOTE 3 When a fault occurs it can lead to the loss of the safety function. However, the MTTFd of each channel in category 1 is higher than in category B. Consequently, the loss of the safety function is less likely.

We now know that the integrity of a Category 1 system is greater than a Category B system, since the channel MTTFd of the system has gone from “Low-to-Medium” in systems exhibiting PLa or PLb performance to “High” in systems exhibiting PLb or PLc performance. [1, Table 5] shows this difference in terms of predicted years to failure. As you can see, MTTFd “High” results in a predicted mean time to dangerous failure of between 30 and 100 years. This is a pretty good result for simply improving the components used in the system!

Table 3 – Mean time to dangerous failure [1, Table 5]
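As a quick way to work with these bands, here is a small sketch that sorts a channel MTTFd into “low”, “medium” or “high”. I am assuming the band limits in [1, Table 5] of 3, 10, 30 and 100 years, so check them against your copy of the standard. The function name and the “below range” label for values under 3 years are my own choices.

```python
def classify_mttfd(years):
    """Return (band, value used) for a channel MTTFd given in years."""
    if years < 3:
        # Assumption: anything under 3 years falls outside the defined bands.
        return ("below range", years)
    # Values above 100 years are capped at 100 for the analysis.
    capped = min(years, 100.0)
    if capped < 10:
        return ("low", capped)
    if capped < 30:
        return ("medium", capped)
    return ("high", capped)


for value in (5, 25, 45, 250):
    band, used = classify_mttfd(value)
    print(f"MTTFd {value} years -> {band} (value used: {used})")
```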

The other benefit is the increase in the overall PL. Where Category B architecture can provide PLb performance at best, Category 1 takes this up a notch to PLc. To get a handle on what PLc means, let’s look at our single and three shift examples again. If we take a Canadian operation with a single shift per day, and a 50 week working year we get:

7.5 h/shift x 5 d/w x 50 w/a = 1875 h/a

where h = hours, d = days, w = weeks and a = years.

In this case, PLc (≥ 10⁻⁶ to < 3 × 10⁻⁶ dangerous failures per hour) is equivalent to one failure in 177.8 years of operation to one failure in 533.3 years of operation.

Looking at three shifts per day in the same operation gives us:

7.5 h/shift x 3 shifts/d x 5 d/w x 50 w/a = 5625 h/a

In this case, PLc is equivalent to one failure in 59.3 years of operation to one failure in 177.8 years of operation.

When completing the analysis of a system, [1] limits the system MTTFd to 100 years regardless of what the individual channel MTTFd may be. The actual MTTFd becomes important when components need to be replaced during the lifetime of the product. If a component or a sub-system has an MTTFd that is less than the mission time of the system, then the component or subsystem must be replaced by the time it reaches its MTTFd. 20 years is the default mission time, but you can choose a shorter or longer time span if it makes sense.
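The replacement rule in the paragraph above is easy to turn into a checklist. The sketch below simply flags components whose MTTFd is shorter than the mission time, following the rule as I have stated it here; the component names and values are hypothetical, as is the function name.

```python
MISSION_TIME_YEARS = 20  # default mission time mentioned above


def components_to_replace(component_mttfd_years, mission_time_years=MISSION_TIME_YEARS):
    """Return the components whose MTTFd (years) is shorter than the mission time."""
    return {
        name: mttfd
        for name, mttfd in component_mttfd_years.items()
        if mttfd < mission_time_years
    }


# Hypothetical values, in years, for illustration only.
parts = {"interlock switch": 45.0, "pneumatic valve": 12.0, "contactor": 30.0}

due = components_to_replace(parts)
for name, mttfd in due.items():
    print(f"Plan to replace the {name} by year {mttfd:g} of service")
```

A list like this belongs in the maintenance instructions for the machine, so the replacement actually happens.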

Remember that these are probabilities, not guarantees. A failure could happen in the first hour of operation, the last hour of operation or never. These figures simply provide a way for you as the designer to gauge the relative reliability of the system.

Well-Tried Components versus Fault Exclusions

The standard goes on to outline some key distinctions between ‘well-tried component’ and ‘fault exclusion’. We’ll talk more about fault exclusions later in the series.

It is important that a clear distinction between “well-tried component” and “fault exclusion” (see Clause 7) be made. The qualification of a component as being well-tried depends on its application. For example, a position switch with positive opening contacts could be considered as being well-tried for a machine tool, while at the same time as being inappropriate for application in a food industry — in the milk industry, for instance, this switch would be destroyed by the milk acid after a few months. A fault exclusion can lead to a very high PL, but the appropriate measures to allow this fault exclusion should be applied during the whole lifetime of the device. In order to ensure this, additional measures outside the control system may be necessary. In the case of a position switch, some examples of these kinds of measures are

  • means to secure the fixing of the switch after its adjustment,
  • means to secure the fixing of the cam,
  • means to ensure the transverse stability of the cam,
  • means to avoid over travel of the position switch, e.g. adequate mounting strength of the shock absorber and any alignment devices, and
  • means to protect it against damage from outside.

[1, 6.2.4]

System Block Diagram

Finally, let’s look at the block diagram for Category 1. You will notice that it looks the same as the Category B block diagram, since only the components used in the system have changed, and not the architecture.

Figure 3 – Category 1 Block Diagram [1, Fig. 9]

References

[1]       Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. ISO Standard 13849-1, Ed. 2. 2006.

[2]       Safety of machinery — Safety-related parts of control systems — Part 2: Validation. ISO Standard 13849-2, Ed. 2. 2012.

[3]       Safeguarding of Machinery. CSA Standard Z432. 2004.

Add to your Library

If you are working on implementing these design standards in your products, you need to buy copies of the standards for your library.

  • ISO 13849-1:2006 Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design
  • ISO 13849-2:2003 Safety of machinery — Safety-related parts of control systems — Part 2: Validation


If you are working in the EU, or are working on CE Marking your product, you should hold the harmonized version of this standard, available through the CEN resellers:

  • EN ISO 13849-1:2008 Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design
  • EN ISO 13849-2:2012 Safety of machinery — Safety-related parts of control systems — Part 2: Validation

Next Installment

Watch for the next part of this series, “Interlock Architectures – Pt. 3: Category 2” where we expand on the first two categories by adding some diagnostic coverage to improve reliability.

Have questions? Email me!

Interlock Architectures – Pt. 3: Category 2

This entry is part 3 of 8 in the series Circuit Architectures Explored

This article explores the requirements for safety related control systems meeting ISO 13849-1 Category 2 requirements. “Gotcha!” points in the definition are highlighted to help designers avoid these common pitfalls.


In the first two posts in this series, we looked at Category B, the Basic category of system architecture, and then moved on to look at Category 1. Category B underpins Categories 2, 3 and 4. In this post we’ll look more deeply into Category 2.

Let’s start by looking at the definition for Category 2, taken from ISO 13849-1:2006. Remember that in these excerpts, SRP/CS stands for Safety Related Parts of Control Systems.

Definition

6.2.5 Category 2

For category 2, the same requirements as those according to 6.2.3 for category B shall apply. “Well–tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies.

SRP/CS of category 2 shall be designed so that their function(s) are checked at suitable intervals by the machine control system. The check of the safety function(s) shall be performed

  • at the machine start-up, and
  • prior to the initiation of any hazardous situation, e.g. start of a new cycle, start of other movements, and/or
  • periodically during operation if the risk assessment and the kind of operation shows that it is necessary.

The initiation of this check may be automatic. Any check of the safety function(s) shall either

  • allow operation if no faults have been detected, or
  • generate an output which initiates appropriate control action, if a fault is detected.

Whenever possible this output shall initiate a safe state. This safe state shall be maintained until the fault is cleared. When it is not possible to initiate a safe state (e.g. welding of the contact in the final switching device) the output shall provide a warning of the hazard.

For the designated architecture of category 2, as shown in Figure 10, the calculation of MTTFd and DCavg should take into account only the blocks of the functional channel (i.e. I, L and O in Figure 10) and not the blocks of the testing channel (i.e. TE and OTE in Figure 10).

The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low. The MTTFd of each channel shall be low-to-high, depending on the required performance level (PLr). Measures against CCF shall be applied (see Annex F).

The check itself shall not lead to a hazardous situation (e.g. due to an increase in response time). The checking equipment may be integral with, or separate from, the safety-related part(s) providing the safety function.

The maximum PL achievable with category 2 is PL = d.

NOTE 1 In some cases category 2 is not applicable because the checking of the safety function cannot be applied to all components.

NOTE 2 Category 2 system behaviour allows that

  • the occurrence of a fault can lead to the loss of the safety function between checks,
  • the loss of safety function is detected by the check.

NOTE 3 The principle that supports the validity of a category 2 function is that the adopted technical provisions, and, for example, the choice of checking frequency can decrease the probability of occurrence of a dangerous situation.

Figure 1 – Category 2 Block diagram [1, Fig.10]

Breaking it down

Let’s start by taking apart the definition a piece at a time and looking at what each part means. I’ll also show a simple circuit that can meet the requirements.

Category B & Well-tried Safety Principles

The first paragraph speaks to the building block approach taken in the standard:

For category 2, the same requirements as those according to 6.2.3 for category B shall apply. “Well–tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies.

Systems meeting Category 2 are required to meet all of the same requirements as Category B, as far as the components are concerned. Other requirements for the circuits are different, and we will look at those in a bit.

Self-Testing required

Category 2 brings in the idea of diagnostics. If correctly specified components have been selected (Category B), and are applied following ‘well-tried safety principles’, then adding a diagnostic component to the system should allow the system to detect some faults and therefore achieve a certain degree of ‘fault-tolerance’ or the ability to function correctly even when some aspect of the system has failed.

Let’s look at the text:

SRP/CS of Category 2 shall be designed so that their function(s) are checked at suitable intervals by the machine control system. The check of the safety function(s) shall be performed

  • at the machine start-up, and
  • prior to the initiation of any hazardous situation, e.g. start of a new cycle, start of other movements, and/or
  • periodically during operation if the risk assessment and the kind of operation shows that it is necessary.

The initiation of this check may be automatic. Any check of the safety function(s) shall either

  • allow operation if no faults have been detected, or
  • generate an output which initiates appropriate control action, if a fault is detected.

Whenever possible this output shall initiate a safe state. This safe state shall be maintained until the fault is cleared. When it is not possible to initiate a safe state (e.g. welding of the contact in the final switching device) the output shall provide a warning of the hazard.

Periodic checking is required. The checks must happen at least each time there is a demand placed on the system, i.e. a guard door is opened and closed, or an emergency stop button is pressed and reset. In addition the integrity of the SRP/CS must be tested at the start of a cycle or hazardous period, and potentially periodically during operation if the risk assessment indicates that this is necessary. The testing frequency must be at least 100x the demand rate [1, 4.5.4], e.g., a light curtain on a part loading work station that is interrupted every 30 s during normal operation requires a minimum test rate of once every 0.3 s, or 200x per minute or more.
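The 100x rule is simple arithmetic, and a short sketch makes it easy to repeat for other demand intervals. The function name and its `ratio` parameter are my own, chosen for illustration; only the 100x factor and the 30 s light curtain example come from the text above.

```python
def min_test_rate(demand_interval_s, ratio=100):
    """Return (longest allowed test interval in s, minimum tests per minute).

    demand_interval_s: time between demands on the safety function.
    ratio: how many times more often than the demand the test must run.
    """
    max_test_interval_s = demand_interval_s / ratio
    tests_per_minute = 60.0 * ratio / demand_interval_s
    return max_test_interval_s, tests_per_minute


# Light curtain interrupted every 30 s during normal operation
interval_s, per_minute = min_test_rate(30.0)
print(f"Test at least every {interval_s:.1f} s, "
      f"i.e. at least {per_minute:.0f} tests per minute")
```

For a guard door that is opened once an hour the same function gives a test at least every 36 s, which shows how quickly the requirement relaxes as the demand rate drops.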

The testing does not have to be automatic, although in practice it usually is. As long as the system integrity is good, then the output is allowed to remain on, and the machinery or process can run.

Watch Out!

Notice that the words ‘whenever possible’ are used in the last paragraph in this part of the definition where the standard speaks about initiation of a safe state. This wording alludes to the fact that these systems are still prone to faults that can lead to the loss of the safety function, and so cannot be called truly ‘fault-tolerant’. Loss of the safety function must be detected by the monitoring system and a safe state initiated. This requires careful thought, since the safety system components may have to interact with the process control system to initiate and maintain the safe state in the event that the safety system itself has failed. Also note that it is not possible to use fault exclusions in Category 2 architecture, because the system is not fault tolerant.

All of this leads to an interesting question: If the system is hardwired through the operating channel, and all the components used in that channel meet Category B requirements, can the diagnostic component be provided by monitoring the system with a standard PLC? The answer to this is YES. Test equipment (called TE in Fig. 1) is specifically excluded, and Category 2 DOES NOT require the use of well-tried components, only well-tried safety principles.

Finally, for the faults that can be detected by the monitoring system, detection of a fault must initiate a safe state. This means that on the next demand on the system, i.e. the next time the guard is opened or the emergency stop is pressed, the machine must go into a safe condition. Generally, detection of a fault should prevent the subsequent reset of the system until the fault is cleared or repaired.
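The decision the check has to make can be written down in a few lines. What follows is only a sketch of the logic in the definition, not code for a real safety controller: the class, the method names and the `Action` values are all hypothetical names I made up for the illustration.

```python
from enum import Enum


class Action(Enum):
    ALLOW_OPERATION = "allow operation"
    SAFE_STATE = "initiate and maintain safe state"
    WARNING = "warn of the hazard"


class Category2Check:
    """Illustrative model of the check outcome, with the fault latched
    until it is cleared."""

    def __init__(self):
        self.fault_latched = False

    def evaluate(self, fault_detected, safe_state_possible=True):
        # A detected fault stays latched until someone clears it.
        if fault_detected:
            self.fault_latched = True
        if not self.fault_latched:
            return Action.ALLOW_OPERATION
        # "Whenever possible" a safe state; otherwise a warning.
        return Action.SAFE_STATE if safe_state_possible else Action.WARNING

    def reset_permitted(self):
        # Reset is blocked while a fault is latched.
        return not self.fault_latched

    def clear_fault(self):
        # Called only after the fault has been repaired.
        self.fault_latched = False


check = Category2Check()
print(check.evaluate(fault_detected=False).value)
print(check.evaluate(fault_detected=True).value)
print("reset permitted:", check.reset_permitted())
```

The latch is the important part: once a fault has been seen, a later "no fault" reading does not release the system, which mirrors the requirement that the safe state be maintained until the fault is cleared.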

Testing is not permitted to introduce any new hazards or to slow the system down. The tests must occur ‘on-the-fly’ and without introducing any delay in the system compared to how it would have operated without the testing incorporated. Test equipment can be integrated into the safety system or be external to it.

One more ‘gotcha’

Note 1 in the definition highlights a significant pitfall for many designers: if all of the components in the functional channel of the system cannot be checked, you cannot claim conformity to Category 2. If you look back at Fig. 1, you will see that the dashed “m” lines connect all three functional blocks to the TE, indicating that all three must be included in the monitoring channel. A system that otherwise would meet the architectural requirements for Category 2 must be downgraded to Category 1 in cases where all the components in the functional channel cannot be tested. This is a major point and one which many designers miss when developing their systems.

Calculation of MTTFd

The next paragraph deals with calculating the system’s mean time to dangerous failure, or MTTFd, which is the inverse of its dangerous failure rate.

For the designated architecture of category 2, as shown in Figure 10, the calculation of MTTFd and DCavg should take into account only the blocks of the functional channel (i.e. I, L and O in Figure 10) and not the blocks of the testing channel (i.e. TE and OTE in Figure 10).

Calculation of the failure rate focuses on the functional channel, not on the monitoring system, meaning that the failure rate of the monitoring system is ignored when analyzing systems using this architecture. The MTTFd of each component in the functional channel is calculated and then the MTTFd of the total channel is calculated.

The Diagnostic Coverage (DCavg) is also calculated based exclusively on the components in the functional channel, so when determining what percentage of the faults can be detected by the monitoring equipment, only faults in the functional channel are considered.
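As a rough illustration of these two calculations, the sketch below uses the summation for a single channel (1/MTTFd = sum of 1/MTTFd,i) and the MTTFd-weighted average for DCavg, which is how I read the simplified methods in the annexes of the standard. The three component values are invented for the example and do not come from any data sheet; the band limits are the ones the standard uses to denote LOW, MEDIUM and HIGH.

```python
def channel_mttfd(component_mttfd_years):
    """MTTFd of one channel from its components: 1/MTTFd = sum(1/MTTFd_i)."""
    return 1.0 / sum(1.0 / m for m in component_mttfd_years)


def dc_avg(components):
    """Average DC, weighted by 1/MTTFd.

    components: list of (mttfd_years, dc) pairs, dc as a fraction 0..1.
    """
    numerator = sum(dc / m for m, dc in components)
    denominator = sum(1.0 / m for m, _ in components)
    return numerator / denominator


def mttfd_band(years):
    if years < 3:
        return "not acceptable"
    if years < 10:
        return "low"
    if years < 30:
        return "medium"
    return "high"  # 30 years and above; the claimable value is capped at 100


def dc_band(dc):
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


# Hypothetical functional channel: I, L and O blocks only (TE and OTE excluded)
functional_channel = [(60.0, 0.60), (120.0, 0.90), (40.0, 0.60)]

mttfd = channel_mttfd([m for m, _ in functional_channel])
dc = dc_avg(functional_channel)
print(f"MTTFd = {mttfd:.1f} years ({mttfd_band(mttfd)})")
print(f"DCavg = {dc:.0%} ({dc_band(dc)})")
```

Notice that nothing from the testing channel appears in the input list, which is exactly the point made in the paragraph from the standard.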

This highlights the fact that a failure of the monitoring system itself cannot be detected. A single failure in the monitoring system can therefore mask a later, normally detectable failure in the functional channel, and that combination will result in the loss of the safety function.

Summing Up

The next paragraph sums up the limits of this particular architecture:

The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low. The MTTFd of each channel shall be low-to-high, depending on the required performance level (PLr). Measures against CCF shall be applied (see Annex F).

The first sentence reflects back to the previous paragraph on diagnostic coverage, telling you, as the designer, that you cannot make a claim to anything more than LOW DC when using this architecture.

This raises an interesting question, since Figure 5 in the standard shows columns for both DCavg = LOW and DCavg = MED. My best advice to you as a user of the standard is to abide by the text, meaning that you cannot claim higher than LOW for DCavg in this architecture. This conflict will be addressed by future revisions of the standard.

Another problem raised by this sentence is the inclusion of the phrase “the total SRP/CS including fault-detection”. The previous paragraph explicitly tells you that the assessment of DCavg ‘should’ include only the functional channel, while this sentence appears to bring the fault-detection (testing) channel back in. In standards writing, sentences including the word ‘shall’ are clearly mandatory, while those including the word ‘should’ indicate a condition which is advised but not required. Hopefully this confusion will be clarified in the next edition of the standard.

MTTFd in the functional channel can be anywhere in the range from LOW to HIGH depending on the components selected and the way they are applied in the design. The requirement will be driven by the desired PL of the system, so a PLd system will require HIGH MTTFd components in the functional channel, while the same architecture used for a PLb system would require only LOW MTTFd components.

Finally, applicable measures against Common Cause Failures (CCF) must be used. Some of the measures given in Table F.1 in Annex F of the standard cannot be applied, such as Channel Separation, since you cannot separate a single channel. Other CCF measures can and must be applied, so you must score at least the minimum 65 on the CCF table in Annex F to claim compliance with Category 2 requirements.
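Here is a minimal sketch of how the CCF tally works for a single-channel design. The measure names are abbreviated and the point values reflect my reading of Table F.1, so treat them as assumptions and check them against your own copy of the standard before relying on them. Only the pass mark of 65 comes from the text above.

```python
CCF_PASS_SCORE = 65

# (measure, points, applied?) -- point values are assumptions to be verified
# against Table F.1; measure names are shortened for the example.
measures = [
    ("Separation/segregation of channels", 15, False),  # single channel
    ("Diversity", 20, False),
    ("Protection against over-voltage, over-current, etc.", 15, True),
    ("Well-tried components", 5, True),
    ("Failure mode and effects analysis", 5, True),
    ("Competence/training of designers", 5, True),
    ("EMC and contamination", 25, True),
    ("Other environmental influences", 10, True),
]


def ccf_score(measures):
    """Sum the points of the measures that are actually applied."""
    return sum(points for _, points, applied in measures if applied)


def ccf_passes(measures, pass_score=CCF_PASS_SCORE):
    return ccf_score(measures) >= pass_score


print(f"CCF score: {ccf_score(measures)} "
      f"({'pass' if ccf_passes(measures) else 'fail'})")
```

With the assumed values, giving up the measures that depend on having two channels leaves no margin at all, so every remaining measure has to be applied to reach the pass mark.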

Example Circuit

Here’s an example of what a simple Category 2 circuit constructed from discrete components might look like. Note that PB1 and PB2 could just as easily be interlock switches on guard doors as push buttons on a control panel. For the sake of simplicity, I did not illustrate surge suppression on the relays, but you should include MOVs or RC suppressors across all relay coils. All relays are considered to be of ‘force-guided’ construction and to meet the requirements for well-tried components.

Figure 2 – Example Category 2 circuit from discrete components

How the circuit works:

  1. The machine is stopped with power off. CR1, CR2, and M are off. CR3 is off until the reset button is pressed, since the NC monitoring contacts on CR1, CR2 and M are all closed, but the NO reset push button contact is open.
  2. The reset push button, PB3, is pressed. If CR1, CR2 and M are all off, their normally closed contacts will be closed, so pressing PB3 will result in CR3 turning on.
  3. CR3 closes its contacts, energizing CR1 and CR2 which seal their contact circuits in and de-energize CR3. The time delays inherent in relays permit this to work.
  4. With CR1 and CR2 closed and CR3 held off because its coil circuit opened when CR1 and CR2 turned on, M energizes and motion can start.

In this circuit the monitoring function is provided by CR3. If any of CR1, CR2 or M were to weld closed, CR3 could not energize, so a single fault is detected and the machine is prevented from re-starting. If either PB1 or PB2 is pressed, the machine will stop, since CR1 and CR2 are redundant. If CR3 fails with welded contacts, the M rung is held open because CR3 has not de-energized; if it fails with an open coil, the reset function will not work. Both failure modes therefore prevent the machine from starting with a failed monitoring system, provided a ‘force-guided’ type of relay is used for CR3. If CR1 or CR2 fails with an open coil, M cannot energize because of the redundant contacts on the M rung.
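The two rungs that matter for fault detection can be modelled as simple Boolean expressions. This is my own simplification of the circuit as described above, with hypothetical function names. It ignores PB1, PB2, the seal-in contacts and the relay timing, and it assumes force-guided relays, so a welded NO contact keeps the matching NC monitoring contact open.

```python
def reset_rung_energizes_cr3(pb3_pressed, cr1_closed, cr2_closed, m_closed):
    """CR3 rung: PB3 (NO) in series with the NC monitoring contacts of
    CR1, CR2 and M. A relay that is energized or welded closed holds its
    NC monitoring contact open."""
    nc_contacts_closed = not (cr1_closed or cr2_closed or m_closed)
    return pb3_pressed and nc_contacts_closed


def m_rung_energizes(cr1_on, cr2_on, cr3_on):
    """M rung: NO contacts of CR1 and CR2 in series with an NC contact
    of CR3."""
    return cr1_on and cr2_on and not cr3_on


# Healthy circuit at rest: pressing reset picks up CR3
print(reset_rung_energizes_cr3(True, False, False, False))

# M welded closed: the reset is blocked, so the fault is detected
print(reset_rung_energizes_cr3(True, False, False, True))

# CR3 welded in the energized position: M cannot pick up
print(m_rung_energizes(True, True, True))

# CR1 open coil: M cannot pick up
print(m_rung_energizes(False, True, False))
```

Only the first case returns True; each of the fault cases leaves the machine unable to start, which is the behaviour described in the paragraph above.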

This circuit cannot detect a failure in PB1, PB2, or PB3. Testing is conducted each time the circuit is reset. This circuit does not meet the 100x test rate requirement, and so cannot be said to meet Category 2 requirements.

If M is a motor starter rather than the motor itself, it will need to be duplicated for redundancy and a monitoring contact added to the CR3 rung.

In calculating MTTFd, PB1, PB2, CR1, CR2, CR3 and M must be included. CR3 is included because it has a functional contact in the M rung and is therefore part of the functional channel of the circuit as well as being part of the TE and OTE channels.

Watch for the next installment in this series where we’ll explore Category 3, the first of the ‘fault tolerant’ architectures!