Interlock Architectures – Pt. 3: Category 2

This entry is part 3 of 8 in the series Circuit Architectures Explored

This article explores the requirements for safety-related control systems meeting ISO 13849-1 Category 2 requirements. “Gotcha!” points in the definition are highlighted to help designers avoid these common pitfalls.

In the first two posts in this series, we looked at Category B, the Basic category of system architecture, and then moved on to look at Category 1. Category B underpins Categories 2, 3 and 4. In this post we’ll look more deeply into Category 2.

Let’s start by looking at the definition for Category 2, taken from ISO 13849-1:2006 [1]. Remember that in these excerpts, SRP/CS stands for Safety Related Parts of Control Systems.

Definition

6.2.5 Category 2

For category 2, the same requirements as those according to 6.2.3 for category B shall apply. “Well–tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies.

SRP/CS of category 2 shall be designed so that their function(s) are checked at suitable intervals by the machine control system. The check of the safety function(s) shall be performed

  • at the machine start-up, and
  • prior to the initiation of any hazardous situation, e.g. start of a new cycle, start of other movements, and/or
  • periodically during operation if the risk assessment and the kind of operation shows that it is necessary.

The initiation of this check may be automatic. Any check of the safety function(s) shall either

  • allow operation if no faults have been detected, or
  • generate an output which initiates appropriate control action, if a fault is detected.

Whenever possible this output shall initiate a safe state. This safe state shall be maintained until the fault is cleared. When it is not possible to initiate a safe state (e.g. welding of the contact in the final switching device) the output shall provide a warning of the hazard.

For the designated architecture of category 2, as shown in Figure 10, the calculation of MTTFd and DCavg should take into account only the blocks of the functional channel (i.e. I, L and O in Figure 10) and not the blocks of the testing channel (i.e. TE and OTE in Figure 10).

The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low. The MTTFd of each channel shall be low-to-high, depending on the required performance level (PLr). Measures against CCF shall be applied (see Annex F).

The check itself shall not lead to a hazardous situation (e.g. due to an increase in response time). The checking equipment may be integral with, or separate from, the safety-related part(s) providing the safety function.

The maximum PL achievable with category 2 is PL = d.

NOTE 1 In some cases category 2 is not applicable because the checking of the safety function cannot be applied to all components.

NOTE 2 Category 2 system behaviour allows that

  • the occurrence of a fault can lead to the loss of the safety function between checks,
  • the loss of safety function is detected by the check.

NOTE 3 The principle that supports the validity of a category 2 function is that the adopted technical provisions, and, for example, the choice of checking frequency can decrease the probability of occurrence of a dangerous situation.

Figure 1 – Category 2 Block diagram [1, Fig.10]

Breaking it down

Let’s start by taking apart the definition a piece at a time and looking at what each part means. I’ll also show a simple circuit that can meet the requirements.

Category B & Well-tried Safety Principles

The first paragraph speaks to the building block approach taken in the standard:

For category 2, the same requirements as those according to 6.2.3 for category B shall apply. “Well–tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies.

Systems meeting Category 2 are required to meet all of the same requirements as Category B, as far as the components are concerned. Other requirements for the circuits are different, and we will look at those in a bit.

Self-Testing required

Category 2 brings in the idea of diagnostics. If correctly specified components have been selected (Category B), and are applied following ‘well-tried safety principles’, then adding a diagnostic component to the system should allow the system to detect some faults and therefore achieve a certain degree of ‘fault-tolerance’ or the ability to function correctly even when some aspect of the system has failed.

Let’s look at the text:

SRP/CS of Category 2 shall be designed so that their function(s) are checked at suitable intervals by the machine control system. The check of the safety function(s) shall be performed

  • at the machine start-up, and
  • prior to the initiation of any hazardous situation, e.g. start of a new cycle, start of other movements, and/or
  • periodically during operation if the risk assessment and the kind of operation shows that it is necessary.

The initiation of this check may be automatic. Any check of the safety function(s) shall either

  • allow operation if no faults have been detected, or
  • generate an output which initiates appropriate control action, if a fault is detected.

Whenever possible this output shall initiate a safe state. This safe state shall be maintained until the fault is cleared. When it is not possible to initiate a safe state (e.g. welding of the contact in the final switching device) the output shall provide a warning of the hazard.

Periodic checking is required. The checks must happen at least each time there is a demand placed on the system, i.e. a guard door is opened and closed, or an emergency stop button is pressed and reset. In addition the integrity of the SRP/CS must be tested at the start of a cycle or hazardous period, and potentially periodically during operation if the risk assessment indicates that this is necessary. The testing frequency must be at least 100x the demand rate [1, 4.5.4], e.g., a light curtain on a part loading work station that is interrupted every 30 s during normal operation requires a minimum test rate of once every 0.3 s, or 200x per minute or more.
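If you want to check the numbers for your own application, the 100x rule reduces to a couple of lines of arithmetic. Here is a minimal Python sketch; the function names are mine and purely illustrative, and the default ratio of 100 is simply the figure quoted above.

```python
def max_test_interval_s(demand_interval_s: float, ratio: float = 100.0) -> float:
    """Longest allowable time between tests for a given demand interval."""
    if demand_interval_s <= 0 or ratio <= 0:
        raise ValueError("demand interval and ratio must be positive")
    return demand_interval_s / ratio


def min_tests_per_minute(demand_interval_s: float, ratio: float = 100.0) -> float:
    """Minimum number of tests per minute implied by the same rule."""
    return 60.0 / max_test_interval_s(demand_interval_s, ratio)


# Light curtain interrupted every 30 s, as in the example above.
interval = max_test_interval_s(30.0)   # 0.3 s between tests
rate = min_tests_per_minute(30.0)      # 200 tests per minute
```

Run the same two functions with your own demand interval before deciding whether the test equipment you have in mind can keep up.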

The testing does not have to be automatic, although in practice it usually is. As long as the system integrity is good, then the output is allowed to remain on, and the machinery or process can run.

Watch Out!

Notice that the words ‘whenever possible’ are used in the last paragraph in this part of the definition where the standard speaks about initiation of a safe state. This wording alludes to the fact that these systems are still prone to faults that can lead to the loss of the safety function, and so cannot be called truly ‘fault-tolerant’. Loss of the safety function must be detected by the monitoring system and a safe state initiated. This requires careful thought, since the safety system components may have to interact with the process control system to initiate and maintain the safe state in the event that the safety system itself has failed. Also note that it is not possible to use fault exclusions in Category 2 architecture, because the system is not fault tolerant.

All of this leads to an interesting question: If the system is hardwired through the operating channel, and all the components used in that channel meet Category B requirements, can the diagnostic component be provided by monitoring the system with a standard PLC? The answer to this is YES. Test equipment (called TE in Fig. 1) is specifically excluded from the MTTFd and DCavg calculations, and Category 2 DOES NOT require the use of well-tried components, only well-tried safety principles.

Finally, for the faults that can be detected by the monitoring system, detection of a fault must initiate a safe state. This means that on the next demand on the system, i.e. the next time the guard is opened or the emergency stop is pressed, the machine must go into a safe condition. Generally, detection of a fault should prevent the subsequent reset of the system until the fault is cleared or repaired.

Testing is not permitted to introduce any new hazards or to slow the system down. The tests must occur ‘on-the-fly’ and without introducing any delay in the system compared to how it would have operated without the testing incorporated. Test equipment can be integrated into the safety system or be external to it.

One more ‘gotcha’

Note 1 in the definition highlights a significant pitfall for many designers: if all of the components in the functional channel of the system cannot be checked, you cannot claim conformity to Category 2. If you look back at Fig. 1, you will see that the dashed “m” lines connect all three functional blocks to the TE, indicating that all three must be included in the monitoring channel. A system that otherwise would meet the architectural requirements for Category 2 must be downgraded to Category 1 in cases where all the components in the functional channel cannot be tested. This is a major point and one which many designers miss when developing their systems.

Calculation of MTTFd

The next paragraph deals with the calculation of the failure rate of the system, or MTTFd.

For the designated architecture of category 2, as shown in Figure 10, the calculation of MTTFd and DCavg should take into account only the blocks of the functional channel (i.e. I, L and O in Figure 10) and not the blocks of the testing channel (i.e. TE and OTE in Figure 10).

Calculation of the failure rate focuses on the functional channel, not on the monitoring system, meaning that the failure rate of the monitoring system is ignored when analyzing systems using this architecture. The MTTFd of each component in the functional channel is calculated and then the MTTFd of the total channel is calculated.
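To make the channel calculation concrete, here is a minimal Python sketch. It assumes the parts-count relation that, as I understand it, Annex D of [1] uses for a single channel: the reciprocal of the channel MTTFd is the sum of the reciprocals of the component MTTFd values. The component figures are hypothetical placeholders, not manufacturer data, and the function name is mine.

```python
def channel_mttfd(component_mttfd_years):
    """Combine component MTTFd values (in years) into one channel MTTFd.

    Assumes the parts-count relation 1/MTTFd = sum(1/MTTFd_i).
    """
    values = list(component_mttfd_years)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one positive MTTFd value")
    return 1.0 / sum(1.0 / v for v in values)


# Hypothetical functional channel: input (I), logic (L) and output (O).
# The test channel (TE, OTE) is deliberately left out, as the standard directs.
functional_channel = {"I": 150.0, "L": 300.0, "O": 100.0}
mttfd = channel_mttfd(functional_channel.values())   # about 50 years
```

Notice how the channel figure ends up lower than the weakest single component; every block you add to the functional channel pulls the result down.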

The Diagnostic Coverage (DCavg) is also calculated based exclusively on the components in the functional channel, so when determining what percentage of the faults can be detected by the monitoring equipment, only faults in the functional channel are considered.
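A short sketch may help here too. It assumes the weighting that, as I understand it, Annex E of [1] uses for DCavg: each component's DC is weighted by the reciprocal of its MTTFd, so less reliable parts count for more. The component values and the function name are hypothetical.

```python
def dc_avg(components):
    """Average diagnostic coverage over the functional channel.

    components: iterable of (dc, mttfd_years) pairs, with dc as a fraction 0..1.
    Assumes DCavg = sum(DC_i / MTTFd_i) / sum(1 / MTTFd_i).
    """
    pairs = list(components)
    if not pairs or any(m <= 0 or not 0.0 <= dc <= 1.0 for dc, m in pairs):
        raise ValueError("need (dc, mttfd) pairs with 0 <= dc <= 1 and mttfd > 0")
    weighted = sum(dc / m for dc, m in pairs)
    total = sum(1.0 / m for _, m in pairs)
    return weighted / total


# Hypothetical functional channel only: (DC, MTTFd in years) for I, L and O.
parts = [(0.6, 150.0), (0.9, 300.0), (0.9, 100.0)]
coverage = dc_avg(parts)   # about 0.8, i.e. 80 %
```

Only I, L and O appear in the list; nothing about the test equipment itself enters the figure.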

This highlights the fact that a failure of the monitoring system cannot be detected, so a single failure in the monitoring system that results in the system failing to detect a subsequent normally detectable failure in the functional channel will result in the loss of the safety function.

Summing Up

The next paragraph sums up the limits of this particular architecture:

The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low. The MTTFd of each channel shall be low-to-high, depending on the required performance level (PLr). Measures against CCF shall be applied (see Annex F).

The first sentence reflects back to the previous paragraph on diagnostic coverage, telling you, as the designer, that you cannot claim anything more than LOW diagnostic coverage when using this architecture.

This raises an interesting question, since Figure 5 in the standard shows columns for both DCavg = LOW and DCavg = MED. My best advice to you as a user of the standard is to abide by the text, meaning that you cannot claim higher than LOW for DCavg in this architecture. This conflict will be addressed by future revisions of the standard.

Another problem raised by this sentence is the inclusion of the phrase “the total SRP/CS including fault-detection”, since the previous paragraph explicitly tells you that the assessment of DCavg ‘should’ only include the functional channel, while this sentence appears to include the fault-detection equipment as well. In standards writing, sentences including the word ‘shall’ are clearly mandatory, while those including the word ‘should’ indicate a condition which is advised but not required. Hopefully this confusion will be clarified in the next edition of the standard.

MTTFd in the functional channel can be anywhere in the range from LOW to HIGH depending on the components selected and the way they are applied in the design. The requirement will be driven by the desired PL of the system, so a PLd system will require HIGH MTTFd components in the functional channel, while the same architecture used for a PLb system would require only LOW MTTFd components.

Finally, applicable measures against Common Cause Failures (CCF) must be used. Some of the measures given in Table F.1 in Annex F of the standard cannot be applied, such as Channel Separation, since you cannot separate a single channel. Other CCF measures can and must be applied, so you must score at least the minimum of 65 on the CCF table in Annex F to claim compliance with Category 2 requirements.
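The scoring itself is simple addition, as this Python sketch shows. The measure names and point values below are my recollection of Table F.1 and are included for illustration only, so check them against your copy of the standard before relying on them. Which measures are marked as claimed is also just an assumption for the example.

```python
CCF_THRESHOLD = 65  # minimum score quoted above


def ccf_score(measures):
    """Sum the points of every measure claimed as fully implemented."""
    return sum(points for points, claimed in measures.values() if claimed)


# (points, claimed) per measure. Values are illustrative, not authoritative.
measures = {
    "separation/segregation": (15, False),   # not available in a single channel
    "diversity": (20, False),                # assumed not claimed in this example
    "overvoltage/overpressure protection": (15, True),
    "well-tried components": (5, True),
    "assessment/analysis": (5, True),
    "competence/training": (5, True),
    "EMC or contamination control": (25, True),
    "other environmental influences": (10, True),
}

score = ccf_score(measures)
passes = score >= CCF_THRESHOLD
```

With these assumed values there is no slack at all once separation and diversity are off the table, which is worth keeping in mind when you plan the design.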

Example Circuit

Here’s an example of what a simple Category 2 circuit constructed from discrete components might look like. Note that PB1 and PB2 could just as easily be interlock switches on guard doors as push buttons on a control panel. For the sake of simplicity, I did not illustrate surge suppression on the relays, but you should include MOVs or RC suppressors across all relay coils. All relays are considered to be constructed with ‘force-guided’ designs and meet the requirements for well-tried components.

Figure 2 – Example Category 2 circuit from discrete components

How the circuit works:

  1. The machine is stopped with power off. CR1, CR2, and M are off. CR3 is off until the reset button is pressed, since the NC monitoring contacts on CR1, CR2 and M are all closed, but the NO reset push button contact is open.
  2. The reset push button, PB3, is pressed. If CR1, CR2 and M are all off, their normally closed contacts will be closed, so pressing PB3 will result in CR3 turning on.
  3. CR3 closes its contacts, energizing CR1 and CR2 which seal their contact circuits in and de-energize CR3. The time delays inherent in relays permit this to work.
  4. With CR1 and CR2 closed and CR3 held off because its coil circuit opened when CR1 and CR2 turned on, M energizes and motion can start.

In this circuit the monitoring function is provided by CR3. If any of CR1, CR2 or M were to weld closed, CR3 could not energize, so a single fault is detected and the machine is prevented from restarting. Pressing either PB1 or PB2 stops the machine, since CR1 and CR2 provide redundant contacts in the M rung. If CR3 fails with welded contacts, the M rung is held open because CR3 has not de-energized; if it fails with an open coil, the reset function will not work. Both failure modes therefore prevent the machine from starting with a failed monitoring system, provided a “force-guided” type of relay is used for CR3. If CR1 or CR2 fails with an open coil, then M cannot energize because of the redundant contacts on the M rung.
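If you would like to experiment with the sequence and the fault cases described above, here is a small Python model. The rung logic is my reading of the written description, not a transcription of Figure 2. In particular, I have assumed that PB1 sits in the CR1 rung and PB2 in the CR2 rung, that each relay changes state one evaluation pass after its coil circuit does, and that a welded relay simply stays in its energized position.

```python
RELAYS = ("CR1", "CR2", "CR3", "M")


def scan(state, inputs, welded=frozenset()):
    """Evaluate all four rungs once and return the next relay states.

    state:  dict of booleans for CR1, CR2, CR3 and M (True = energized)
    inputs: dict with PB1 and PB2 (True = stop contact closed) and
            PB3 (True = reset button held in)
    welded: names of relays stuck in the energized position
    """
    nxt = {
        # Reset rung: NC contacts of CR1, CR2 and M in series with PB3.
        "CR3": (not state["CR1"] and not state["CR2"]
                and not state["M"] and inputs["PB3"]),
        # CR1/CR2 rungs: stop contact in series with (CR3 contact OR seal-in).
        "CR1": inputs["PB1"] and (state["CR3"] or state["CR1"]),
        "CR2": inputs["PB2"] and (state["CR3"] or state["CR2"]),
        # M rung: CR1 and CR2 contacts closed and CR3 dropped out.
        "M": state["CR1"] and state["CR2"] and not state["CR3"],
    }
    for name in welded:
        nxt[name] = True
    return nxt


def settle(inputs, welded=frozenset(), start=None, max_scans=20):
    """Repeat scan() until the relay states stop changing."""
    state = dict(start) if start else {name: False for name in RELAYS}
    for name in welded:
        state[name] = True
    for _ in range(max_scans):
        nxt = scan(state, inputs, welded)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("relay states did not settle")


RESET = {"PB1": True, "PB2": True, "PB3": True}
healthy = settle(RESET)                       # CR1, CR2 and M on, CR3 off
cr1_welded = settle(RESET, welded={"CR1"})    # CR3 never picks up, M stays off
stopped = settle({"PB1": False, "PB2": True, "PB3": False}, start=healthy)
```

The model is idealized, so treat it as a way to reason about the sequence rather than as proof that a real circuit behaves the same way.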

This circuit cannot detect a failure in PB1, PB2, or PB3. Testing is conducted each time the circuit is reset. This circuit does not meet the 100x test rate requirement, and so cannot be said to meet Category 2 requirements.

If M is a motor starter rather than the motor itself, it will need to be duplicated for redundancy and a monitoring contact added to the CR3 rung.

In calculating MTTFd, PB1, PB2, CR1, CR2, CR3 and M must be included. CR3 is included because it has a functional contact in the M rung and is therefore part of the functional channel of the circuit as well as being part of the TE and OTE channels.

Watch for the next installment in this series where we’ll explore Category 3, the first of the ‘fault tolerant’ architectures!

Interlock Architectures – Pt. 2: Category 1

This entry is part 2 of 8 in the series Circuit Architectures Explored

This article expands on the first in the series “Interlock Architectures – Pt. 1: What do those categories really mean?”. Learn about the basic circuit architectures that underlie all safety interlock systems under ISO 13849-1, CSA Z432 and ANSI RIA R15.06.

In Part 1 of this series we explored Category B, the Basic Category that underpins all the other Categories. This post builds on Part 1 by taking a look at Category 1. Let’s start by exploring the difference as defined in ISO 13849-1. When you are reading, remember that “SRP/CS” stands for “Safety Related Parts of Control Systems”.

SRP/CS of Category 1 shall be designed and constructed using well-tried components and well-tried safety principles (see ISO 13849-2).

Well-Tried Components

So what, exactly, is a “Well-Tried Component”? Let’s go back to the standard for that:

A “well-tried component” for a safety-related application is a component which has been either

a) widely used in the past with successful results in similar applications, or
b) made and verified using principles which demonstrate its suitability and reliability for safety-related applications.

Newly developed components and safety principles may be considered as equivalent to “well-tried” if they fulfil the conditions of b).

The decision to accept a particular component as being “well-tried” depends on the application.

NOTE 1 Complex electronic components (e.g. PLC, microprocessor, application-specific integrated circuit) cannot be considered as equivalent to “well tried”.

[1, 6.2.4]

Let’s look at what this all means by referring to ISO 13849-2:

Table 1 — Well-Tried Components [2]
Well-Tried Component | Conditions for “well–tried” | Standard or specification
Screw | All factors influencing the screw connection and the application are to be considered. See Table A.2 “List of well–tried safety principles”. | Mechanical jointing such as screws, nuts, washers, rivets, pins, bolts etc. are standardised.
Spring | See Table A.2 “Use of a well–tried spring”. | Technical specifications for spring steels and other special applications are given in ISO 4960.
Cam | All factors influencing the cam arrangement (e. g. part of an interlocking device) are to be considered. See Table A.2 “List of well–tried safety principles”. | See EN 1088 (ISO 14119) (Interlocking devices).
Break–pin | All factors influencing the application are to be considered. See Table A.2 “List of well-tried safety principles”. |

Now we have a few ideas about what might constitute a ‘well-tried component’. Unfortunately, you will notice that ‘contactor’ or ‘relay’ or ‘limit switch’ appear nowhere on the list. This is a challenge, but one that can be overcome. The key to dealing with this is to look at how the components that you are choosing to use are constructed. If they use these components and techniques, you are on your way to considering them to be well-tried.

Another approach is to let the component manufacturer worry about the details of the construction of the device, and simply ensure that components selected for use in the SRP/CS are ‘safety rated’ by the manufacturer. This can work in 80-90% of cases, with a small percentage of components, such as large motor starters, some servo and stepper drives and other similar components unavailable with a safety rating. It’s worth noting that many drive manufacturers are starting to produce drives with built-in safety components that are intended to be integrated into your SRP/CS.

Exclusion of Complex Electronics

Note 1 from the first part of the definition is very important. So important that I’m going to repeat it here:

NOTE 1 Complex electronic components (e.g. PLC, microprocessor, application-specific integrated circuit) cannot be considered as equivalent to “well tried”.

I added the bold text to emphasize the importance of this statement. While this is included in a Note and is therefore considered to be explanatory text and not part of the normative body of the standard, it illuminates a key concept. This little note is what prevents a standard PLC from being used in Category 1 systems. It’s also important to realize that this definition is only considering the hardware – no mention of software is made here, and software is not dealt with until later in the standard.

Well-Tried Safety Principles

Let’s have a look at what ‘Well-Tried Safety Principles’ might be.

Table 2 — Well-Tried Safety Principles [2, A.2]
Well-tried Safety Principle | Remarks
Use of carefully selected materials and manufacturing | Selection of suitable material, adequate manufacturing methods and treatments related to the application.
Use of components with oriented failure mode | The predominant failure mode of a component is known in advance and always the same, see EN 292-2:1991, (ISO/TR 12100-2:1992), 3.7.4.
Over–dimensioning/safety factor | The safety factors are given in standards or by good experience in safety-related applications.
Safe position | The moving part of the component is held in one of the possible positions by mechanical means (friction only is not enough). Force is needed for changing the position.
Increased OFF force | A safe position/state is obtained by an increased OFF force in relation to ON force.
Careful selection, combination, arrangement, assembly and installation of components/system related to the application |
Careful selection of fastening related to the application | Avoid relying only on friction.
Positive mechanical action | Dependent operation (e. g. parallel operation) between parts is obtained by positive mechanical link(s). Springs and similar “flexible” elements should not be part of the link(s) [see EN 292-2:1991 (ISO/TR 12100-2:1992), 3.5].
Multiple parts | Reducing the effect of faults by multiplying parts, e. g. where a fault of one spring (of many springs) does not lead to a dangerous condition.
Use of well–tried spring (see also Table A.3) | A well–tried spring requires:

  • use of carefully selected materials, manufacturing methods (e. g. presetting and cycling before use) and treatments (e. g. rolling and shot–peening),
  • sufficient guidance of the spring, and
  • sufficient safety factor for fatigue stress (i. e. with high probability a fracture will not occur).

Well–tried pressure coil springs may also be designed by:

  • use of carefully selected materials, manufacturing methods (e. g. presetting and cycling before use) and treatments (e. g. rolling and shot-peening),
  • sufficient guidance of the spring, and
  • clearance between the turns less than the wire diameter when unloaded, and
  • sufficient force after a fracture(s) is maintained (i. e. a fracture(s) will not lead to a dangerous condition).

Limited range of force and similar parameters | Decide the necessary limitation in relation to the experience and application. Examples for limitations are break pin, break plate, torque limiting clutch.
Limited range of speed and similar parameters | Decide the necessary limitation in relation to the experience and application. Examples for limitations are centrifugal governor; safe monitoring of speed or limited displacement.
Limited range of environmental parameters | Decide the necessary limitations. Examples on parameters are temperature, humidity, pollution at the installation. See clause 8 and consider manufacturer’s application notes.
Limited range of reaction time, limited hysteresis | Decide the necessary limitations. Consider e. g. spring tiredness, friction, lubrication, temperature, inertia during acceleration and deceleration, combination of tolerances.

Use of Positive-Mode Operation

The use of these principles in the components, as well as in the overall design of the safeguards is important. In developing a system that uses ‘positive mode operation’, the mechanical linkage that operates the electrical contacts or the fluid-power valve that controls the prime-mover(s) (i.e. motors, cylinders, etc.), must act to directly drive the control element (contacts or valve spool) to the safe state. Springs can be used to return the system to the run state or dangerous state, since a failure of the spring will result in the interlock device staying in the safe state (fail-safe or fail-to-safety).

CSA Z432 [3] provides us with a nice diagram that illustrates the idea of “positive-action” or “positive-mode” operation:

Figure 1 – Positive Mode Operation [3, B.10]

In Fig. 1, opening the guard door forces the roller to follow the cam attached to the door, driving the switch contacts apart and opening the interlock. Even if the contacts were to weld, they would still be driven apart since the mechanical advantage provided by the width of the door and the cam are more than enough to force the contacts apart.

Here’s an example of a ‘negative mode’ operation:

Figure 2 – Negative Mode operation [3, B.11]

In Fig. 2, the interlock switch relies on a spring to enter the safe state when the door is opened. If the spring in the interlock device fails, the system fails-to-danger. Also note that this design is very easy to defeat. A ‘zip-tie’ or some tape is all that would be required to keep the interlock in the ‘RUN’ condition.

You should have a better idea of what is meant when you read about positive and negative-modes of operation now. We’ll talk about defeat resistance in another article.

Reliability

Combining what you’ve learned so far, you can see that correctly specified components, combined with over-dimensioning and implementation of design limits along with the use of well-tried safety principles will go a long way to improving the reliability of the control system. The next part of the definition of Category 1 speaks to some additional requirements:

The MTTFd of each channel shall be high.

The maximum PL achievable with category 1 is PL = c.

NOTE 2 There is no diagnostic coverage (DCavg = none) within category 1 systems. In such structures (single-channel systems) the consideration of CCF is not relevant.

NOTE 3 When a fault occurs it can lead to the loss of the safety function. However, the MTTFd of each channel in category 1 is higher than in category B. Consequently, the loss of the safety function is less likely.

We now know that the integrity of a Category 1 system is greater than a Category B system, since the channel MTTFd of the system has gone from “Low-to-Medium” in systems exhibiting PLa or PLb performance to “High” in systems exhibiting PLb or PLc performance. [1, Table 5] shows this difference in terms of predicted years to dangerous failure. As you can see, MTTFd “High” corresponds to a predicted mean time to dangerous failure of between 30 and 100 years. This is a pretty good result for simply improving the components used in the system!

Table 3 – Mean time to dangerous failure [1, Table 5]
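For readers who prefer to see the bands as a lookup, here is a small Python sketch. The limits are the ones I understand [1, Table 5] to give (low: 3 to under 10 years, medium: 10 to under 30 years, high: 30 to 100 years); treat them as an assumption and check them against your copy of the standard. The function name is mine.

```python
MTTFD_CAP_YEARS = 100.0  # channel MTTFd is capped at 100 years for the analysis


def mttfd_band(mttfd_years: float) -> str:
    """Classify a channel MTTFd (in years) as low, medium or high."""
    capped = min(mttfd_years, MTTFD_CAP_YEARS)
    if capped < 3.0:
        return "below range"
    if capped < 10.0:
        return "low"
    if capped < 30.0:
        return "medium"
    return "high"


bands = [mttfd_band(y) for y in (2.0, 5.0, 10.0, 30.0, 250.0)]
# ['below range', 'low', 'medium', 'high', 'high']
```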

The other benefit is the increase in the overall PL. Where Category B architecture can provide PLb performance at best, Category 1 takes this up a notch to PLc. To get a handle on what PLc means, let’s look at our single and three shift examples again. If we take a Canadian operation with a single shift per day, and a 50 week working year we get:

7.5 h/shift x 5 d/w x 50 w/a = 1875 h/a

Where

h = hours

d = days

w = weeks

a  = years

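PLc corresponds to an average probability of dangerous failure per hour of at least 1 × 10^-6 and less than 3 × 10^-6 [1], or one dangerous failure in every 333 333 h to 1 000 000 h of operation. In this case, PLc is equivalent to one failure in 177.8 years of operation to 533.3 years of operation.

Looking at three shifts per day in the same operation gives us:

7.5 h/shift x 3 shifts/d x 5 d/w x 50 w/a = 5625 h/a

In this case, PLc is equivalent to one failure in 59.3 years of operation to 177.8 years of operation.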
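The arithmetic above is easy to repeat for your own shift pattern. Here is a minimal Python sketch; the function names are mine, and the probability of dangerous failure of 1e-6 per hour is used only as an example input.

```python
def hours_per_year(hours_per_shift, shifts_per_day, days_per_week, weeks_per_year):
    """Annual operating hours for a given shift pattern."""
    return hours_per_shift * shifts_per_day * days_per_week * weeks_per_year


def years_per_failure(failures_per_hour, annual_hours):
    """Average years of operation per dangerous failure."""
    if failures_per_hour <= 0 or annual_hours <= 0:
        raise ValueError("inputs must be positive")
    return 1.0 / (failures_per_hour * annual_hours)


single_shift = hours_per_year(7.5, 1, 5, 50)    # 1875 h/a
three_shifts = hours_per_year(7.5, 3, 5, 50)    # 5625 h/a

single_years = years_per_failure(1e-6, single_shift)   # about 533.3 years
three_years = years_per_failure(1e-6, three_shifts)    # about 177.8 years
```

Swap in the figures for your own operation to see how quickly the operating hours change the picture.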

When completing the analysis of a system, [1] limits the system MTTFd to 100 years regardless of what the individual channel MTTFd may be. The actual MTTFd matters when it comes to replacing components during the lifetime of the product. If a component or a sub-system has an MTTFd that is less than the mission time of the system, then the component or subsystem must be replaced by the time the product reaches its MTTFd. 20 years is the default mission time, but you can choose a shorter or longer time span if it makes sense.

Remember that these are probabilities, not guarantees. A failure could happen in the first hour of operation, the last hour of operation or never. These figures simply provide a way for you as the designer to gauge the relative reliability of the system.

Well-Tried Components versus Fault Exclusions

The standard goes on to outline some key distinctions between ‘well-tried component’ and ‘fault exclusion’. We’ll talk more about fault exclusions later in the series.

It is important that a clear distinction between “well-tried component” and “fault exclusion” (see Clause 7) be made. The qualification of a component as being well-tried depends on its application. For example, a position switch with positive opening contacts could be considered as being well-tried for a machine tool, while at the same time as being inappropriate for application in a food industry — in the milk industry, for instance, this switch would be destroyed by the milk acid after a few months. A fault exclusion can lead to a very high PL, but the appropriate measures to allow this fault exclusion should be applied during the whole lifetime of the device. In order to ensure this, additional measures outside the control system may be necessary. In the case of a position switch, some examples of these kinds of measures are

  • means to secure the fixing of the switch after its adjustment,
  • means to secure the fixing of the cam,
  • means to ensure the transverse stability of the cam,
  • means to avoid over travel of the position switch, e.g. adequate mounting strength of the shock absorber and any alignment devices, and
  • means to protect it against damage from outside.

[1, 6.2.4]

System Block Diagram

Finally, let’s look at the block diagram for Category 1. You will notice that it looks the same as the Category B block diagram, since only the components used in the system have changed, and not the architecture.

Figure 3 – Category 1 Block Diagram [1, Fig. 9]

References

[1] Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. ISO Standard 13849-1, Ed. 2. 2006.

[2] Safety of machinery — Safety-related parts of control systems — Part 2: Validation. ISO Standard 13849-2, Ed. 2. 2012.

[3] Safeguarding of Machinery. CSA Standard Z432. 2004.

Add to your Library

If you are working on implementing these design standards in your products, you need to buy copies of the standards for your library.

  • ISO 13849-1:2006 Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design
  • ISO 13849-2:2003 Safety of machinery — Safety-related parts of control systems — Part 2: Validation

If you are working in the EU, or are working on CE Marking your product, you should hold the harmonized version of this standard, available through the CEN resellers:

  • EN ISO 13849-1:2008 Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design
  • EN ISO 13849-2:2012 Safety of machinery — Safety-related parts of control systems — Part 2: Validation

Next Installment

Watch for the next part of this series, “Interlock Architectures – Pt. 3: Category 2” where we expand on the first two categories by adding some diagnostic coverage to improve reliability.

Have questions? Email me!

Emergency Stop – What’s so confusing about that?

This entry is part 1 of 11 in the series Emergency Stop

I get a lot of calls and emails asking about emergency stops. This is one of those deceptively simple concepts that has managed to get very complicated over time. Not every machine needs or can benefit from an emergency stop. In some cases, it may lead to an unreasonable expectation of safety from the user, which can lead to injury if they don’t understand the hazards involved. Some product-specific standards mandate the requirement for emergency stop, such as CSA Z434-03, where robot controllers are required to provide emergency stop functionality and work cells integrating robots are also required to have emergency stop capability.

Defining Emergency Stop

Old, non-compliant, E-Stop Button
This OLD button is definitely non-compliant.

So what is an Emergency Stop, or e-stop, and when do you need to have one? Let’s look at a few definitions taken from CSA Z432-04:

Emergency situation — an immediately hazardous situation that needs to be ended or averted quickly in order to prevent injury or damage.

Emergency stop — a function that is intended to avert harm or to reduce existing hazards to persons, machinery, or work in progress.

Emergency stop button — a red mushroom-headed button that, when activated, will immediately start the emergency stop sequence.

and one more:

6.2.3.5.3 Complementary protective measures
Following the risk assessment, the measures in this clause either shall be applied to the machine or shall be dealt with in the information for use.

Protective measures that are neither inherently safe design measures, nor safeguarding (implementation of guards and/or protective devices), nor information for use may have to be implemented as required by the intended use and the reasonably foreseeable misuse of the machine. Such measures shall include, but not be limited to,

a) emergency stop;

b) means of rescue of trapped persons; and

c) means of energy isolation and dissipation.

Modern, non-compliant e-stop button.
This more modern button is non-compliant due to the RED background and spring-return button.

So, an e-stop is a system that is intended for use in Emergency conditions to try to limit or avert harm to someone or something. It isn’t a safeguard, but is considered to be a Complementary Protective Measure. In terms of the Hierarchy of Controls, emergency stop systems fall into the same level as Personal Protective Equipment like safety glasses, safety boots and hearing protection. So far so good.

Is an Emergency Stop Required?

Depending on the regulations and the standards you choose to read, machinery may not be required to have an Emergency Stop. Quoting from CSA Z432-04:

6.2.5.2.1 Components and elements to achieve the emergency stop function
If, following a risk assessment, it is determined that in order to achieve adequate risk reduction under emergency circumstances a machine must be fitted with components and elements necessary to achieve an emergency stop function so that actual or impending emergency situations can be controlled, the following requirements shall apply:

a) The actuators shall be clearly identifiable, clearly visible, and readily accessible.

b) The hazardous process shall be stopped as quickly as possible without creating additional hazards.
If this is not possible or the risk cannot be adequately reduced, this may indicate that an emergency stop function may not be the best solution (i.e., other solutions should be sought). (Bolding added for emphasis – DN)

c) The emergency stop control shall trigger or permit the triggering of certain safeguard movements where necessary.

Later in CSA Z432-04 we find clause 7.17.1.2:

Each operator control station, including pendants, capable of initiating machine motion shall have a manually initiated emergency stop device.

To my knowledge, this is the only general level machinery standard that makes this requirement. Product family standards often make specific requirements, based on the opinion of the Technical Committee responsible for the standard and their knowledge of the specific type of machinery covered by their document.

Note: For more detailed provisions on the electrical design requirements, see NFPA 79 or IEC 60204-1.

This more modern button is still non-compliant due to the RED background.

If you read Ontario’s Industrial Establishments regulation (Regulation 851), you will find that the only requirement for an emergency stop is that it is properly identified and located “within easy reach” of the operator. What does “properly identified” mean? In Canada, the USA and Internationally, a RED operator device on a YELLOW background, with or without any text behind it, is recognized as EMERGENCY STOP or EMERGENCY OFF, in the case of disconnecting switches or control switches. I’ve scattered some examples of different compliant and non-compliant e-stop devices through this article.
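If you keep a design review checklist, the identification rule can be written down as a small check. This is only a sketch: the `EStopDevice` class and its field names are hypothetical, and the latching check comes from the photo captions in this article rather than from Regulation 851.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EStopDevice:
    # Hypothetical attributes recorded during a design review.
    actuator_colour: str
    background_colour: str
    latches: bool  # stays in the activated position until deliberately released


def identification_problems(device: EStopDevice) -> List[str]:
    """List the ways a device departs from the red-on-yellow convention."""
    problems = []
    if device.actuator_colour.lower() != "red":
        problems.append("actuator is not RED")
    if device.background_colour.lower() != "yellow":
        problems.append("background is not YELLOW")
    if not device.latches:
        problems.append("spring-return actuator does not latch")
    return problems


# A red button on a red background with a spring-return actuator.
print(identification_problems(EStopDevice("red", "red", False)))
```

An empty list means the device passed these three checks only. It says nothing about location, accessibility or the control circuit behind the button.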

The EU Machinery Directive, 2006/42/EC, and Emergency Stop

Interestingly, the European Union has taken what looks like an opposing view of the need for emergency stop systems. Quoting from Annex I of the Machinery Directive:

1.2.4.3. Emergency stop
Machinery must be fitted with one or more emergency stop devices to enable actual or impending danger to be averted.

Notice the words “…actual or impending danger…” This harmonizes with the definition of Complementary Protective Measures, in that they are intended to allow a user to “avert or limit harm” from a hazard. Clearly, the direction from the European perspective is that ALL machines need to have an emergency stop. Or do they? The same clause goes on to say:

The following exceptions apply:

  • machinery in which an emergency stop device would not lessen the risk, either because it would not reduce the stopping time or because it would not enable the special measures required to deal with the risk to be taken,
  • portable hand-held and/or hand-guided machinery.

From these two bullets it becomes clear that, just as in the Canadian and US regulations, machines only need emergency stops WHEN THEY CAN REDUCE THE RISK. This is hugely important, and often overlooked. If the risks cannot be controlled effectively with an emergency stop, or if the risk would be increased or new risks would be introduced by the action of an e-stop system, then it should not be included in the design.
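The exceptions can be expressed as a simple decision function. Treat this as a sketch of my reading of the clause, not as a substitute for the risk assessment: the `MachineAssessment` class and its field names are made up for illustration, and I am assuming that an e-stop "lessens the risk" if it either shortens the stopping time or enables the special measures needed.

```python
from dataclasses import dataclass


@dataclass
class MachineAssessment:
    # Hypothetical summary of risk assessment findings.
    reduces_stopping_time: bool      # would an e-stop shorten the stopping time?
    enables_special_measures: bool   # would it enable special measures needed to deal with the risk?
    hand_held_or_hand_guided: bool   # portable hand-held and/or hand-guided machinery
    introduces_new_risks: bool = False  # would the e-stop action add or increase risk?


def estop_warranted(m: MachineAssessment) -> bool:
    """Return True when none of the exceptions discussed above applies."""
    if m.hand_held_or_hand_guided:
        return False
    if m.introduces_new_risks:
        return False
    # Assumption: either benefit is enough to say the e-stop lessens the risk.
    return m.reduces_stopping_time or m.enables_special_measures


print(estop_warranted(MachineAssessment(True, False, False)))
```

The point of writing it this way is that the default is not "always fit one". The answer depends entirely on the inputs, and those inputs only come from the risk assessment.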

Carrying on with the same clause:

The device must:

  • have clearly identifiable, clearly visible and quickly accessible control devices,
  • stop the hazardous process as quickly as possible, without creating additional risks,
  • where necessary, trigger or permit the triggering of certain safeguard movements.

Once again, this is consistent with the general requirements found in the Canadian and US regulations. The directive goes on to define the functionality of the system in more detail:

Once active operation of the emergency stop device has ceased following a stop command, that command must be sustained by engagement of the emergency stop device until that engagement is specifically overridden; it must not be possible to engage the device without triggering a stop command; it must be possible to disengage the device only by an appropriate operation, and disengaging the device must not restart the machinery but only permit restarting.

The emergency stop function must be available and operational at all times, regardless of the operating mode.

Emergency stop devices must be a back-up to other safeguarding measures and not a substitute for them.

The first sentence of the first paragraph above is the one that requires e-stop devices to latch in the activated position. The last part of that sentence is even more important: “…disengaging the device must not restart the machinery but only permit restarting.” That phrase requires that every emergency stop system have a second discrete action to reset the emergency stop system. Pulling out the e-stop button and having power come back immediately is not OK. Once that button has been reset, a second action, such as pushing a “POWER ON” or “RESET” button to restore control power, is needed.

Point of Clarification: I had a question come from a reader asking if combining the e-stop function and the reset function was acceptable. It can be, but only if:

  • The risk assessment for the machinery does not indicate any hazards that might preclude this approach; and
  • The device is designed with the following characteristics:
      • The device must latch in the activated position;
      • The device must have a “neutral” position where the machine’s emergency stop system can be reset, or where the machine can be enabled to run;
      • The reset position must be distinct from the previous two positions, and the device must spring-return to the neutral position.
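The latch-then-reset behaviour is easier to see as a small state model. This is a toy sketch and not a safety-rated implementation: the `EStopCircuit` class and its method names are invented for illustration, and a real system would implement this in hardware or in a safety controller.

```python
class EStopCircuit:
    """Toy model of the latching and reset behaviour described above."""

    def __init__(self) -> None:
        self.button_latched = False  # e-stop actuator latched in the activated position
        self.reset_done = True       # separate reset action has been performed
        self.running = False

    def press_estop(self) -> None:
        # Engaging the device always triggers a stop command.
        self.button_latched = True
        self.reset_done = False
        self.running = False

    def release_estop(self) -> None:
        # Disengaging only permits restarting; it must not restart the machine.
        self.button_latched = False

    def reset(self) -> bool:
        # The second, deliberate action. Ignored while the button is still latched.
        if not self.button_latched:
            self.reset_done = True
        return self.reset_done

    def start(self) -> bool:
        # Starting is only possible once the button is released AND reset is done.
        if not self.button_latched and self.reset_done:
            self.running = True
        return self.running


circuit = EStopCircuit()
circuit.start()
circuit.press_estop()
circuit.release_estop()
print(circuit.running)  # still stopped: releasing the button does not restart anything
```

Note that `release_estop()` never touches `running`. That single design choice is what the phrase "only permit restarting" asks for.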

The second paragraph of the quote, requiring the function to be available in every operating mode, harmonizes with the requirements of the Canadian and US standards.

Finally, the last paragraph harmonizes with the idea of “Complementary Protective Measures” as described in CSA Z432.

How Many and Where?

Where? “Within easy reach”. Consider the locations where you EXPECT an operator to be. Besides the main control console, these could include feed hoppers, consumables feeders, finished goods exit points… you get the idea. Anywhere you can reasonably expect an operator to be under normal circumstances is a reasonable place to put an e-stop device. “Easy Reach” I interpret as within the arm-span of an adult (presuming the equipment is not intended for use by children). This translates to 500-600 mm either side of the center line of most work stations.

How do you know if you need an emergency stop? Start with a stop/start analysis. Identify all the normal starting and stopping modes that you anticipate on the equipment. Consider all of the different operating modes that you are providing, such as Automatic, Manual, Teach, Setting, etc. Identify all of the matching stop conditions in the same modes, and ensure that all start functions have a matching stop function.
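A stop/start analysis is basically a bookkeeping exercise, so a table or a few lines of code can do the cross-check for you. The sketch below assumes you have listed start and stop functions per operating mode; the mode and function names are invented examples, and `unmatched_starts` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def unmatched_starts(
    starts: Dict[str, Set[str]], stops: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (mode, function) pairs that have a start but no matching stop."""
    missing = []
    for mode, functions in sorted(starts.items()):
        for function in sorted(functions - stops.get(mode, set())):
            missing.append((mode, function))
    return missing


# Invented example inventory for a small machine.
starts = {
    "Automatic": {"conveyor", "spindle"},
    "Manual": {"spindle"},
    "Teach": {"axis jog"},
}
stops = {
    "Automatic": {"conveyor", "spindle"},
    "Manual": {"spindle"},
}

print(unmatched_starts(starts, stops))  # [('Teach', 'axis jog')]
```

Anything the function returns is a start function with no normal stop, which needs to be fixed before you begin thinking about emergency stops.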

Do a risk assessment. This is a basic requirement in most jurisdictions today.

As you determine your risk control measures (following the hierarchy of controls), look at what risks you might control with an Emergency Stop. Remember that e-stops fall below safeguards in the hierarchy, so you must use a safeguarding technique if possible. You can’t just default down to an emergency stop. IF the e-stop can provide you with the additional risk reduction then use it, but first reduce the risks in other ways.

The Stop Function and Control Reliability Requirements

Finally, once you determine the need for an emergency stop system, you need to consider the system’s functionality and controls architecture. NFPA 79 is the reference standard for Canada and the USA, and you can find very similar requirements in IEC 60204-1 if you are working in an international market. EN 60204-1 applies in the EU market for industrial machines.

Functional Stop Categories

NFPA 79 calls out three basic categories of stop. Note that these are NOT reliability categories, but are functional categories. Reliability is not addressed in these sections. Quoting from the standard:

9.2.2 Stop Functions. The three categories of stop functions shall be as follows:

(1) Category 0 is an uncontrolled stop by immediately removing power to the machine actuators.

(2) Category 1 is a controlled stop with power to the machine actuators available to achieve the stop then remove power when the stop is achieved.

(3) Category 2 is a controlled stop with power left available to the machine actuators.
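The difference between the three categories comes down to whether the stop is controlled and whether power remains on the actuators afterwards. Here is a minimal sketch of that distinction; the `StopCategory` enum, the step names and the helper functions are my own illustration, not anything defined in NFPA 79.

```python
from enum import Enum
from typing import List


class StopCategory(Enum):
    CAT_0 = 0  # uncontrolled stop: power removed immediately
    CAT_1 = 1  # controlled stop, then power removed
    CAT_2 = 2  # controlled stop, power left available


def stop_sequence(category: StopCategory) -> List[str]:
    """Return the ordered steps for a stop of the given functional category."""
    if category is StopCategory.CAT_0:
        return ["remove power"]
    if category is StopCategory.CAT_1:
        return ["controlled stop", "remove power"]
    return ["controlled stop"]


def power_available_after_stop(category: StopCategory) -> bool:
    """Only Category 2 leaves power on the machine actuators."""
    return "remove power" not in stop_sequence(category)


for category in StopCategory:
    print(category.name, stop_sequence(category), power_available_after_stop(category))
```

Nothing in this model says anything about how reliably the sequence is carried out, which is exactly why these are functional categories and not reliability categories.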

This E-Stop button is CORRECT. Note the Push-Pull-Twist operator and the YELLOW background.

A bit later, the standard says:

9.2.5.3 Stop.
9.2.5.3.1 Each machine shall be equipped with a Category 0 stop.

9.2.5.3.2 Category 0, Category 1, and/or Category 2 stops shall be provided where indicated by an analysis of the risk assessment and the functional requirements of the machine. Category 0 and Category 1 stops shall be operational regardless of operating modes, and Category 0 shall take priority. Stop function shall operate by de-energizing that relevant circuit and shall override related start functions.

Note that 9.2.5.3.1 does NOT mean that every machine must have an e-stop. It simply says that every machine must have a way to stop the machine that is equivalent to “pulling the plug”. The main disconnect on the control panel can be used for this function if sized and rated appropriately. For cord connected equipment, the plug and socket used to provide power to the equipment can also serve this function. The question of HOW to effect the Category 0 stop depends on WHEN it will be used – i.e. is it being used for a safety related function? What risks must be reduced, or what hazards must be controlled by the stop function?

You’ll also note that that pesky “risk assessment” pops up again in 9.2.5.3.2. You just can’t get away from it…

Control Reliability

Disconnect with E-Stop Colours indicates that this device is intended to be used for EMERGENCY SWITCHING OFF.

Once you know what functional category of stop you need, and what degree of risk reduction you are expecting from the emergency stop system, you can determine the degree of reliability required. In Canada, CSA Z432 gives us these categories: SIMPLE, SINGLE CHANNEL, SINGLE CHANNEL MONITORED and CONTROL RELIABLE. These categories are slowly being replaced by Performance Levels (PL) as defined in ISO 13849-1:2006.

The short answer is that the greater the risk reduction required, the higher the degree of reliability required. In many cases, a SINGLE CHANNEL or SINGLE CHANNEL MONITORED solution may be acceptable, particularly when there are more reliable safeguards in place. On the other hand, you may require CONTROL RELIABLE designs if the e-stop is the primary risk reduction for some risks or specific tasks.

To add to the confusion, ISO 13849-1 appears to exclude complementary protective measures from its scope in Table 8 — Some International Standards applicable to typical machine safety functions and certain of their characteristics. At the very bottom of this table, Complementary Protective Measures are listed, but they appear to be excluded from the standard. I can say that there is nothing wrong with applying the techniques in ISO 13849-1 to the reliability analysis of a complementary protective measure that uses the control system, so do this if it makes sense in your application.

ISO 13849-1:2006 Table 8

Extra points go to any reader who noticed that the ‘electrical hazard’ warning label immediately above the disconnect handle in the above photo is a) upside down, and b) using a non-standard lightning flash. Cheap hazard warning labels, like this one, are often as good as none at all. I’ll be writing more on hazard warnings in future posts.

Use of Emergency Stop as part of a Lockout Procedure or HECP.

One last note: Emergency stop systems (with the exception of emergency switching off devices, such as disconnect switches used for e-stop) CANNOT be used for energy isolation in a Hazardous Energy Control Procedure (a.k.a. Lockout). Devices for this purpose must physically separate the energy source from the down-stream components. See CSA Z460 for more on that subject.

Read our Article on Using E-Stops in HECP.

Pneumatic E-Stop Device
Pneumatic E-Stop/Isolation device.

Standards Referenced in this post:

CSA Z432-04, Safeguarding of Machinery

NFPA 79-07, Electrical Standard for Industrial Machinery

IEC 60204-1:09, Safety of machinery – Electrical equipment of machines – Part 1: General requirements

ISO 13849-1:2006, Safety of machinery – Safety-related parts of control systems – Part 1: General principles for design

See also

ISO 13850:06, Safety of machinery – Emergency stop – Principles for design