Interlock Architectures – Pt. 4: Category 3 – Control Reliable

This entry is part 4 of 8 in the series Circuit Architectures Explored

Category 3 system architecture is the first category that could be considered to have similarity to “Control Reliable” circuits or systems as defined in the North American standards. It is not the same as Control Reliable, but we’ll get to that in a subsequent post. If you haven’t read the first three posts in this series, you may want to go back and review them, as the concepts in those articles are the basis for the discussion in this post.

So what is “Control Reliable” anyway? This term was coined by the ANSI RIA R15.06 technical committee when they were developing their definitions for control system reliability, first published in the 1999 edition of the standard. No mention of the concept of control reliability appears in the 1994 edition of CSA Z434 or the preceding edition of RIA R15.06.

Essentially, the term “Control Reliable” means that the control system is designed with some degree of fault tolerance. Depending on the definitions that you read, this could be single- or multiple-fault-tolerance.

There are a number of design techniques that can be used to increase the fault tolerance of a control system. The older approaches, such as those given in ANSI RIA R15.06-1999, CSA Z434-03 or EN 954-1:95, rely primarily on the structure or architecture of the circuit, and the characteristics of the components selected for use. ISO 13849-1 uses the same basic architectures defined by EN 954-1:95, and extends them to include diagnostic coverage, common cause failure resistance and an understanding of the failure rate of the components to determine the degree of fault tolerance and reliability provided by the design.

OK, enough background for now! Let’s look at the definition for Category 3 systems. Remember that “SRP/CS” means “Safety Related Parts of the Control System”.

Definition

6.2.6 Category 3

For category 3, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies. SRP/CS of category 3 shall be designed so that a single fault in any of these parts does not lead to the loss of the safety function. Whenever reasonably practicable, the single fault shall be detected at or before the next demand upon the safety function.

The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low. The MTTFd of each of the redundant channels shall be low-to-high, depending on the PLr. Measures against CCF shall be applied (see Annex F).

NOTE 1 The requirement of single-fault detection does not mean that all faults will be detected. Consequently, the accumulation of undetected faults can lead to an unintended output and a hazardous situation at the machine. Typical examples of practicable measures for fault detection are use of the feedback of mechanically guided relay contacts and monitoring of redundant electrical outputs.

NOTE 2 If necessary because of technology and application, type-C standard makers need to give further details on the detection of faults.

NOTE 3 Category 3 system behaviour allows that

  • when the single fault occurs the safety function is always performed,
  • some but not all faults will be detected,
  • accumulation of undetected faults can lead to the loss of the safety function.

NOTE 4 The technology used will influence the possibilities for the implementation of fault detection.

Breaking it down

Let’s take the definition apart and look at the components that make it up.

For category 3, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed.

The first couple of lines remind the designer of two key points:

  • The components selected must be suitable for the application, i.e. correctly specified for voltage, current, environmental conditions, etc.; and
  • “well-tried safety principles” must be used in the design.

It’s important to note here that we are talking about “well-tried safety principles” and NOT “well-tried components”. The requirement to use components designed for safety applications comes from other standards, like EN 1088 and ISO 13850. The requirements from these standards, such as the use of “direct-drive” contacts, improve the fault tolerance of the component, and so benefit the design in the end. These improvements are generally reflected in the B10d or MTTFd of the component. They are also points that inspectors commonly look for, because they are easy to spot in the field: “safety-rated components” often use red or yellow caps to identify them clearly in the control panel.

In addition, the following applies. SRP/CS of category 3 shall be designed so that a single fault in any of these parts does not lead to the loss of the safety function.

This sentence makes the requirement for single-fault tolerance. This means that the failure of any single component in the functional channel cannot result in the loss of the safety function. To meet this requirement, redundancy is needed. With redundant systems, one complete channel can fail without losing the ability to stop the machinery. It is possible to lose the function of the monitoring system from a single component failure, but as long as the system continues to provide the safety function this may be acceptable. The system should not permit itself to be reset if the monitoring system is not working.

One more “gotcha” from this sentence: In order to meet the requirement that any single component failure can be detected, the design will require two separate sensors to detect the position of a gate, for example. This permits the system to detect a failure in either sensor, including mechanical failures like broken keys or attempts to defeat the safety system. You can clearly see this in both the block diagram, which does not show any monitoring connection to the input devices, and in the circuit diagram. Both of these diagrams are shown later in this post. The only way out of the requirement to have redundant sensors is to select a gate switch that is robust enough that mechanical faults can reasonably be excluded. I’ll get into fault exclusions later in this series.

Whenever reasonably practicable, the single fault shall be detected at or before the next demand upon the safety function.

This sentence can be a bit sticky. The phrase “Whenever reasonably practicable” means that your design needs to be able to detect single faults unless it would be “unreasonable” to do so. What constitutes an unreasonable degree of effort? This is for you to decide. I will say that if there is a commercial off-the-shelf (COTS) component available that will do the job, and you choose not to use it, you will have a difficult time convincing a court that you took every reasonably practicable means to detect the fault.

Following the comma, the rest of the sentence provides the designer with the basic requirement for the test system: it must be able to detect a single component failure at the moment of demand (this is usually how it’s done, since this is typically the simplest way) or before it occurs, which can happen if your test equipment has a means to detect a change in some critical characteristic of the monitored component(s).

The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low.

This sentence tells you that your design must meet the requirements for LOW Diagnostic Coverage. To get to LOW DCavg, we need to look first at Table 6:

ISO 13849-1:06 Table 6

Diagnostic Coverage (DC)

Denotation    Range
None          DC < 60 %
Low           60 % ≤ DC < 90 %
Medium        90 % ≤ DC < 99 %
High          99 % ≤ DC

NOTE 1 For SRP/CS consisting of several parts an average value DCavg for DC is used in Figure 5, Clause 6 and E.2.

NOTE 2 The choice of the DC ranges is based on the key values 60 %, 90 % and 99 % also established in other standards (e.g. IEC 61508) dealing with diagnostic coverage of tests. Investigations show that (1 – DC) rather than DC itself is a characteristic measure for the effectiveness of the test. (1 – DC) for the key values 60 %, 90 % and 99 % forms a kind of logarithmic scale fitting to the logarithmic PL-scale. A DC-value less than 60 % has only slight effect on the reliability of the tested system and is therefore called “none”. A DC-value greater than 99 % for complex systems is very hard to achieve. To be practicable, the number of ranges was restricted to four. The indicated borders of this table are assumed within an accuracy of 5 %.

Based on Table 6, the DCavg must be between 60% and 90%, all components considered. To score this, we must go to Annex E and look at Table E1. Using the factors in Table E1, score the design. If you end up in the desired range between 60% and 90% DCavg, you can move on. If not, the design will require modification to bring it into this range.
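
To make the scoring step concrete, here is a minimal Python sketch of the averaging formula from Annex E as I understand it: each part's DC is weighted by the inverse of its MTTFd, and the result is looked up against the Table 6 ranges. The function names and the component values are invented for illustration, so check the formula against your own copy of the standard before relying on it.

```python
from typing import Iterable, Tuple


def dc_avg(parts: Iterable[Tuple[float, float]]) -> float:
    """Average diagnostic coverage, weighted by 1/MTTFd.

    Each entry is (DC as a fraction from 0 to 1, MTTFd in years).
    Parts with no diagnostics are included with DC = 0.
    """
    parts = list(parts)
    numerator = sum(dc / mttfd for dc, mttfd in parts)
    denominator = sum(1.0 / mttfd for _, mttfd in parts)
    return numerator / denominator


def dc_range(dc: float) -> str:
    """Map a DC fraction onto the Table 6 denotations."""
    if dc < 0.60:
        return "None"
    if dc < 0.90:
        return "Low"
    if dc < 0.99:
        return "Medium"
    return "High"


# Hypothetical parts: (DC, MTTFd in years). These are not real component data.
example_parts = [(0.99, 50.0), (0.60, 20.0), (0.0, 100.0)]
example_dc = dc_avg(example_parts)
print(f"DCavg = {example_dc:.3f} -> {dc_range(example_dc)}")
```

With these made-up numbers the design lands in the Low band, but only just. Notice how the unmonitored part drags the average down even though its own MTTFd is the best of the three.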

The MTTFd of each of the redundant channels shall be low-to-high, depending on the PLr.

This sentence reminds you that your component selections matter. Depending on the PLr you are trying to achieve, you will need to choose components with suitable MTTFd ratings. Remember that just because you are using a Category 3 architecture, you have not automatically achieved the highest levels of reliability. If you refer to Figure 5 in the standard, you can see that a Category 3 architecture can meet a range of PLs, all the way from PLa through PLe!
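
As a rough illustration of what “low-to-high” means in numbers, the sketch below combines the MTTFd values of the parts in one channel using the parts-count approach (failure rates add) and then classifies the result using the ranges I recall from Table 5 of the 2006 edition: Low from 3 to under 10 years, Medium from 10 to under 30 years, and High from 30 to 100 years, with each channel capped at 100 years. The function names and the component values are hypothetical.

```python
from typing import Iterable


def channel_mttfd(component_mttfd_years: Iterable[float],
                  cap_years: float = 100.0) -> float:
    """Parts-count estimate for one channel: the 1/MTTFd terms add up.

    The 100-year cap reflects my reading of the 2006 edition.
    """
    total_rate = sum(1.0 / m for m in component_mttfd_years)
    return min(1.0 / total_rate, cap_years)


def mttfd_range(years: float) -> str:
    """Classify a channel MTTFd (in years) into Low / Medium / High."""
    if years < 3.0:
        return "Below Low"  # not a usable range under Table 5
    if years < 10.0:
        return "Low"
    if years < 30.0:
        return "Medium"
    return "High"


# Hypothetical channel: input switch, safety relay, contactor (years).
channel = [100.0, 50.0, 50.0]
years = channel_mttfd(channel)
print(f"Channel MTTFd = {years:.1f} years -> {mttfd_range(years)}")
```

Three individually respectable components still only give a Medium channel here, which is why the selection of every part in the channel matters.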

ISO 13849-1 Figure 5

If you want, or need, to know the numeric boundaries of each of the bands in the diagram above, look at Annex K of the standard. The full numeric representation of Figure 5 is provided in that Annex.

Measures against CCF shall be applied (see Annex F).

In order for your design to meet Category 3 architecture, CCF measures are required. I’ve discussed Common Cause Failures elsewhere on the blog, but as a reminder, a Common Cause Failure is one where a single event, like a lightning strike on the power line, or a cable being cut, results in the failure of the system. This is not the same as a Common Mode Failure, where similar or different components fail in the same way. For instance, if both output contactors were to weld closed, either simultaneously or at different times, due to overloading because they were undersized, this could be considered to be a Common Mode Failure. If they both weld closed due to a lightning strike, that is a Common Cause Failure.

Annex F provides a checklist that is used to score the CCF of the design. The design must meet at least 65 points to be considered to meet the minimum level of CCF protection, and more is better of course! Score your design and see where you come out. Less than 65 and you need to do more. 65 or more and you are good to go.
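
The scoring itself is simple addition, as the short Python sketch below shows. The point values are the ones I recall from the Annex F checklist and the measure names are my own shorthand, so treat both as assumptions and verify them against your copy of the standard. Only the 65-point threshold comes from the discussion above.

```python
# Point values as I recall them from the Annex F checklist (verify before use).
# The keys are my own shorthand, not wording from the standard.
CCF_MEASURES = {
    "separation": 15,               # separation/segregation of signal paths
    "diversity": 20,                # different technologies or principles
    "overvoltage_protection": 15,   # protection against overvoltage, etc.
    "well_tried_components": 5,
    "failure_analysis": 5,          # e.g. FMEA results considered in design
    "designer_training": 5,
    "emc": 25,                      # contamination / EMC immunity
    "other_environmental": 10,      # temperature, shock, vibration, humidity
}

CCF_PASSING_SCORE = 65


def ccf_score(applied_measures) -> int:
    """Sum the points for every measure that is fully implemented."""
    return sum(CCF_MEASURES[name] for name in applied_measures)


def ccf_passes(applied_measures) -> bool:
    return ccf_score(applied_measures) >= CCF_PASSING_SCORE


# Hypothetical design: no diversity, but good separation and environment.
design = ["separation", "overvoltage_protection", "emc", "other_environmental"]
print(ccf_score(design), ccf_passes(design))
```

A measure only earns its points if it is fully applied, so a partially separated design should score zero for separation rather than some fraction of the points.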

The Notes

The notes given in the definition are also important. Note 1 reminds the designer that not all faults will be detected, and an accumulation of undetected faults can lead to the loss of the safety function. Be aware that it is up to you as the designer to minimize the kinds of failures that can accumulate undetected.

Note 2 speaks to the possibility that a Type-C product standard, like EN 201 for injection moulding machines for example, may impose a minimum PLr on the design. Make sure that you get a copy of any Type-C standard that is relevant for your product and market. Note that the designation “Type-C” comes from ISO. If you go looking for this terminology in ANSI or CSA standards, you won’t find it used because the concept doesn’t exist in the same way in these National standards.

Note 3 gives you the basic performance parameters for the design. If your design can do these things, then you’re halfway there.

Finally, Note 4 is a reminder that different kinds of technology have greater or lesser capability to detect failures. More sophisticated technology may be required to achieve the PL level you need.

The Block Diagram

Let’s have a look at the functional block diagram for this Category.

ISO 13849-1 Figure 11

By looking at the diagram you can see clearly the two independent channels and the cross-monitoring connection between the channels. Input devices are not monitored, but output devices are monitored. This is another significant reason requiring the use of two physically separate input devices to sense the guard position or whatever other safeguarding device is integrated into the system. The only way that a failure in the input devices can be detected is if one channel changes state and one does not.
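
The cross-monitoring idea can be sketched in a few lines of Python. This is a simplified model and not how any particular safety relay is implemented: it ignores the short discrepancy time that real devices typically allow between the two channels changing state, and the class and method names are my own.

```python
from dataclasses import dataclass


@dataclass
class DualChannelMonitor:
    """Simplified cross-monitoring of two input channels."""

    fault_latched: bool = False

    def evaluate(self, ch1_closed: bool, ch2_closed: bool) -> bool:
        """Return True if the outputs may be energised."""
        if ch1_closed != ch2_closed:
            # One channel changed state and the other did not:
            # this is the only evidence of an input device fault.
            self.fault_latched = True
        if self.fault_latched:
            return False
        return ch1_closed and ch2_closed

    def clear_fault(self) -> None:
        """Assumed to be called only after the fault has been repaired."""
        self.fault_latched = False


monitor = DualChannelMonitor()
print(monitor.evaluate(True, True))    # gate closed, both switches agree
print(monitor.evaluate(True, False))   # switches disagree: fault latched
print(monitor.evaluate(True, True))    # still locked out until cleared
```

The latch is the important part. Without it, a failed switch would be masked as soon as the channels happened to agree again, and the fault would go on to accumulate undetected.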

If you want to learn more about applying the block diagramming method to your design, there is a good explanation of the method in the SISTEMA Cookbook 1, published by the IFA in Germany. You can download the English version from the link above, or get the document directly from the IFA web site.

Circuit Diagram

By now you probably get the idea that there are as many ways to configure a Category 3 circuit as there are applications. Below is a typical circuit diagram borrowed from Rockwell Allen-Bradley, showing the application of typical safety relays in a complete system that includes the emergency stop system, a gate interlock and a safety mat. You can meet the requirements for Category 3 architecture in other ways, so don’t feel that you must use a COTS safety relay. It just may be the most straightforward way in many cases.

This is not a plug for A-B products. Neither Machinery Safety 101, nor I, have any relationship with Rockwell Allen-Bradley.

From Rockwell Automation publication SAFETY-WD001A-EN-P – June 2011, p.6.

If you’re interested in obtaining the source document containing this diagram, you can download it directly from the Rockwell Automation web site.

Emergency Stop Subsystem

The emergency stop circuit uses the 440R-512R2 relay on the left side of the diagram. This particular system uses Category 3 architecture in the e-stop system, which may be more than is required. A risk assessment and a start-stop analysis are required to determine what performance level is needed for this subsystem. Get more information on emergency stop.

Gate Interlock Subsystem

The gate interlock circuit is located in the center of the diagram, and uses the 440R-D22R2 relay. As you can see, there are two physically separate gate interlock switches. Only one contact from each switch is used, so one switch is connected to Channel 1, and the other to Channel 2. Notice that there is no other monitoring of these devices (i.e. no second connection to either switch). The secondary contacts on these switches could be connected to the PLC for annunciation purposes. This would allow the PLC to display the open/closed status of the gate on the machine HMI.

The output contactors, K3 and K4, are monitored by the reset loop connected to S34 and the +V rail.
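
Here is a small Python sketch of what that monitoring loop achieves. It assumes, as is typical but not something I can confirm from the drawing text alone, that normally-closed mirror contacts from K3 and K4 are wired in series in the reset loop, so the loop is only complete when both contactors have actually dropped out. The function names are hypothetical.

```python
def feedback_loop_closed(k3_mirror_nc_closed: bool,
                         k4_mirror_nc_closed: bool) -> bool:
    """Series loop of normally-closed mirror contacts (assumed wiring).

    A welded contactor holds its mirror contact open, breaking the loop.
    """
    return k3_mirror_nc_closed and k4_mirror_nc_closed


def reset_permitted(reset_pressed: bool,
                    k3_mirror_nc_closed: bool,
                    k4_mirror_nc_closed: bool) -> bool:
    """The relay accepts a reset only when the feedback loop is complete."""
    return reset_pressed and feedback_loop_closed(k3_mirror_nc_closed,
                                                  k4_mirror_nc_closed)


# Both contactors released: reset is accepted.
print(reset_permitted(True, True, True))
# K4 welded closed, so its mirror contact stays open: reset is refused.
print(reset_permitted(True, True, False))
```

In the welded case the remaining contactor still stopped the machine, which is the single-fault tolerance, and the refused reset is the detection at the next demand.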

One more interesting point – did you notice that there is a “zone e-stop” included in the gate interlock? If you look immediately below the central safety relay and a little to the left you will find an emergency stop device. This device is wired in series with the gate interlock, so activating it will drop out K3 and K4 but not disturb the operation of the rest of the machine. The safety relay can’t distinguish between the e-stop button and the gate interlocks, so if annunciation is needed, you may want to use a third contact on the e-stop device to connect to a PLC input for this purpose.

Safety Mat Subsystem

The safety mat subsystem is located on the right side of the diagram and uses a second 440R-D22R2 relay. Safety mats can be either single or dual channel in design. The mat shown in this drawing is a dual-channel type. Stepping on the mat causes the conductive layers in the mat to touch, shorting Channel 1 to Channel 2. This creates an input fault that will be detected by the 440R relay. The fault condition will cause the output of the relay to open, stopping the machine.

Safety mats can be damaged reasonably easily, and the circuit design shown will detect shorts or opens within the mat and will prevent the hazardous motion from starting or continuing.
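
The behaviour described above can be summarised with a small Python sketch. It is a logical model only, with names I made up, and it says nothing about how the relay measures continuity or cross-shorts electrically.

```python
from enum import Enum


class MatState(Enum):
    CLEAR = "clear"
    ACTUATED_OR_SHORTED = "actuated or shorted"
    OPEN_CIRCUIT = "open circuit"


def classify_mat(ch1_continuous: bool, ch2_continuous: bool,
                 channels_shorted: bool) -> MatState:
    """Classify a dual-channel mat from what the relay can observe."""
    if channels_shorted:
        # Someone is standing on the mat, or the mat is damaged.
        # The relay cannot tell these apart and does not need to.
        return MatState.ACTUATED_OR_SHORTED
    if not (ch1_continuous and ch2_continuous):
        return MatState.OPEN_CIRCUIT
    return MatState.CLEAR


def outputs_enabled(state: MatState) -> bool:
    """Hazardous motion is allowed only when the mat is healthy and clear."""
    return state is MatState.CLEAR


print(outputs_enabled(classify_mat(True, True, False)))   # healthy, clear
print(outputs_enabled(classify_mat(True, True, True)))    # stepped on
print(outputs_enabled(classify_mat(False, True, False)))  # broken conductor
```

Every state other than “clear” leads to the same safe result, which is why a damaged mat fails toward stopping the machine rather than toward running it.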

The output contactors, K5 and K6, are monitored by the relay reset loop connected to S34 and the +V rail.

This circuit also includes a conventional start-stop circuit that doesn’t rely on the safety relay.

One more thing – just like the gate interlock circuit, this circuit also includes a “zone e-stop”. Look below and to the left of the safety mat relay. As with the gate interlock, pressing this button will drop out K5 and K6, stopping the same motions protected by the safety mat. Since the relay can’t tell the difference between the e-stop button and the mat being activated, you may want to use the same approach and add a third contact to the e-stop button, connecting it to the PLC for annunciation.

Component Selection

The components used in the circuit are critical to the final PL rating of the design. The final PL of the design depends on the MTTFd of the components used in each channel. No knowledge of the internal construction of the safety relays is needed, because the relays come with a PL rating from the manufacturer. They can be treated as a subsystem unto themselves. The selection of the input and output devices is then the significant factor. Component data sheets can be downloaded from the Rockwell site if you want to dig a bit deeper.
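
For electromechanical input and output devices, manufacturers usually publish a B10d value (the number of cycles at which 10% of a sample has failed dangerously) rather than an MTTFd. The sketch below shows the conversion as I understand it from Annex C of the standard: estimate the operations per year from the duty cycle, then divide B10d by one tenth of that figure. All of the numbers in the example are invented, and the function names are my own.

```python
def operations_per_year(days_per_year: float, hours_per_day: float,
                        cycle_time_s: float) -> float:
    """Mean number of operations per year (n_op) for one device."""
    return days_per_year * hours_per_day * 3600.0 / cycle_time_s


def mttfd_from_b10d(b10d_cycles: float, n_op: float) -> float:
    """MTTFd in years from a B10d value and the annual operation count."""
    return b10d_cycles / (0.1 * n_op)


# Hypothetical contactor: B10d of 2,000,000 cycles,
# machine runs 220 days/year, 16 hours/day, one cycle every 120 seconds.
n_op = operations_per_year(220, 16, 120)
mttfd = mttfd_from_b10d(2_000_000, n_op)
print(f"n_op = {n_op:.0f} cycles/year, MTTFd = {mttfd:.1f} years")
```

The same contactor on a machine that cycles ten times as often would have an MTTFd one tenth as long, so the duty cycle is as much a part of component selection as the data sheet is.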

What did you think about this article? What questions came to mind that weren’t answered for you? I look forward to hearing your thoughts and questions!

Interlock Architectures – Pt. 5: Category 4 — Control Reliable

This entry is part 5 of 8 in the series Circuit Architectures Explored

The most reliable of the five system architectures, Category 4 is the only architecture that uses multiple-fault-tolerant techniques to help ensure that component failures do not result in an unacceptable exposure to risk. This installment in the series on system architectures delves into the details of this architecture. The definitions and requirements discussed in this article come from ISO 13849-1, Edition 2 (2006) and ISO 13849-2, Edition 1 (2003).

As with preceding articles in this series, I’ll be building on concepts discussed in those articles. If you need more information, you should have a look at the previous articles to see if I’ve answered your questions there.

The Definition

The Category 4 definition builds on both Category B and Category 3. As you read, recall that “SRP/CS” stands for “Safety Related Parts of the Control System”. Here is the complete definition:

6.2.7 Category 4
For category 4, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies.
SRP/CS of category 4 shall be designed such that

  • a single fault in any of these safety-related parts does not lead to a loss of the safety function, and
  • the single fault is detected at or before the next demand upon the safety functions, e.g. immediately, at switch on, or at end of a machine operating cycle, but if this detection is not possible, then an accumulation of undetected faults shall not lead to the loss of the safety function.

The diagnostic coverage (DCavg) of the total SRP/CS shall be high, including the accumulation of faults. The MTTFd of each of the redundant channels shall be high. Measures against CCF shall be applied (see Annex F).

NOTE 1 Category 4 system behaviour allows that

  • when a single fault occurs the safety function is always performed,
  • the faults will be detected in time to prevent the loss of the safety function,
  • accumulation of undetected faults is taken into account.

NOTE 2 The difference between category 3 and category 4 is a higher DCavg in category 4 and a required MTTFd of each channel of “high” only.

In practice, the consideration of a fault combination of two faults may be sufficient.

Breaking it down

For category 4, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed.

The first two sentences give the basic requirement for all the categories from 2 through 4. Sound component selection based on the application requirements for voltage, current, switching capability and lifetime must be considered. In addition, the use of well-tried safety principles, such as switching the +V rail side of the coil circuit for control components, is required. If you aren’t sure about what constitutes a “well-tried safety principle”, see the article on Category 2 where this is discussed. Don’t confuse “well-tried safety principles” with “well-tried components”. There is no requirement in Category 4 for the use of well-tried components, although you can use them for additional reliability if the design requirements warrant.

In addition, the following applies.
SRP/CS of category 4 shall be designed such that

  • a single fault in any of these safety-related parts does not lead to a loss of the safety function, and
  • the single fault is detected at or before the next demand upon the safety functions, e.g. immediately, at switch on, or at end of a machine operating cycle, but if this detection is not possible, then an accumulation of undetected faults shall not lead to the loss of the safety function.

This is the big one. This paragraph, and the two bullets that follow it, define the fundamental performance requirements for this category. No single fault can lead to the loss of the safety function in Category 4, and testing is required that can detect failures and prevent an accumulation of faults that could eventually lead to the loss of the safety function. The second bullet is the one that defines the multiple-fault-tolerance requirement for this category. If you go back to the definition of Category 3, you will see that an accumulation of faults may lead to the loss of the safety function in that Category. This is the key difference between the categories in my opinion.

The diagnostic coverage (DCavg) of the total SRP/CS shall be high, including the accumulation of faults. The MTTFd of each of the redundant channels shall be high. Measures against CCF shall be applied (see Annex F).

These three sentences give the designer the criteria for diagnostic coverage, channel failure rates and common cause failure protection. As you can see, the ability to diagnose failures automatically is a critical part of the design, as is the use of highly reliable components, leading to highly reliable channels. The strongest CCF protection you can include in the design is also needed, although the “passing score” of 65 remains unchanged (see Annex F in ISO 13849-1 for more details on scoring your design).

NOTE 1 Category 4 system behaviour allows that

  • when a single fault occurs the safety function is always performed,
  • the faults will be detected in time to prevent the loss of the safety function,
  • accumulation of undetected faults is taken into account.

Note 2: …In practice, the consideration of a fault combination of two faults may be sufficient.

Note 1 expands on the first paragraph in the definition, further clarifying the performance requirements by explicit statements. Notice that nowhere is there a requirement that single faults or accumulation of single faults be prevented, only that they be detected by the diagnostic system. Prevention of single faults is nearly impossible, since components do fail. Understanding first which components are critical to the safety function, and second, what kinds of faults each component is likely to have, is fundamental to being able to design a diagnostic system that can detect the faults.

The category relies on redundancy to ensure that the complete loss of one channel will not cause the loss of the safety function, but this is only useful if the common cause failures have been properly dealt with. Otherwise, a single event could wipe out both channels simultaneously, causing the loss of the safety function and possibly resulting in an injury or fatality.

Also notice that multiple single faults are permitted, as long as the accumulation does not result in the loss of the safety function. ISO 13849 allows for “fault exclusion”, a concept that is not used in the North American standards.

The final sentence from Note 2 suggests that consideration of two concurrent faults may be enough, but be careful. You need to look closely at the fault lists to see if there are any groups of high probability faults that are likely to occur concurrently. If there are, you need to assess these combinations of faults, whether there are 5 or 50 to be evaluated.
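
To get a feel for how quickly the number of two-fault combinations grows, here is a short Python sketch that enumerates every pair from a fault list. The fault names are placeholders of my own and not taken from the standard.

```python
from itertools import combinations
from math import comb

# Placeholder fault list for one safety function (hypothetical names).
faults = [
    "input switch 1 contact welded",
    "input switch 2 contact welded",
    "short between channel wiring",
    "contactor K1 welded",
    "contactor K2 welded",
]

# Every unordered pair of distinct faults that could occur together.
pairs = list(combinations(faults, 2))
for first, second in pairs:
    print(f"{first}  +  {second}")

print(len(pairs))      # 10 pairs from 5 faults
print(comb(50, 2))     # 1225 pairs from 50 faults
```

Five faults give ten pairs to think about, while fifty give well over a thousand, so in practice you will want to rank the pairs by likelihood and concentrate on the credible ones.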

Fault Exclusion

Fault exclusion involves assessing the types of faults that can occur in each component in the critical path of the system. The decision to exclude certain kinds of faults is always a technical compromise between the theoretical improbability of the fault, the expertise of the designer(s) and engineers involved and the specific technical requirements of the application. Whenever the decision is made to exclude a particular type of fault, the decision and the process used to make it must be documented in the Reliability Report included in the design file. Section 7.3 of ISO 13849-1 provides guidance on fault exclusion.

In the section discussing Category 1, the standard has this to say about fault exclusion, and the difference between “well-tried components” and “fault exclusion”:

It is important that a clear distinction between “well-tried component” and “fault exclusion” (see Clause 7) be made. The qualification of a component as being well-tried depends on its application. For example, a position switch with positive opening contacts could be considered as being well-tried for a machine tool, while at the same time as being inappropriate for application in a food industry — in the milk industry, for instance, this switch would be destroyed by the milk acid after a few months. A fault exclusion can lead to a very high PL, but the appropriate measures to allow this fault exclusion should be applied during the whole lifetime of the device. In order to ensure this, additional measures outside the control system may be necessary. In the case of a position switch, some examples of these kinds of measures are

  • means to secure the fixing of the switch after its adjustment,
  • means to secure the fixing of the cam,
  • means to ensure the transverse stability of the cam,
  • means to avoid over-travel of the position switch, e.g. adequate mounting strength of the shock absorber and any alignment devices, and
  • means to protect it against damage from outside.

To assist the designer, ISO 13849-2 provides lists of typical faults and the allowable exclusions in Annex D.5. As an example, let’s consider the typical situation where a robust guard interlocking device has been selected. The decision has been made to use redundant electrical circuits to the switching components in the interlock, so electrical faults can be detected. But what about mechanical failures? A fault list is needed:

Interlock Mechanical Fault List

# | Fault Description | Result | Likelihood
1 | Key breaks off | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely
2 | Screws mounting key to guard fail | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely
3 | Screws mounting interlock device to guard fail | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely
4 | Key and interlock device misaligned. | Guard cannot close, preventing machine from operating. | Very likely
5 | Key and interlock device misaligned. Key and / or interlock device damaged. | Guard may not close, or the key may jam in the interlock device once closed. Machine is inoperable if the interlock cannot be completed, or the guard cannot be opened if the key jams in the device. | Likely
6 | Screws mounting key to guard removed by user. | Interlock can now be bypassed by fixing the key into the interlocking device. Control system can no longer sense the position of the guard. | Likely
7 | Screws mounting interlock device to guard removed by user | Probably combined with the preceding condition. Control system can no longer sense the position of the guard. | Unlikely, but could happen.

There may be more failure modes, but for the purpose of this discussion, let’s limit them to this list.

Looking at Fault 1, there are a number of things that could result in a broken key. They include: misalignment of the key and the interlock device, lack of maintenance on the guard and the interlocking hardware, or intentional damage by a user. Unless the hardware is exceptionally robust, including the design of the guard and any alignment features incorporated in the guarding, developing sound rationale for excluding this fault will be very difficult.

Fault 2 considers mechanical failure of the mounting screws for the interlock key. Screws are considered to be well-tried components (see Annex A.5), so you can consider them for fault exclusion. You can improve their reliability by using thread locking adhesives when installing the screws to prevent them from vibrating loose, and “tamper-proof” style screw heads to deter unauthorized removal. Inclusion of these methods will support any decision to exclude these faults. This goes to addressing faults 3, 6 and 7 as well.

Faults 4 & 5 occur frequently and are often caused by poor device selection (e.g. an interlock device intended for straight-line sliding-gate applications is chosen for a hinged gate), or by poor guard design (e.g. the guard is poorly guided by the retention mechanism and can be closed in a misaligned condition). Rationale for prevention of these faults will need to include discussion of design features that will prevent these conditions.

Excluding any other kind of fault follows the same process: Develop the fault list, assess each fault against the relevant Annex from ISO 13849-2, determine if there are preventative measures that can be designed into the product and whether these provide sufficient risk reduction to allow the exclusion of the fault from consideration.
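
If you keep your fault list in a structured form, it becomes easy to check that every exclusion has a documented rationale before the Reliability Report is issued. The Python sketch below shows one possible way to do that. The record layout is my own invention, and the exclusion decisions in the example are hypothetical, not recommendations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fault:
    number: int
    description: str
    likelihood: str
    excluded: bool = False
    rationale: str = ""   # why the exclusion is justified, if excluded


def undocumented_exclusions(faults: List[Fault]) -> List[int]:
    """Return the numbers of excluded faults that lack a written rationale."""
    return [f.number for f in faults
            if f.excluded and not f.rationale.strip()]


# Three entries from the interlock fault list, with hypothetical decisions.
fault_list = [
    Fault(1, "Key breaks off", "Unlikely"),
    Fault(2, "Screws mounting key to guard fail", "Unlikely",
          excluded=True,
          rationale="Thread-locking adhesive and tamper-resistant screws."),
    Fault(3, "Screws mounting interlock device to guard fail", "Unlikely",
          excluded=True),   # excluded, but nobody wrote down why
]

print(undocumented_exclusions(fault_list))
```

Here fault 3 is flagged because it was excluded without a recorded reason, which is exactly the kind of gap that is hard to defend after the fact.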

DCavg and MTTFd requirements

NOTE 2 The difference between category 3 and category 4 is a higher DCavg in category 4 and a required MTTFd of each channel of “high” only.

The first sentence in Note 2 clarifies the two main differences from a design standpoint, aside from the additional fault tolerance requirements: better diagnostics are required, and the MTTFd requirements for individual components, and therefore for each channel, are much higher.
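
The difference between the two categories can be expressed as a simple check, sketched below in Python. It encodes my reading of the two definitions: Category 3 needs a DCavg of at least Low and a channel MTTFd anywhere from Low to High, Category 4 needs High for both, and both need a CCF score of 65 or more. The function name and the string labels are my own, and the check covers only these three criteria, not the fault-tolerance behaviour itself.

```python
DC_ORDER = ["None", "Low", "Medium", "High"]
MTTFD_RANGES = ("Low", "Medium", "High")


def category_shortfalls(category: int, dc_avg_range: str,
                        mttfd_range: str, ccf_score: int) -> list:
    """List the reasons a design misses the Category 3 or 4 criteria."""
    problems = []
    if category == 3:
        if DC_ORDER.index(dc_avg_range) < DC_ORDER.index("Low"):
            problems.append("DCavg must be at least Low")
        if mttfd_range not in MTTFD_RANGES:
            problems.append("channel MTTFd must be Low, Medium or High")
    elif category == 4:
        if dc_avg_range != "High":
            problems.append("DCavg must be High")
        if mttfd_range != "High":
            problems.append("channel MTTFd must be High")
    else:
        raise ValueError("this sketch only covers Categories 3 and 4")
    if ccf_score < 65:
        problems.append("CCF score must be at least 65")
    return problems


# The same hypothetical design judged against both categories.
print(category_shortfalls(3, "Low", "Medium", 70))
print(category_shortfalls(4, "Low", "Medium", 70))
```

The design in the example is a perfectly good Category 3 candidate but falls short of Category 4 on both diagnostics and channel reliability, even though the wiring could look identical.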

The Block Diagram

The block diagram for Category 4 is almost identical to Category 3, and was updated by Corrigendum 1 to the diagram shown below. The text from the corrigendum that accompanies the diagram has this to say about the change:

Replace the drawing showing the designated architecture for category 4 with the following drawing. This corrects the arrowed lines labeled “m” between L1 and O1, and L2 and O2, by changing them from dashed to solid lines, representing higher diagnostic coverage.

I’ve highlighted this area using red ovals on Figure 12 to make it easier to see.

ISO 13849-1 Figure 12 - Category 4 Block Diagram

Here is Figure 11 for comparison. Notice that the “m” lines are solid in Figure 12 and dashed in Figure 11? Subtle, but significant! There are no other differences between the diagrams.

ISO 13849-1 Figure 11

I went looking for a circuit diagram to support the block diagram, but wasn’t able to find one from a commercial source that I could share with you. Considering that the primary differences are in the reliability of the components chosen and in the way the testing is done, this isn’t too surprising. The basic physical construction of the two categories can be virtually identical.

Applications

The following is not from the standards – this is my personal opinion, based on 15 years of practice.

In the past, many manufacturers decided that they were going to apply Category 4 architecture without really understanding the design implications, because they believed that it was “the best”. With the change in the harmonization of EN 954-1 and ISO 13849-1 under the EU Machinery Directive that comes into force on 29-Dec-2011, and considering the great difficulty that many manufacturers had in properly implementing EN 954-1, I can easily imagine manufacturers reasoning that, because they already have Category 4 SRP/CS on their systems, they can now claim PLe SRP/CS performance. This is a bad decision for a lot of reasons:

  1. ISO 13849-1 PLe, Category 4 systems should be reserved for very dangerous machinery where the technical effort and expense involved is warranted by the risk assessment. Attempting to apply this level of design to machinery where a PLb performance level is more suitable based on a risk assessment is a waste of design time and effort and a needless expense. The product family standards for these types of machines, such as EN 201 for plastic injection moulding machines, EN 692 for Mechanical Power Presses or EN 693 for Hydraulic Power Presses, will explicitly specify the PL required for these machines.
  2. Manufacturers have frequently claimed EN 954-1 Category 4 performance based on the rating of the safety relay alone, without understanding that the rest of the SRP/CS must be considered, and clearly this is wrong. The SRP/CS must be evaluated as a complete system.

This lack of understanding endangers the users, the maintenance personnel, the owners and the manufacturers. If they continue this approach and an injury occurs, it is my opinion that the courts will have more than enough evidence in the defendant’s published documents to cause some serious legal grief.

As designers involved with the safety of our company’s products or with our co-workers’ safety, I believe that we owe it to everyone who uses our products to be educated and to correctly apply these concepts. The fact that you have read all of the posts leading up to this one is evidence that you are working on getting educated.

Always conduct a risk assessment and use the outcome from that work to guide your selection of safeguarding measures, complementary protective measures and the performance of the SRP/CS that ties those systems together. Choose performance levels that make sense based on the required risk reduction and ensure that the design criteria are met by validating the system once built.

As always, I welcome your comments and questions! Please feel free to comment below. I will respond to all your comments.

Copyright secured by Digiprove © 2011-2012
Acknowledgements: ISO for excerpts from ISO 13849-1 and more...
Some Rights Reserved

Interlock Architectures Pt. 6 – Comparing North American and International Systems

This entry is part 6 of 8 in the series Circuit Architectures Explored

I’ve now written six posts, including this one, on the topic of circuit architectures for the safety-related parts of control systems. In this post, we’ll compare the International and North American systems. This comparison is not intended to draw conclusions about which is “better”, but rather to compare and contrast the two systems so that designers can clearly see where the overlaps and the gaps in the systems exist.

Since we’ve spent a lot of time talking about ISO 13849-1 [1] in the previous five posts in this series, I think we should begin there by looking at Table 10 from the standard.

Table 10 — Summary of requirements for categories
Columns: Category; Summary of requirements; System behaviour; Principle used to achieve safety; MTTFd of each channel; DCavg; CCF

Category B (see 6.2.3)
Summary of requirements: SRP/CS and/or their protective equipment, as well as their components, shall be designed, constructed, selected, assembled and combined in accordance with relevant standards so that they can withstand the expected influence. Basic safety principles shall be used.
System behaviour: The occurrence of a fault can lead to the loss of the safety function.
Principle used to achieve safety: Mainly characterized by selection of components
MTTFd of each channel: Low to medium
DCavg: None
CCF: Not relevant

Category 1 (see 6.2.4)
Summary of requirements: Requirements of B shall apply. Well-tried components and well-tried safety principles shall be used.
System behaviour: The occurrence of a fault can lead to the loss of the safety function but the probability of occurrence is lower than for category B.
Principle used to achieve safety: Mainly characterized by selection of components
MTTFd of each channel: High
DCavg: None
CCF: Not relevant

Category 2 (see 6.2.5)
Summary of requirements: Requirements of B and the use of well-tried safety principles shall apply. Safety function shall be checked at suitable intervals by the machine control system.
System behaviour: The occurrence of a fault can lead to the loss of the safety function between the checks. The loss of safety function is detected by the check.
Principle used to achieve safety: Mainly characterized by structure
MTTFd of each channel: Low to high
DCavg: Low to medium
CCF: See Annex F

Category 3 (see 6.2.6)
Summary of requirements: Requirements of B and the use of well-tried safety principles shall apply. Safety-related parts shall be designed, so that
- a single fault in any of these parts does not lead to the loss of the safety function, and
- whenever reasonably practicable, the single fault is detected.
System behaviour: When a single fault occurs, the safety function is always performed. Some, but not all, faults will be detected. Accumulation of undetected faults can lead to the loss of the safety function.
Principle used to achieve safety: Mainly characterized by structure
MTTFd of each channel: Low to high
DCavg: Low to medium
CCF: See Annex F

Category 4 (see 6.2.7)
Summary of requirements: Requirements of B and the use of well-tried safety principles shall apply. Safety-related parts shall be designed, so that
- a single fault in any of these parts does not lead to a loss of the safety function, and
- the single fault is detected at or before the next demand upon the safety function, but that if this detection is not possible, an accumulation of undetected faults shall not lead to the loss of the safety function.
System behaviour: When a single fault occurs the safety function is always performed. Detection of accumulated faults reduces the probability of the loss of the safety function (high DC). The faults will be detected in time to prevent the loss of the safety function.
Principle used to achieve safety: Mainly characterized by structure
MTTFd of each channel: High
DCavg: High including accumulation of faults
CCF: See Annex F

NOTE For full requirements, see Clause 6.

Table 10 summarizes all the key requirements for the five categories of architecture, giving the fundamental mechanism for achieving safety, the required MTTFd, DC and CCF. Note that fault exclusion can be used in Categories 3 and 4. There is no similar table available for CSA Z432 [2] or RIA R15.06 [3], so I have constructed one following a similar format to Table 10.
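As a rough aid to reading Table 10, the MTTFd and DCavg columns can be treated as a lookup. The sketch below transcribes those two columns and the CCF column from the table as quoted above; the dictionary layout and the names `TABLE_10`, `in_range` and `fits_category` are my own assumptions, and the result is only a consistency check against the table, not a compliance assessment.

```python
# Ordering of the qualitative levels used in Table 10.
LEVELS = {"none": 0, "low": 1, "medium": 2, "high": 3}

# category: (MTTFd range of each channel, DCavg range, CCF column text)
# Values transcribed from Table 10 as quoted in this post.
TABLE_10 = {
    "B": (("low", "medium"), ("none", "none"), "Not relevant"),
    "1": (("high", "high"), ("none", "none"), "Not relevant"),
    "2": (("low", "high"), ("low", "medium"), "See Annex F"),
    "3": (("low", "high"), ("low", "medium"), "See Annex F"),
    "4": (("high", "high"), ("high", "high"), "See Annex F"),
}


def in_range(value, bounds):
    """True if a qualitative level lies within the inclusive bounds."""
    lowest, highest = bounds
    return LEVELS[lowest] <= LEVELS[value] <= LEVELS[highest]


def fits_category(category, mttfd, dc_avg):
    """Check channel MTTFd and DCavg against the Table 10 row."""
    mttfd_range, dc_range, _ccf = TABLE_10[category]
    return in_range(mttfd, mttfd_range) and in_range(dc_avg, dc_range)


# A design with high MTTFd but only medium DCavg matches the
# Category 3 row, not the Category 4 row.
print(fits_category("3", "high", "medium"))
print(fits_category("4", "high", "medium"))
```

The last two lines show the point made in Note 2 of the Category 4 definition: the architecture can be the same, and it is the diagnostics and channel MTTFd that separate the two rows.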

Summary of requirements for CSA Z432 / Z434 and RIA R15.06
Columns: Category; CSA Z432-04 / Z434-03 (Summary of requirements, System behaviour, Principle used to achieve safety); RIA R15.06 1999 (Summary of requirements)

Category: All
CSA summary of requirements: Safety control systems (electric, hydraulic, pneumatic) shall meet one of the performance criteria listed in Clauses 4.5.2 to 4.5.5.
RIA summary of requirements: Safety circuits (electric, hydraulic, pneumatic) shall meet one of the performance criteria listed in 4.5.1 through 4.5.4. (See footnote 2.)
Footnote 2: These performance criteria are not to be confused with the European categories B to 3 as described in ISO/IEC DIS 13849-1, Safety of machinery – Safety-related parts of control systems – Part 1: General principles for design (in correlation with EN 954-1.) They are different. The committee believes that the criteria in 4.5.1-4.5.4 exceed the criteria of B – 3 respectively, and further believe the reverse is not true.

Category: SIMPLE
CSA summary of requirements: Simple safety control systems shall be designed and constructed using accepted single channel circuitry. Such systems may be programmable.
Note: This type of system should be used for signalling and annunciation purposes only.
System behaviour: The occurrence of a fault can lead to the loss of the safety function.
Principle used to achieve safety: Mainly characterized by component selection.
RIA summary of requirements: Simple safety circuits shall be designed and constructed using accepted single channel circuitry, and may be programmable.

Category: SINGLE CHANNEL
CSA summary of requirements: Single channel safety control systems shall
a) be hardware based or comply with Clause 6.5;
b) include components that should be safety rated; and
c) be used in accordance with manufacturers’ recommendations and proven circuit designs (e.g., a single channel electromechanical positive break device that signals a stop in a de-energized state).
Note: In this type of system a single component failure can lead to the loss of the safety function.
System behaviour: The occurrence of a fault can lead to the loss of the safety function.
Principle used to achieve safety: Mainly characterized by component selection.
RIA summary of requirements: Single channel safety circuits shall be hardware based or comply with 6.4, include components which should be safety rated, be used in compliance with manufacturers’ recommendations and proven circuit designs (e.g. a single channel electro-mechanical positive break device which signals a stop in a de-energized state.)

Category: SINGLE CHANNEL WITH MONITORING
CSA summary of requirements: Single channel safety control systems with monitoring shall include the requirements for single channel, be safety rated, and be checked (preferably automatically) at suitable intervals in accordance with the following:
a) The check of the safety function(s) shall be performed
   i) at machine start-up; and
   ii) periodically during operation (preferably at each change in state).
b) The check shall either
   i) allow operation if no faults have been detected; or
   ii) generate a stop if a fault is detected. A warning shall be provided if a hazard remains after cessation of motion.
c) The check itself shall not cause a hazardous situation.
d) Following detection of a fault, a safe state shall be maintained until the fault is cleared.
Note: In this type of circuit a single component failure can also lead to the loss of the safety function.
System behaviour: The occurrence of a fault can lead to the loss of the safety function.
Principle used to achieve safety: Characterized by both component selection and structure.
RIA summary of requirements: Single channel with monitoring safety circuits shall include the requirements for single channel, shall be safety rated, and shall be checked (preferably automatically) at suitable intervals.
a) The check of the safety function(s) shall be performed
   1) at machine start-up, and
   2) periodically during operation;
b) The check shall either:
   1) allow operation if no faults have been detected, or
   2) generate a stop signal if a fault is detected. A warning shall be provided if a hazard remains after cessation of motion;
c) The check itself shall not cause a hazardous situation;
d) Following detection of a fault, a safe state shall be maintained until the fault is cleared.

Category: CONTROL RELIABLE
CSA summary of requirements: Control reliable safety control systems shall be dual channel with monitoring and shall be designed, constructed, and applied such that any single component failure, including monitoring, shall not prevent the stopping action of the robot. These safety control systems shall be hardware based or in accordance with Clause 6.5. The systems shall include automatic monitoring at the system level conforming to the following:
a) The monitoring shall generate a stop if a fault is detected. A warning shall be provided if a hazard remains after cessation of motion.
b) Following detection of a fault, a safe state shall be maintained until the fault is cleared.
c) Common mode failures shall be taken into account when the probability of such a failure occurring is significant.
d) The single fault should be detected at time of failure. If not practicable, the failure shall be detected at the next demand upon the safety function.
e) These safety control systems shall be independent of the normal program control (function) and shall be designed to be not easily defeated or not easily bypassed without detection.
System behaviour: When a single fault occurs, the safety function is always performed. Some, but not all, faults will be detected. Accumulation of undetected faults can lead to the loss of the safety function.
Principle used to achieve safety: Characterized primarily by structure.
RIA summary of requirements: Control reliable safety circuitry shall be designed, constructed and applied such that any single component failure shall not prevent the stopping action of the robot. These circuits shall be hardware based or comply with 6.4, and include automatic monitoring at the system level.
a) The monitoring shall generate a stop signal if a fault is detected. A warning shall be provided if a hazard remains after cessation of motion;
b) Following detection of a fault, a safe state shall be maintained until the fault is cleared.
c) Common mode failures shall be taken into account when the probability of such a failure occurring is significant.
d) The single fault should be detected at time of failure. If not practicable, the failure shall be detected at the next demand upon the safety function.
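
The behaviour required of a control reliable system in the table above (a single failure must not prevent the stop, a detected fault generates a stop, and the safe state is held until the fault is cleared) can be sketched in a few lines of logic. This is a simplified illustration, not a design: the class `DualChannelMonitor` and its methods are hypothetical names, real monitoring includes timing tolerances and self-checks that are omitted here, and treating "channels agree again" as the condition for clearing a fault is my assumption.

```python
class DualChannelMonitor:
    """Illustrative dual-channel stop logic with a latched fault state."""

    def __init__(self):
        self.fault_latched = False

    def evaluate(self, ch1_safe: bool, ch2_safe: bool) -> bool:
        """Return True if motion is permitted, False if a stop is commanded.

        chN_safe is True when channel N reports the guard closed.
        """
        if ch1_safe != ch2_safe:
            # The channels disagree, so a single fault has been detected.
            self.fault_latched = True
        if self.fault_latched:
            # Safe state is maintained until the fault is cleared.
            return False
        # Either channel alone can command the stop, so one failed
        # channel does not prevent the stopping action.
        return ch1_safe and ch2_safe

    def clear_fault(self, ch1_safe: bool, ch2_safe: bool) -> bool:
        """Accept a reset only once both channels agree again (assumption)."""
        if ch1_safe == ch2_safe:
            self.fault_latched = False
        return not self.fault_latched


monitor = DualChannelMonitor()
print(monitor.evaluate(True, True))    # guard closed, both channels healthy
print(monitor.evaluate(True, False))   # guard opened, channel 1 stuck
print(monitor.evaluate(True, True))    # guard closed again, fault still latched
```

In the second call the stuck channel does not stop channel 2 from commanding the stop, and in the third call the latched fault keeps the machine stopped even though both inputs look healthy.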

CSA Z434 vs. RIA R15.06

Before we dig into the comparison between North America and the International standards, we need to look at the differences between CSA and ANSI/RIA. There are some subtle differences here that can trip you up and cost significant money to correct after the fact. The following statements are based on my personal experience and on discussions that I have had with people on both the CSA and RIA technical committees tasked with writing these standards. One more note – ANSI RIA R15.06 has been revised and ALL OF SECTION 4 has been replaced with ANSI/RIA/ISO 10218-1 [7]. This is very significant, but we need to deal with this old discussion first.

Systems vs. Circuits

The CSA standard uses the term “control system(s)” throughout the definitions of the categories, while the ANSI/RIA standard uses the term “circuit(s)”. This is really the crux of the discussion between these two standards. While the difference between the terms may seem insignificant at first, you need to understand the background to get the difference.

The CSA term requires two separate sensing devices on the gate or other guard, just as the Category 3 and 4 definitions do, and for the same reason. The CSA committee felt that it was important to be able to detect all single faults, including mechanical ones. Also, the use of two interlocking devices on the guard makes it more difficult to bypass the interlock.

The RIA term requires redundant electrical connections to the interlocking device, but implicitly allows for a single interlocking device because it only explicitly refers to “circuits”.
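
To illustrate why the difference between “systems” and “circuits” matters, here is a deliberately simplified model of one assumed failure mode: a mechanical fault that leaves an interlock device stuck in the guard-closed position. The function names and the `actuator_stuck` flag are hypothetical, and the model ignores every other kind of fault. It only shows that redundant wiring to a single device cannot reveal a fault in the device itself, while two separate devices produce a disagreement that monitoring can detect.

```python
def channel_signals(guard_closed, devices):
    """Return the two channel signals seen by the monitoring logic.

    `devices` is a list of one or two dicts, each with an 'actuator_stuck'
    flag (assumed mechanical fault holding the device in the closed state).
    One device feeds both channels; two devices feed one channel each.
    """
    def reads_closed(device):
        return True if device["actuator_stuck"] else guard_closed

    if len(devices) == 1:
        signal = reads_closed(devices[0])
        return signal, signal
    return reads_closed(devices[0]), reads_closed(devices[1])


def fault_detected(signals):
    """Monitoring can only see a fault when the channels disagree."""
    return signals[0] != signals[1]


guard_closed = False  # the guard has been opened

# One device with redundant electrical connections, mechanically stuck.
single = channel_signals(guard_closed, [{"actuator_stuck": True}])
print(single, fault_detected(single))

# Two separate devices, one of them mechanically stuck.
dual = channel_signals(guard_closed,
                       [{"actuator_stuck": True}, {"actuator_stuck": False}])
print(dual, fault_detected(dual))
```

With the single device both channels report a closed guard and nothing is detected; with two devices the healthy one still reports the open guard and the mismatch exposes the fault.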

The explanation I’ve been given for the discrepancy is rooted in the early days of industrial robotics. Many early robot cells had NO interlocks on the guarding because the hazards related to the robot motion were not well understood. There were a number of incidents resulting in fatalities that drove robot users to begin to seek better ways to protect workers. The RIA R15.06 committee decided that interlocks were needed, but there was a recognition that many users would balk at installing expensive interlock devices, so they compromised and allowed that ANY kind of interlocking device was better than none. This was amended in the 1999 edition to require that components be “safety rated”, effectively eliminating the use of conventional proximity switches and non-safety-rated limit switches.

The recent revision of ANSI/RIA R15.06 to include ANSI/ISO 10218-1 as a replacement for Section 4 is significant for a couple of reasons: 1) the robot itself now need only meet the ISO standard, instead of both the ISO and the RIA standards; and 2) it brings in the ISO 13849-1 definitions of reliability categories. This means that the US has now officially dropped the “SIMPLE, SINGLE-CHANNEL,” etc. definitions and now uses “Category B, 1, etc.” However, they have only adopted the Edition 1 version of the standard, so none of the PL, MTTFd, etc. calculations have been adopted. This means that the RIA standard is now harmonized to the 1995 edition of EN 954-1. The updates made in the 2006 edition may come in subsequent editions of R15.06.

CSA has chosen to reaffirm the 2003 edition of CSA Z434, so the Canadian National Standard continues to refer to the old definitions.

North America vs International Standards

In the description of single-channel systems / circuits under the North American standards you will notice that particular attention is paid to including descriptions of the use of “proven designs” and “positive-break devices”. What the TCs were referring to are the same “well-tried safety principles” and “well-tried components” as referred to in the International standards, only with less description of what those might be. The only major addition to the definitions is the recommendation to use “safety-rated devices”, which is not included in the International standard. (N.B. The use of the word “should” in the definitions should be understood as a strong recommendation, but not necessarily a mandatory requirement.) Under EN 954-1 [4] and EN 1088 [5] (in the referenced editions, in any case) it was possible to use standard limit switches arranged in a redundant manner and activated using combined positive and non-positive-mode activation. In later editions this changed, and there is now a preference for devices intended for use in safety applications.

Also worth noting is that there is NO allowance for fault exclusion under the CSA standard or the 1999 edition of the ANSI standard.

As far as the RIA committee’s assertion that their definitions are not equivalent to the International standard, and may be superior, I think that there are too many missing qualities in the ANSI standard for that to stand. In any case, this is now moot, since ANSI has adopted EN ISO 13849-1:2006 as a reference to EN ISO 10218-1 [6], replacing Section 4 of ANSI/RIA R15.06-1999.

References

[1] “Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design”, ISO 13849-1, Edition 2, International Organization for Standardization (ISO), Geneva, 2006.

[2] “Safeguarding of machinery”, CSA Z432, Canadian Standards Association (CSA), Toronto, 2004.

[3] “American National Standard for Industrial Robots and Robot Systems — Safety Requirements”, ANSI/RIA R15.06, American National Standards Institute, Inc. (ANSI), Ann Arbor, 1999.

[4] “Safety of machinery — Safety related parts of control systems — Part 1. General principles for design”, EN 954-1, European Committee for Standardization (CEN), Geneva, 1996.

[5] “Safety of machinery — Interlocking devices associated with guards — Principles for design and selection”, EN 1088, CEN, Geneva, 1995.

[6] “Robots and robotic devices — Safety requirements for industrial robots — Part 1: Robots”, EN ISO 10218-1, European Committee for Standardization (CEN), Geneva, 2011.

[7] “Robots for Industrial Environment – Safety Requirements – Part 1 – Robot”, ANSI/RIA/ISO 10218-1, American National Standards Institute, Inc. (ANSI), Ann Arbor, 2007.
