Interlock Architectures — Part 4: Category 3 – Control Reliable

Last updated on August 21st, 2023 at 12:13 pm

Category 3 system architecture is the first category that could be considered to have similarities to “Control Reliable” circuits or systems as defined in some North American standards, now obsolete (notably CSA Z432-04 [2], CSA Z434-03 [3], and ANSI/RIA R15.06‑1999 [4]). ISO 13849-1 [1] Category 3 is NOT the same as Control Reliable, but we’ll discuss that in more detail in a subsequent post. If you haven’t read the first three posts in this series, you may want to go back and review them, as the concepts in those articles are the basis for the discussion in this post.

Note: A reader recently pointed out that I have not defined all of the acronyms used in this post. If you encounter any acronyms or terms that have not been defined, you can find the definitions in the MS101 Glossary.



What is Control Reliable?

So what is “Control Reliable” anyway? This term was coined by the ANSI/RIA R15.06 technical committee when they were developing their definitions for control system reliability, first published in the 1999 edition of the standard. “Control reliable” does not appear in the 1994 edition of CSA Z434 or the preceding edition of ANSI/RIA R15.06.

Essentially, the term “Control Reliable” means that the control system is designed with some degree of fault tolerance. Depending on the specific definition, this could be single- or multiple-fault-tolerance.

Ways to increase fault tolerance

Several design techniques can be used to increase the fault tolerance of a control system. The older approaches, such as those given in [2] and [3] or EN 954‑1:1996 [5], rely primarily on the structure or architecture of the circuit and the characteristics of the components selected for use. ISO 13849-1 uses the same basic architectures defined by EN 954-1. It extends them to include diagnostic coverage, common cause failure resistance and an understanding of the failure rate of the components to determine the degree of fault tolerance and reliability provided by design.

OK, enough background for now! Let’s look at the definition of Category 3 architecture. Remember that “SRP/CS” means “Safety-Related Parts of the Control System.”

Definition

6.2.6 Category 3

For category 3, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies. SRP/CS of category 3 shall be designed so that a single fault in any of these parts does not lead to the loss of the safety function. Whenever reasonably practicable, the single fault shall be detected at or before the next demand upon the safety function.
The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low. The MTTFD of each of the redundant channels shall be low-to-high, depending on the PLr. Measures against CCF shall be applied (see Annex F).
NOTE 1 The requirement of single-fault detection does not mean that all faults will be detected. Consequently, the accumulation of undetected faults can lead to an unintended output and a hazardous situation at the machine. Typical examples of practicable measures for fault detection are use of the feedback of mechanically guided relay contacts and monitoring of redundant electrical outputs.
NOTE 2 If necessary because of technology and application, type-C standard makers need to give further details on the detection of faults.
NOTE 3 Category 3 system behaviour allows that

  • when the single fault occurs the safety function is always performed,
  • some but not all faults will be detected,
  • accumulation of undetected faults can lead to the loss of the safety function.

NOTE 4 The technology used will influence the possibilities for the implementation of fault detection.

[1]

Breaking it down

Let’s take the definition apart and look at the components that make it up.

For category 3, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed.

The first couple of lines remind the designer of two key points:

  • The components selected must be suitable for the application, i.e. correctly specified for voltage, current, environmental conditions, etc.; and
  • “well-tried safety principles” must be used in the design.

It’s important to note that we are talking about “well-tried safety principles” and NOT “well‑tried components.” The requirement to use components intended for use in safety applications comes from other standards, like EN 1088 [6] and ISO 13850 [7]. The requirements from these standards, such as the use of “direct-drive” contacts, improve the fault tolerance of the component and so benefit the design in the end. These improvements are generally reflected in the B10d or MTTFD of the component. Inspectors commonly look for these components because they are easy to spot in the field: “safety-rated components” often use red or yellow caps to identify them clearly in the control panel.

In addition, the following applies. SRP/CS of category 3 shall be designed so that a single fault in any of these parts does not lead to the loss of the safety function.

This sentence sets the requirement for single-fault tolerance: the failure of any single component in the functional channel cannot result in the loss of the safety function. To meet this requirement, redundancy is needed. With redundant systems, one complete channel can fail without losing the ability to stop the machinery. A single component failure may still disable the monitoring system, but this may be acceptable as long as the system continues to provide the safety function. The system should not permit itself to be reset if the monitoring system is not working.

One more “gotcha” from this sentence: To meet the requirement that any single component failure can be detected, the design will require two separate sensors to detect the position of a gate, for example. This permits the system to detect a failure in either sensor, including mechanical failures like broken keys or attempts to defeat the safety system. You can see this in the block diagram, which does not show any monitoring connection to the input devices, and in the circuit diagram. Both of these diagrams are shown later in this post. The only way out of the requirement to have redundant sensors is to select a gate switch that is robust enough that mechanical faults can reasonably be excluded. I’ll get into fault exclusions later in this article.

Whenever reasonably practicable, the single fault shall be detected at or before the next demand upon the safety function.

This sentence can be a bit sticky. The phrase “Whenever reasonably practicable” means that your design needs to be able to detect single faults unless it would be “unreasonable” to do so. What constitutes an unreasonable degree of effort? This is for you to decide. I will say that if there is a commercial off-the-shelf (COTS) component available that will do the job, and you choose not to use it, you will have difficulty convincing a court that you took every reasonably practicable means to detect the fault.

Following the comma, the rest of the sentence provides the designer with the basic requirement for the test system: it must be able to detect a single component failure at the moment of demand (this is usually how it’s done since this is typically the simplest way) or before it occurs, which can happen if your test equipment has the means to detect a change in some critical characteristic of the monitored component(s).

Diagnostic coverage

The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low.

This sentence tells you that your design must meet the requirements for LOW Diagnostic Coverage. See Table 5 of the standard.

Based on Table 5, the DCavg must be at least 60% and less than 90%, all components considered. To score this, we must go to Annex E and look at Table E.1. Using the factors in Table E.1, score the design. If you end up in the desired range of 60% to 90%, you can move on. If not, the design will require modification to bring it into this range.
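To make the arithmetic concrete, here is a minimal Python sketch of the averaging formula from Annex E, which weights each part's DC by the inverse of its MTTFD. The function names and the three example parts are hypothetical and are not taken from the standard or from any real design. The DC value you assign to each part would come from the measures listed in Table E.1.

```python
def dc_avg(parts):
    """Average diagnostic coverage, weighted by 1/MTTFD.

    parts: list of (dc, mttfd_years) tuples, with dc as a fraction (0.0 to 1.0).
    Parts with no fault detection are included with dc = 0.
    """
    numerator = sum(dc / mttfd for dc, mttfd in parts)
    denominator = sum(1.0 / mttfd for _, mttfd in parts)
    return numerator / denominator


def dc_band(dc):
    """Map a DC fraction onto the bands none / low / medium / high."""
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


# Hypothetical parts: (DC, MTTFD in years). Illustrative values only.
example_parts = [
    (0.00, 50.0),  # input device with no fault detection
    (0.99, 30.0),  # logic with extensive self-checking
    (0.90, 40.0),  # contactor monitored by feedback contacts
]

result = dc_avg(example_parts)
print(f"DCavg = {result:.1%}, band = {dc_band(result)}")
```

Notice how the unmonitored part pulls the average down. Parts with a short MTTFD carry the most weight, so adding diagnostics to the least reliable part usually gives the largest improvement.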

Channel MTTFD

The MTTFD of each of the redundant channels shall be low-to-high, depending on the PLr.

This sentence reminds you that your component selections matter. Depending on the PLr you try to achieve, you must choose components with suitable MTTFD ratings. Remember that just because you are using a Category 3 architecture, you have not automatically achieved the highest levels of reliability. Referring to Figure 5 in the standard, you can see that a Category 3 architecture can meet a range of PLs, from PLa through PLe!
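As a rough illustration of how component data rolls up into a channel MTTFD, the sketch below uses two relationships from the annexes of the standard: the conversion of a B10d value into an MTTFD using the annual number of operations, and the parts count method that sums the inverse MTTFD of each component in a channel. The function names, the B10d figure, the duty cycle and the switch MTTFD are all hypothetical values chosen for the example, so substitute the manufacturer's data for your own parts.

```python
def mttfd_from_b10d(b10d, days_per_year, hours_per_day, cycle_time_s):
    """Estimate MTTFD in years for a wearing component from its B10d value."""
    # Mean number of operations per year for the application.
    n_op = days_per_year * hours_per_day * 3600.0 / cycle_time_s
    return b10d / (0.1 * n_op)


def channel_mttfd(component_mttfds):
    """Parts count method: 1/MTTFD_channel is the sum of 1/MTTFD_i."""
    return 1.0 / sum(1.0 / m for m in component_mttfds)


def mttfd_band(years):
    """Map a channel MTTFD in years onto low / medium / high."""
    if years < 3:
        return "not acceptable"
    if years < 10:
        return "low"
    if years < 30:
        return "medium"
    return "high"


# Hypothetical contactor: B10d of 2,000,000 cycles, one cycle every 120 s,
# 16 hours per day, 240 days per year.
contactor = mttfd_from_b10d(2_000_000, 240, 16, 120)

# Hypothetical gate switch with a stated MTTFD of 100 years.
channel = channel_mttfd([contactor, 100.0])

print(f"Contactor MTTFD: {contactor:.1f} years")
print(f"Channel MTTFD:   {channel:.1f} years ({mttfd_band(channel)})")
```

The channel result is always lower than the MTTFD of its weakest component, which is why one poorly chosen part can drag an otherwise good channel into a lower band.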

Figure 5

ISO 13849-1 Figure 5

If you want or need to know the numeric boundaries of each of the bands in the diagram above, look at Annex K of the standard. The full numeric representation of Figure 5 is provided in that Annex.
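If you have a PFHd value in hand, whether read from Annex K or reported by a tool, mapping it back onto a PL is a simple range check. The following Python sketch assumes the PFHd ranges that the standard assigns to each PL. The function name and the sample values are hypothetical.

```python
# PL bands as (PL, lower limit inclusive, upper limit exclusive), in 1/h.
PL_BANDS = [
    ("e", 1e-8, 1e-7),
    ("d", 1e-7, 1e-6),
    ("c", 1e-6, 3e-6),
    ("b", 3e-6, 1e-5),
    ("a", 1e-5, 1e-4),
]


def pl_from_pfhd(pfhd):
    """Return the PL letter for a PFHd value, or None if it is outside the bands."""
    for pl, lower, upper in PL_BANDS:
        if lower <= pfhd < upper:
            return pl
    return None


# Hypothetical PFHd values, for illustration only.
for value in (5e-8, 4e-7, 2e-6, 8e-6, 5e-5):
    print(f"PFHd = {value:.1e} per hour -> PL {pl_from_pfhd(value)}")
```

Remember that the PL achieved by the design must be equal to or better than the PLr from the risk assessment, so a result of PL c does not satisfy a PLr of d.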

Common-Cause Failure Mitigation

Measures against CCF shall be applied (see Annex F).

For your design to qualify as Category 3, CCF measures are required. I’ve discussed Common Cause Failures elsewhere on the blog, but as a reminder, a Common Cause Failure is one where a single event, like a lightning strike on the power line or a cable being cut, causes failures in more than one channel and so results in the failure of the system. This is not the same as a Common Mode Failure, where similar or different components fail in the same way. For instance, if both output contactors were to weld closed, whether simultaneously or at different times, because they were undersized and overloaded, this could be considered a Common Mode Failure. If they both weld closed due to a lightning strike, that is a Common Cause Failure.

Annex F provides a checklist that is used to score the CCF of the design. The design must score at least 65 points to meet the minimum level of CCF protection, and more is better, of course! Score your design and see where you come out. Less than 65 and you need to do more; 65 or more, and you are good to go.
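The scoring itself is simple addition, as this Python sketch shows. Each measure either earns its full points or none, since partial credit is not given. The measure names are shortened paraphrases, and the point values are recalled from Table F.1 rather than copied from it, so treat them as assumptions and check them against your copy of the standard before relying on them. The example design is hypothetical.

```python
# Assumed point values for the Annex F measures (verify against Table F.1).
CCF_MEASURES = {
    "separation_of_signal_paths": 15,
    "diversity": 20,
    "overvoltage_overcurrent_protection": 15,
    "well_tried_components": 5,
    "failure_mode_analysis": 5,
    "designer_training": 5,
    "emc_and_contamination": 25,
    "other_environmental_influences": 10,
}

CCF_PASS_MARK = 65


def ccf_score(applied_measures):
    """Sum the points for the measures that are fully implemented."""
    return sum(CCF_MEASURES[name] for name in applied_measures)


# Hypothetical design: the measures claimed as fully implemented.
design = [
    "separation_of_signal_paths",
    "overvoltage_overcurrent_protection",
    "well_tried_components",
    "failure_mode_analysis",
    "emc_and_contamination",
]

score = ccf_score(design)
verdict = "pass" if score >= CCF_PASS_MARK else "more measures needed"
print(f"CCF score: {score} ({verdict})")
```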

The Notes

The notes given in the definition are also important. Note 1 reminds the designer that not all faults will be detected, and an accumulation of undetected faults can lead to the loss of the safety function. Be aware that it is up to you as the designer to minimize the kinds of failures that can accumulate undetected.

Note 2 tells you that a Type-C product standard, like EN 201 for injection moulding machines, may add detail on fault detection, and such standards often impose a minimum PLr on the design as well. Ensure you get a copy of any Type-C standard relevant to your product and market. Note that the designation “Type-C” comes from ISO. If you look for this terminology in ANSI or CSA standards, you won’t find it used because the concept doesn’t exist in the same way in these National standards. (ed. note – CSA Z432-2023, when published, will include a version of the type-A/B/C standard structure information.)

Note 3 gives you the basic performance parameters for the design. If your design can do these things, then you’re halfway there.

Finally, Note 4 is a reminder that different kinds of technology have greater or lesser capability to detect failures. More sophisticated technology may be required to achieve the PL you need.

The Block Diagram

Let’s have a look at the functional block diagram for this Category.

ISO 13849-1 Figure 11

Looking at the diagram, you can see the two independent channels and the cross-monitoring connection between the channels. Input devices are not monitored, but output devices are monitored. This is another significant reason for requiring two physically separate input devices to sense the guard position. A failure in the input devices can be detected only if one channel changes state and the other does not.
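The idea of detecting an input fault by comparing the two channels can be sketched in a few lines of Python. This is only an illustration of the principle under stated assumptions. It is not how any particular safety relay works internally, and the class name, the scan-based timing and the discrepancy limit are all hypothetical. Real safety logic belongs in certified hardware, not in a script.

```python
from dataclasses import dataclass


@dataclass
class DualChannelInput:
    """Compares two input channels and latches a fault if they disagree too long."""

    discrepancy_limit: int = 3  # hypothetical: scans the channels may disagree
    mismatch_scans: int = 0
    faulted: bool = False

    def update(self, ch1_closed: bool, ch2_closed: bool) -> bool:
        """Process one scan. Returns True only if the output may be enabled."""
        if ch1_closed != ch2_closed:
            # One channel changed state and the other did not.
            self.mismatch_scans += 1
            if self.mismatch_scans > self.discrepancy_limit:
                self.faulted = True  # latched until the fault is repaired
        else:
            self.mismatch_scans = 0
        return ch1_closed and ch2_closed and not self.faulted


monitor = DualChannelInput(discrepancy_limit=2)
print(monitor.update(True, True))    # both switches closed
for _ in range(3):
    monitor.update(False, True)      # channel 2 switch stuck closed
print(monitor.faulted)
print(monitor.update(True, True))    # output stays off after the latched fault
```

Note that the output is disabled as soon as either channel opens, so the safety function is still performed on the first fault. The discrepancy check only decides whether the system may be restarted afterwards.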

If you want to learn more about applying the block diagramming method to your design, there is a good explanation of the method in the SISTEMA Cookbook 1, published by the IFA in Germany. An English version of the document is available from the IFA website.

Example Circuit Diagram

By now, you probably get the idea that there are as many ways to configure a Category 3 circuit as there are applications. Below is a typical circuit diagram borrowed from Rockwell Allen-Bradley, showing the application of typical safety relays in a complete system that includes the emergency stop system, a gate interlock and a safety mat. You can meet the requirements for Category 3 architecture in other ways, so don’t feel you must use a COTS safety relay. It just may be the most straightforward way in many cases.

This is not a plug for A-B products. Neither Machinery Safety 101 nor I have any relationship with Rockwell Allen-Bradley.

From Rockwell Automation publication SAFETY-WD001A-EN-P, June 2011, p.6.

If you want to obtain the source document containing this diagram, you can download it directly from the Rockwell Automation website.

Emergency Stop Subsystem

The emergency stop circuit uses the 440R-S12R2 relay on the left side of the diagram. This particular system uses Category 3 architecture in the e-stop system, which may be more than is required. A risk assessment and a start-stop analysis are required to determine what performance level is needed for this subsystem.

Gate Interlock Subsystem

The gate interlock circuit is located in the center of the diagram and uses the 440R-D22R2 relay. As you can see, there are two physically separate gate interlock switches. Only one contact from each switch is used; one switch is connected to Channel 1, and the other to Channel 2. Notice that there is no other monitoring of these devices (i.e. no second connection to either switch). The secondary contacts on these switches could be connected to the PLC for annunciation purposes. This would allow the PLC to display the open/closed status of the gate on the machine HMI.

The output contactors, K3 and K4, are monitored by the reset loop connected to S34 and the +V rail.
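The logic of a monitored reset loop can be expressed as a simple series circuit, shown here as a Python sketch. It assumes that a normally closed auxiliary contact from each contactor is wired in series in the reset loop, which is the usual arrangement but is an assumption about this particular drawing. The function and parameter names are hypothetical.

```python
def reset_loop_closed(k3_nc_closed: bool, k4_nc_closed: bool,
                      reset_pressed: bool) -> bool:
    """Series circuit: the loop conducts only if every element is closed.

    A normally closed auxiliary contact is closed only when its contactor
    has dropped out, so a welded contactor holds the loop open.
    """
    return k3_nc_closed and k4_nc_closed and reset_pressed


def may_energize_outputs(inputs_ok: bool, k3_nc_closed: bool,
                         k4_nc_closed: bool, reset_pressed: bool) -> bool:
    """The outputs may pick up only with healthy inputs and a closed reset loop."""
    return inputs_ok and reset_loop_closed(k3_nc_closed, k4_nc_closed,
                                           reset_pressed)


# Normal restart: gate closed, both contactors dropped out, reset pressed.
print(may_energize_outputs(True, True, True, True))

# K3 welded: its auxiliary contact stays open, so the reset is blocked.
print(may_energize_outputs(True, False, True, True))
```

This is how a single welded contactor is detected before the next start. The second contactor still opens the load circuit, and the open reset loop prevents a restart until the fault is repaired.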

Another interesting point: Did you notice that a “zone e-stop” is included in the gate interlock? You will find an emergency stop device immediately below the central safety relay and a little to the left. This device is wired in series with the gate interlock, so activating it will drop out K3 and K4 but not disturb the operation of the rest of the machine. The safety relay can’t distinguish between the e-stop button and the gate interlocks, so if annunciation is needed, you may want to use the third contact on the e-stop device to connect to a PLC input.

Safety Mat Subsystem

The safety mat subsystem is located on the right side of the diagram and uses a second 440R-D22R2 relay. Safety mats can be either single or dual-channel in design. The mat shown in this drawing is a dual-channel type. Stepping on the mat causes the conductive layers in the mat to touch, shorting Channel 1 to Channel 2. This creates an input fault that the 440R relay will detect. The fault condition will cause the relay output to open, stopping the machine.

Safety mats can be easily damaged, and the circuit design will detect shorts or opens within the mat and prevent the hazardous motion from starting or continuing.

The output contactors, K5 and K6, are monitored by the relay reset loop connected to S34 and the +V rail.

This circuit also includes a conventional start-stop circuit that doesn’t rely on the safety relay.

Like the gate interlock circuit, this circuit also includes a “zone e-stop.” Look below and to the left of the safety mat relay. As with the gate interlock, pressing this button will drop out K5 and K6, stopping the same motions protected by the safety mat. Since the relay can’t tell the difference between the e-stop button and the mat detecting an object, you may want to use the same approach and add a third contact to the e-stop button, connecting it to the PLC for annunciation.

Component Selection

The components used in the circuit are critical to the final PL rating of the design. The final PL of the design depends on the MTTFD of the components used in each channel. No knowledge of the internal construction of the safety relays is needed because the relays come with a PL rating from the manufacturer. They can be treated as a subsystem unto themselves. The selection of the input and output devices is a significant factor. Component data sheets can be downloaded from the Rockwell site if you want to dig deeper.

What did you think about this article? What questions came to mind that were not answered for you? I look forward to hearing your thoughts and questions!


References

[1] Safety of machinery – Safety-related parts of control systems – Part 1: General principles for design, ISO 13849-1. International Organization for Standardization (ISO). 2015. (Note: there is a newer edition of this standard, but you may want to read this post before you buy it.)

[2] Safeguarding of Machinery, CSA Z432. Canadian Standards Association (CSA). 2004.

[3] Industrial Robots and Robot Systems – General Safety Requirements, CSA Z434. Canadian Standards Association (CSA). 2003.

[4] American National Standard for Industrial Robots and Robot Systems — Safety Requirements, ANSI/RIA R15.06. Robotic Industries Association. 1999.

[5] Safety of machinery – Safety-related parts of control systems – Part 1: General principles for design, EN 954‑1. European Committee for Standardization (CEN). 1996.

[6] Safety of Machinery: Interlocking devices associated with guards – principles for design and selection, EN 1088. European Committee for Standardization (CEN). 1996.

[7] Safety of machinery – Emergency stop – Principles for design, ISO 13850. International Organization for Standardization (ISO). 2015.

© 2011 – 2023, Compliance inSight Consulting Inc.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

3 thoughts on “Interlock Architectures — Part 4: Category 3 – Control Reliable”

  1. Not all acronyms or abbreviations defined “PL & MTTF”. Makes it hard to follow.

    1. Hi Richard,
      I know the terminology can be confusing, and I apologize for not fully defining everything.

      The field of functional safety, like most engineering disciplines, is littered with jargon and acronyms. I try to explain the relevant acronyms as they come up, but I don’t always do that for brevity. As I often say in my posts, you cannot use my posts in place of the standards I’m discussing. So, rightly or wrongly, I sometimes make the assumption that readers hold a copy of the standard already, and these terms are defined in the standard. Also, this post is part 4 in a series, and these terms are defined and discussed in earlier posts. I should also mention that the definitions for many of the terms used in my posts can be found in the MS101 Glossary on this site.

      Having said all that:
      PL stands for Performance Level. There are five broad bands of performance, PL=a through PL=e. Each band represents a range of PFHd values, see my next paragraph about that. The five PL bands can also be considered in terms of IEC 61508 or IEC 62061 Safety Integrity Levels (SIL). PLa is too low for any SIL, and is classified as “OM” for “Other measures.” PLb and PLc both fall approximately within SIL1, PLd is approximately SIL2, and PLe is approximately SIL3. IEC has one higher classification, SIL4, for which there is no ISO equivalent. Why that is the case is the subject of another discussion.

      MTTFd, or MTTFD, stands for the Mean Time to Dangerous Failure. For a simple single-channel system with no diagnostics, it’s the inverse of PFHd, the Probability of Dangerous Failure per Hour, once you convert years to hours. For the other architectures, PFHd also depends on the DC, which is why Annex K of the standard has a table that lists all of the PFHd values calculated for combinations of architecture, MTTFD, and DC.

      So, I hope that helps.

    2. I’ve made a few updates to the article and added some links to the MS101 glossary. I hope that helps make reading articles like this a bit easier.
