Category 3 system architecture is the first category that could be considered similar to “Control Reliable” circuits or systems as defined in the North American standards. It is not the same as Control Reliable, but we’ll get to that in a subsequent post. If you haven’t read the first three posts in this series, you may want to go back and review them, as the concepts in those articles are the basis for the discussion in this post.
So what is “Control Reliable” anyway? This term was coined by the ANSI RIA R15.06 technical committee when they were developing their definitions for control system reliability, first published in the 1999 edition of the standard. No mention of the concept of control reliability appears in the 1994 edition of CSA Z434 or the preceding edition of RIA R15.06.
Essentially, the term “Control Reliable” means that the control system is designed with some degree of fault tolerance. Depending on the definitions that you read, this could be single- or multiple-fault tolerance.
There are a number of design techniques that can be used to increase the fault tolerance of a control system. The older approaches, such as those given in ANSI RIA R15.06-1999, CSA Z434-03 or EN 954-1:95, rely primarily on the structure or architecture of the circuit, and the characteristics of the components selected for use. ISO 13849-1 uses the same basic architectures defined by EN 954-1:95, and extends them to include diagnostic coverage, common cause failure resistance and an understanding of the failure rate of the components to determine the degree of fault tolerance and reliability provided by the design.
OK, enough background for now! Let’s look at the definition for Category 3 systems. Remember that “SRP/CS” means “Safety Related Parts of the Control System”.
6.2.6 Category 3
For category 3, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies. SRP/CS of category 3 shall be designed so that a single fault in any of these parts does not lead to the loss of the safety function. Whenever reasonably practicable, the single fault shall be detected at or before the next demand upon the safety function.
The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low. The MTTFd of each of the redundant channels shall be low-to-high, depending on the PLr. Measures against CCF shall be applied (see Annex F).
NOTE 1 The requirement of single-fault detection does not mean that all faults will be detected. Consequently, the accumulation of undetected faults can lead to an unintended output and a hazardous situation at the machine. Typical examples of practicable measures for fault detection are use of the feedback of mechanically guided relay contacts and monitoring of redundant electrical outputs.
NOTE 2 If necessary because of technology and application, type-C standard makers need to give further details on the detection of faults.
NOTE 3 Category 3 system behaviour allows that
- when the single fault occurs the safety function is always performed,
- some but not all faults will be detected,
- accumulation of undetected faults can lead to the loss of the safety function.
NOTE 4 The technology used will influence the possibilities for the implementation of fault detection.
Breaking it down
Let’s take the definition apart and look at the components that make it up.
For category 3, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed.
The first couple of lines remind the designer of two key points:
- The components selected must be suitable for the application, i.e. correctly specified for voltage, current, environmental conditions, etc.; and
- “well-tried safety principles” must be used in the design.
It’s important to note here that we are talking about “well-tried safety principles” and NOT “well-tried components”. The requirement to use components designed for safety applications comes from other standards, like EN 1088 and ISO 13850. The requirements in these standards, such as the use of “direct-opening” (positive-opening) contacts, improve the fault tolerance of the component and so benefit the design in the end. These improvements are generally reflected in the B10d or MTTFd of the component. They are also points that inspectors commonly look for, because they are easy to spot in the field: “safety-rated components” often use red or yellow caps to identify them clearly in the control panel.
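Since B10d came up: for components that wear out with use, like the contacts in a gate switch or a contactor, ISO 13849-1 (Annex C) converts B10d into MTTFd using the expected number of operations per year. The sketch below applies those formulas as I understand them; please check them against your edition of the standard. The switch, its B10d value and the operating profile are all hypothetical.

```python
def operations_per_year(days_per_year: float, hours_per_day: float,
                        cycle_time_s: float) -> float:
    """Mean number of operations per year: n_op = d_op * h_op * 3600 / t_cycle."""
    return days_per_year * hours_per_day * 3600.0 / cycle_time_s


def mttfd_from_b10d(b10d: float, n_op: float) -> float:
    """MTTFd in years for a wear-out component: B10d / (0.1 * n_op)."""
    return b10d / (0.1 * n_op)


def t10d_years(b10d: float, n_op: float) -> float:
    """Operating-time limit for the component, in years: T10d = B10d / n_op."""
    return b10d / n_op


# Hypothetical example: a gate switch with B10d = 2,000,000 operations,
# cycled once a minute, 16 h/day, 220 days/year.
n_op = operations_per_year(220, 16, 60)
print(f"n_op  = {n_op:,.0f} operations/year")
print(f"MTTFd = {mttfd_from_b10d(2_000_000, n_op):.1f} years")
print(f"T10d  = {t10d_years(2_000_000, n_op):.1f} years")
```

With these made-up numbers the switch works out to an MTTFd of roughly 95 years, but it would need replacing after about 9.5 years of use (T10d), so the operating profile matters as much as the data-sheet figure.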
In addition, the following applies. SRP/CS of category 3 shall be designed so that a single fault in any of these parts does not lead to the loss of the safety function.
This sentence establishes the requirement for single-fault tolerance: the failure of any single component in the functional channel must not result in the loss of the safety function. To meet this requirement, redundancy is needed. With redundant channels, one complete channel can fail without losing the ability to stop the machinery. It is possible to lose the function of the monitoring system from a single component failure, but as long as the system continues to provide the safety function this may be acceptable. The system should not permit itself to be reset if the monitoring system is not working.
One more “gotcha” from this sentence: to meet the single-fault requirement, and to allow a fault in an input device to be detected, the design will require two separate sensors to detect the position of a gate, for example. This permits the system to detect a failure in either sensor, including mechanical failures like broken actuator keys or attempts to defeat the safety system. You can clearly see this in both the block diagram, which does not show any monitoring connection to the input devices, and in the circuit diagram. Both of these diagrams are shown later in this post. The only way out of the requirement for redundant sensors is to select a gate switch that is robust enough that mechanical faults can reasonably be excluded. I’ll get into fault exclusions in a later post.
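One way to convince yourself that two separate sensors are really needed is to enumerate the faults. The sketch below is a deliberately simplified model under my own assumptions: each channel is a series chain (sensor, relay channel, contactor), only dangerous failures are considered, and common cause failures are ignored. The part names are hypothetical.

```python
from itertools import combinations

# Hypothetical parts for the two redundant channels of a gate-interlock function.
CHANNEL_1 = ["gate_switch_1", "relay_channel_1", "contactor_K3"]
CHANNEL_2 = ["gate_switch_2", "relay_channel_2", "contactor_K4"]
ALL_PARTS = CHANNEL_1 + CHANNEL_2


def channel_works(channel, failed):
    """A series channel only performs the stop if none of its parts has failed to danger."""
    return not any(part in failed for part in channel)


def safety_function_ok(failed):
    """K3 and K4 are in series, so either healthy channel is enough to stop the machine."""
    return channel_works(CHANNEL_1, failed) or channel_works(CHANNEL_2, failed)


# Every possible single dangerous fault.
single_losses = [p for p in ALL_PARTS if not safety_function_ok({p})]
# Every possible pair of dangerous faults (faults that accumulated undetected).
double_losses = [pair for pair in combinations(ALL_PARTS, 2)
                 if not safety_function_ok(set(pair))]

print("single faults that lose the safety function:", single_losses)
print("fault pairs that lose the safety function:", len(double_losses))
```

Running it shows no single fault that defeats the function, but 9 of the 15 possible fault pairs (one fault in each channel) do. That is the accumulation of undetected faults described in Note 3, and it is why fault detection matters.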
Whenever reasonably practicable, the single fault shall be detected at or before the next demand upon the safety function.
This sentence can be a bit sticky. The phrase “Whenever reasonably practicable” means that your design needs to be able to detect single faults unless it would be “unreasonable” to do so. What constitutes an unreasonable degree of effort? This is for you to decide. I will say that if there is a commercial off-the-shelf (COTS) component available that will do the job, and you choose not to use it, you will have a difficult time convincing a court that you took every reasonably practicable means to detect the fault.
Following the comma, the rest of the sentence gives the designer the basic requirement for the test system: it must be able to detect a single component failure either at the moment of the next demand (usually the simplest approach, and so the most common) or before that demand occurs, which is possible if your test equipment can detect a change in some critical characteristic of the monitored component(s).
The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low.
This sentence tells you that your design must achieve at least LOW diagnostic coverage. To see what LOW means in numbers, look first at Table 6:
Table 6: Diagnostic coverage (DC)

| DC denotation | Range of DC |
|---|---|
| None | DC < 60 % |
| Low | 60 % ≤ DC < 90 % |
| Medium | 90 % ≤ DC < 99 % |
| High | 99 % ≤ DC |

NOTE 1 For SRP/CS consisting of several parts an average value DCavg for DC is used in Figure 5, Clause 6 and E.2.

NOTE 2 The choice of the DC ranges is based on the key values 60 %, 90 % and 99 % also established in other standards (e.g. IEC 61508) dealing with diagnostic coverage of tests. Investigations show that (1 − DC) rather than DC itself is a characteristic measure for the effectiveness of the test. (1 − DC) for the key values 60 %, 90 % and 99 % forms a kind of logarithmic scale fitting to the logarithmic PL-scale. A DC-value less than 60 % has only slight effect on the reliability of the tested system and is therefore called “none”. A DC-value greater than 99 % for complex systems is very hard to achieve. To be practicable, the number of ranges was restricted to four. The indicated borders of this table are assumed within an accuracy of 5 %.
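Table 6 is simple enough to encode if you want to check values quickly. This is a minimal sketch; the function name is my own, and the sharp cut-offs ignore the 5 % tolerance mentioned in Note 2.

```python
def dc_denotation(dc: float) -> str:
    """Map a diagnostic coverage value (fraction 0..1) to its Table 6 denotation.

    The boundaries are applied as sharp cut-offs here; Note 2 says the borders
    are only assumed to be accurate to within about 5 %.
    """
    if not 0.0 <= dc <= 1.0:
        raise ValueError("DC must be a fraction between 0 and 1")
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


# (1 - DC) is the fraction of dangerous failures the tests do NOT catch.
for dc in (0.50, 0.60, 0.90, 0.99):
    print(f"DC = {dc:.0%}: {dc_denotation(dc):6s} undetected fraction = {1 - dc:.2f}")
```

Printing (1 − DC) alongside the denotation shows the roughly logarithmic spacing that Note 2 describes: 0.40, 0.10 and 0.01 at the three key values.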
Based on Table 6, the DCavg must be at least 60%, all components considered. LOW is a minimum, not a ceiling: Figure 5 also shows Category 3 with MEDIUM DCavg (90% or more), which allows higher PLs to be reached. To score your design, go to Annex E and use the factors in Table E.1 to estimate the DC of each part of the system, then combine them into DCavg. If you end up at 60% or above, you can move on. If not, the design will require modification to bring it into this range.
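Annex E combines the per-block DC values into DCavg as an average weighted by each block's dangerous failure rate (1/MTTFd), so the least reliable blocks dominate the result. A minimal sketch of that calculation follows; the block names, DC values and MTTFd values are hypothetical placeholders, not Table E.1 values.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    dc: float           # DC estimated for this block from Table E.1 (fraction 0..1)
    mttfd_years: float  # MTTFd of this block, in years


def dc_avg(blocks):
    """DCavg = sum(DC_i / MTTFd_i) / sum(1 / MTTFd_i)."""
    weighted = sum(b.dc / b.mttfd_years for b in blocks)
    total_rate = sum(1.0 / b.mttfd_years for b in blocks)
    return weighted / total_rate


# Hypothetical values for illustration only; use your own Table E.1 estimates.
design = [
    Block("gate switches (cross-monitored inputs)", 0.90, 100.0),
    Block("safety relay logic", 0.99, 150.0),
    Block("contactors K3/K4 (monitored via reset loop)", 0.99, 40.0),
]
avg = dc_avg(design)
# LOW (60 %) is the minimum for Category 3; higher is acceptable.
print(f"DCavg = {avg:.1%} -> {'OK' if avg >= 0.60 else 'too low'} for Category 3")
```

A block with no testing at all still belongs in the list with a DC of 0, because its failure rate still counts in the denominator and drags the average down.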
The MTTFd of each of the redundant channels shall be low-to-high, depending on the PLr.
This sentence reminds you that your component selections matter. Depending on the PLr you are trying to achieve, you will need to choose components with suitable MTTFd ratings. Remember that just because you are using a Category 3 architecture, you have not automatically achieved the highest levels of reliability. If you refer to Figure 5 in the standard, you can see that a Category 3 architecture can meet a range of PLs, all the way from PLa through PLe!
If you want, or need, to know the numeric boundaries of each of the bands in Figure 5, look at Annex K of the standard. The full numeric representation of Figure 5 is provided in that Annex.
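Working out the MTTFd of each channel is usually the step that takes the most effort. The sketch below follows the Annex D approach as I understand it: the parts in a channel are combined as a series (failure rates add), each channel is capped at 100 years as in the 2006 edition, two unequal channels are combined with the symmetrisation formula, and the result is banded as low (3 to 10 years), medium (10 to 30 years) or high (30 to 100 years). The part values are hypothetical; verify the formulas and limits against the edition you are using.

```python
def channel_mttfd(part_mttfds, cap_years=100.0):
    """MTTFd of one channel from its parts in series: 1/MTTFd = sum(1/MTTFd_i).

    cap_years reflects the 100-year per-channel limit in ISO 13849-1:2006
    (later editions differ; check the edition you are working to).
    """
    total_rate = sum(1.0 / m for m in part_mttfds)
    return min(1.0 / total_rate, cap_years)


def symmetrised_mttfd(c1, c2):
    """Combine two different channel MTTFd values (Annex D symmetrisation)."""
    return (2.0 / 3.0) * (c1 + c2 - 1.0 / (1.0 / c1 + 1.0 / c2))


def mttfd_band(mttfd):
    """Classify a channel MTTFd in years as low, medium or high."""
    if mttfd < 3:
        return "not acceptable"
    if mttfd < 10:
        return "low"
    if mttfd < 30:
        return "medium"
    return "high"


# Hypothetical part MTTFd values (years) for each channel.
ch1 = channel_mttfd([95.0, 150.0, 60.0])   # switch, relay channel, contactor K3
ch2 = channel_mttfd([120.0, 150.0, 60.0])  # switch, relay channel, contactor K4
both = symmetrised_mttfd(ch1, ch2)
print(f"Ch1 {ch1:.1f} y, Ch2 {ch2:.1f} y, combined {both:.1f} y -> {mttfd_band(both)}")
```

With these made-up numbers both channels land near the medium/high boundary, which shows how much a single weak part (here the 60-year contactor) pulls a channel down.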
Measures against CCF shall be applied (see Annex F).
For your design to meet the Category 3 architecture requirements, CCF measures must be applied. I’ve discussed Common Cause Failures elsewhere on the blog, but as a reminder, a Common Cause Failure is one where a single event, like a lightning strike on the power line or a cable being cut, results in the failure of more than one channel, defeating the redundancy. This is not the same as a Common Mode Failure, where similar or different components fail in the same way. For instance, if both output contactors were to weld closed, either simultaneously or at different times, because they were undersized and overloaded, this could be considered a Common Mode Failure. If they both weld closed due to a lightning strike, that is a Common Cause Failure.
Annex F provides a checklist that is used to score the CCF of the design. The design must meet at least 65 points to be considered to meet the minimum level of CCF protection, and more is better of course! Score your design and see where you come out. Less than 65 and you need to do more. 65 or more and you are good to go.
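The Annex F checklist lends itself to a quick calculation. The point values below are the ones I recall from Table F.1 (they total 100); confirm them, and the exact wording of each measure, against your copy of the standard. The short keys are my own, and the sketch treats every measure as all-or-nothing.

```python
# Points per measure as recalled from Table F.1 (verify against your copy).
CCF_MEASURES = {
    "separation": 15,             # separation/segregation of signal paths
    "diversity": 20,              # diverse technologies, designs or physical principles
    "overload_protection": 15,    # over-voltage, over-pressure, over-current, etc.
    "well_tried_components": 5,
    "fmea": 5,                    # FMEA results used to avoid common cause failures
    "competence": 5,              # training of designers and maintainers
    "emc": 25,                    # contamination / EMC protection
    "environment": 10,            # temperature, shock, vibration, humidity
}
MINIMUM_SCORE = 65


def ccf_score(applied):
    """Sum the points for the measures applied (each scored all-or-nothing)."""
    applied = set(applied)
    unknown = applied - set(CCF_MEASURES)
    if unknown:
        raise KeyError(f"not in the checklist: {sorted(unknown)}")
    return sum(CCF_MEASURES[m] for m in applied)


# Hypothetical design: everything except diversity and environmental measures.
design = {"separation", "overload_protection", "well_tried_components",
          "fmea", "competence", "emc"}
score = ccf_score(design)
verdict = "meets" if score >= MINIMUM_SCORE else "does not meet"
print(f"{score} points: {verdict} the {MINIMUM_SCORE}-point minimum")
```

Note how much the EMC item is worth: leaving it out of the example design would drop the score from 70 to 45.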
The notes given in the definition are also important. Note 1 reminds the designer that not all faults will be detected, and an accumulation of undetected faults can lead to the loss of the safety function. Be aware that it is up to you as the designer to minimize the kinds of failures that can accumulate undetected.
Note 2 points out that a Type-C product standard, like EN 201 for injection moulding machines, may give further details on how faults are to be detected for a particular kind of machine. Type-C standards may also impose a minimum PLr on the design. Make sure that you get a copy of any Type-C standard that is relevant for your product and market. Note that the designation “Type-C” comes from ISO. If you go looking for this terminology in ANSI or CSA standards, you won’t find it used, because the concept doesn’t exist in the same way in these national standards.
Note 3 gives you the basic performance parameters for the design. If your design can do these things, then you’re halfway there.
Finally, Note 4 is a reminder that different kinds of technology have greater or lesser capability to detect failures. More sophisticated technology may be required to achieve the PL you need.
The Block Diagram
Let’s have a look at the functional block diagram for this Category.
By looking at the diagram you can clearly see the two independent channels and the cross-monitoring connection between them. Input devices are not monitored, but output devices are. This is another significant reason for using two physically separate input devices to sense the guard position, or the state of whatever other safeguarding device is integrated into the system. The only way that a failure in the input devices can be detected is if one channel changes state and the other does not.
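To make the cross-monitoring idea concrete, here is a minimal software model of that behaviour. It is a conceptual sketch under my own assumptions (scan-based logic, a made-up discrepancy limit, a rule that both channels must have opened before a reset is accepted, and no fault-clearing logic); it is not how the Rockwell relays are built, and real safety logic belongs in certified hardware or firmware.

```python
from dataclasses import dataclass, field


@dataclass
class DualChannelMonitor:
    """Hypothetical model of a two-channel input monitor with cross-monitoring."""
    discrepancy_limit: int = 3  # hypothetical: scans the channels may disagree
    outputs_on: bool = False
    fault: bool = False
    _mismatch_scans: int = 0
    _seen_open: list = field(default_factory=lambda: [False, False])

    def scan(self, ch1_closed: bool, ch2_closed: bool) -> bool:
        # Remember which channels have actually opened since the last reset.
        if not ch1_closed:
            self._seen_open[0] = True
        if not ch2_closed:
            self._seen_open[1] = True
        # Cross-monitoring: a persistent disagreement latches a fault.
        if ch1_closed != ch2_closed:
            self._mismatch_scans += 1
            if self._mismatch_scans > self.discrepancy_limit:
                self.fault = True
        else:
            self._mismatch_scans = 0
        # Either channel opening, or a latched fault, drops the outputs.
        if self.fault or not (ch1_closed and ch2_closed):
            self.outputs_on = False
        return self.outputs_on

    def reset(self, ch1_closed: bool, ch2_closed: bool) -> bool:
        # Refuse reset unless both channels opened and re-closed with no fault.
        if self.fault or not (ch1_closed and ch2_closed) or not all(self._seen_open):
            return False
        self.outputs_on = True
        self._seen_open = [False, False]
        return True


relay = DualChannelMonitor()
relay.scan(False, False)          # guard open at power-up
relay.scan(True, True)            # guard closed
print(relay.reset(True, True))    # True: both channels were seen open
# Now gate switch 2 welds closed. Opening the guard only opens channel 1.
print(relay.scan(False, True))    # False: channel 1 still stops the machine
relay.scan(True, True)            # guard closed again
print(relay.reset(True, True))    # False: channel 2 never opened, fault revealed
```

The welded switch does not prevent the stop, because channel 1 still drops the outputs, but it is caught at the next reset, which is the “at or before the next demand” behaviour the definition asks for.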
If you want to learn more about applying the block diagramming method to your design, there is a good explanation of the method in the SISTEMA Cookbook 1, published by the IFA in Germany. You can download the English version directly from the IFA web site.
By now you probably get the idea that there are as many ways to configure a Category 3 circuit as there are applications. Below is a typical circuit diagram borrowed from Rockwell Allen-Bradley, showing the application of typical safety relays in a complete system that includes the emergency stop system, a gate interlock and a safety mat. You can meet the requirements for Category 3 architecture in other ways, so don’t feel that you must use a COTS safety relay. It just may be the most straightforward way in many cases.
This is not a plug for A-B products. Neither Machinery Safety 101, nor I, have any relationship with Rockwell Allen-Bradley.
If you’re interested in obtaining the source document containing this diagram, you can download it directly from the Rockwell Automation web site.
Emergency Stop Subsystem
The emergency stop circuit uses the 440R-S12R2 relay on the left side of the diagram. This particular system uses Category 3 architecture in the e-stop system, which may be more than is required. A risk assessment and a start-stop analysis are required to determine what performance level is needed for this subsystem.
Gate Interlock Subsystem
The gate interlock circuit is located in the center of the diagram, and uses the 440R-D22R2 relay. As you can see, there are two physically separate gate interlock switches. Only one contact from each switch is used, so one switch is connected to Channel 1, and the other to Channel 2. Notice that there is no other monitoring of these devices (i.e. no second connection to either switch). The secondary contacts on these switches could be connected to the PLC for annunciation purposes. This would allow the PLC to display the open/closed status of the gate on the machine HMI.
The output contactors, K3 and K4, are monitored by the reset loop connected to S34 and the +V rail.
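The principle behind this reset-loop monitoring (often called external device monitoring, or EDM) can be sketched in a few lines. I’m assuming here, as is typical for this kind of circuit, that the S34 reset loop runs through a normally closed mirror (mechanically linked) auxiliary contact on each of K3 and K4, in series with the reset button; check the actual drawing and data sheets before relying on that. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contactor:
    coil_energised: bool = False
    welded: bool = False  # hypothetical fault: main contacts stuck closed

    @property
    def main_contacts_closed(self) -> bool:
        return self.coil_energised or self.welded

    @property
    def mirror_contact_closed(self) -> bool:
        # A mirror contact is linked to the main contacts, so this NC
        # auxiliary contact cannot be closed while the main contacts are closed.
        return not self.main_contacts_closed


def reset_accepted(reset_pressed: bool, *contactors: Contactor) -> bool:
    """Reset loop = reset button in series with every contactor's NC mirror contact."""
    return reset_pressed and all(c.mirror_contact_closed for c in contactors)


k3, k4 = Contactor(), Contactor()
print(reset_accepted(True, k3, k4))  # True: both contactors have dropped out
k3.welded = True                     # K3 welds while the machine is running
print(reset_accepted(True, k3, k4))  # False: the weld blocks the next reset
```

The welded contactor is only revealed when someone attempts a reset after a stop; in the meantime K4 alone still removes power, which is the single-fault tolerance at work.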
One more interesting point — did you notice that there is a “zone e-stop” included in the gate interlock? If you look immediately below the central safety relay and a little to the left you will find an emergency stop device. This device is wired in series with the gate interlock, so activating it will drop out K3 and K4 but not disturb the operation of the rest of the machine. The safety relay can’t distinguish between the e-stop button and the gate interlocks, so if annunciation is needed, you may want to use a third contact on the e-stop device to connect to a PLC input for this purpose.
Safety Mat Subsystem
The safety mat subsystem is located on the right side of the diagram and uses a second 440R-D22R2 relay. Safety mats can be either single- or dual-channel in design. The mat shown in this drawing is a dual-channel type. Stepping on the mat causes the conductive layers in the mat to touch, shorting Channel 1 to Channel 2. This creates an input fault that will be detected by the 440R relay. The fault condition will cause the output of the relay to open, stopping the machine.
Safety mats can be damaged reasonably easily, and the circuit design shown will detect shorts or opens within the mat and will prevent the hazardous motion from starting or continuing.
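How can a relay tell a pressed mat from healthy wiring? One common technique in safety controllers is to drive each channel with its own test-pulse pattern and check that each input reads back only its own pattern. I don’t know the exact method the 440R-D22R2 uses internally, so treat this as a hypothetical illustration: the pulse patterns are made up, and the short is modelled simply as both lines carrying the combined (wired-OR) signal, which is roughly what happens when two sourcing outputs are tied together.

```python
# Hypothetical test-pulse patterns: each channel's supply is briefly switched
# off in a different time slot, so a healthy channel reads back only its own pattern.
PATTERN_CH1 = (1, 0, 1, 1, 1, 1)
PATTERN_CH2 = (1, 1, 1, 0, 1, 1)


def read_inputs(mat_pressed=False, ch1_wire_broken=False):
    """Simplified model of what the two relay inputs see over one test cycle."""
    if ch1_wire_broken:
        # An open circuit in the channel 1 loop: its input never goes high.
        return (0,) * len(PATTERN_CH1), PATTERN_CH2
    if mat_pressed:
        # Pressing the mat shorts the loops together; modelled as wired-OR.
        merged = tuple(a | b for a, b in zip(PATTERN_CH1, PATTERN_CH2))
        return merged, merged
    return PATTERN_CH1, PATTERN_CH2


def diagnose(ch1, ch2):
    """Anything other than 'ok' should open the relay outputs."""
    if ch1 == PATTERN_CH1 and ch2 == PATTERN_CH2:
        return "ok"
    if ch1 == ch2:
        return "cross-short"
    return "open circuit or other fault"


for case in ({}, {"mat_pressed": True}, {"ch1_wire_broken": True}):
    print(case or "healthy", "->", diagnose(*read_inputs(**case)))
```

Because a pressed mat and a damaged mat both fail the pattern check, the same logic stops the machine in either case, which matches the behaviour described above.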
The output contactors, K5 and K6, are monitored by the relay reset loop connected to S34 and the +V rail.
This circuit also includes a conventional start-stop circuit that doesn’t rely on the safety relay.
One more thing — just like the gate interlock circuit, this circuit also includes a “zone e-stop”. Look below and to the left of the safety mat relay. As with the gate interlock, pressing this button will drop out K5 and K6, stopping the same motions protected by the safety mat. Since the relay can’t tell the difference between the e-stop button and the mat being activated, you may want to use the same approach and add a third contact to the e-stop button, connecting it to the PLC for annunciation.
The components used in the circuit are critical to the final PL rating of the design. Along with the DCavg and the CCF measures, the final PL depends on the MTTFd of the components used in each channel. No knowledge of the internal construction of the safety relays is needed, because the relays come with a PL rating from the manufacturer, so they can be treated as subsystems unto themselves. The selection of the input and output devices is then the significant factor. Component data sheets can be downloaded from the Rockwell site if you want to dig a bit deeper.
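Because the relay is treated as a pre-rated subsystem, one common way to combine it with the input and output subsystems is to add up the PFHd (average probability of dangerous failure per hour) of each subsystem in series and look the total up in Table 3 of ISO 13849-1. The sketch below does that; the PFHd figures are invented for illustration and are not taken from any Rockwell data sheet, and the function names are my own. The sum alone is not the whole story: each subsystem must also satisfy its own category requirements.

```python
# Table 3 of ISO 13849-1: PL versus average probability of dangerous failure per hour.
PL_BANDS = [  # (PL, lower bound inclusive, upper bound exclusive), in 1/h
    ("e", 1e-8, 1e-7),
    ("d", 1e-7, 1e-6),
    ("c", 1e-6, 3e-6),
    ("b", 3e-6, 1e-5),
    ("a", 1e-5, 1e-4),
]


def pl_from_pfhd(pfhd: float) -> str:
    """Look up the PL reached by a total PFHd value."""
    if pfhd < 1e-8:
        return "e"  # assumption: better than the PL e band is still reported as PL e
    for pl, lower, upper in PL_BANDS:
        if lower <= pfhd < upper:
            return pl
    raise ValueError("a PFHd of 1e-4 per hour or more does not reach PL a")


def system_pfhd(subsystems: dict) -> float:
    """Subsystems in series: their PFHd values simply add."""
    return sum(subsystems.values())


# Invented figures for illustration, not manufacturer data.
gate_function = {
    "input: two gate switches": 1.5e-7,
    "logic: safety relay": 2.0e-8,
    "output: contactors K3/K4": 1.2e-7,
}
total = system_pfhd(gate_function)
print(f"PFHd = {total:.2e} per hour -> PL {pl_from_pfhd(total)}")
```

With these numbers the input and output devices contribute almost all of the total, which is the point made above: once the relay is rated, your choice of switches and contactors decides the final PL.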
What did you think about this article? What questions came to mind that weren’t answered for you? I look forward to hearing your thoughts and questions!