Interlock Architectures – Pt. 5: Category 4 — Control Reliable

This entry is part 5 of 8 in the series Circuit Architectures Explored

The most reliable of the five system architectures, Category 4 is the only architecture that uses multiple-fault tolerant techniques to help ensure that component failures do not result in an unacceptable exposure to risk. This installment looks at the architecture in depth. The definitions and requirements discussed in this article come from ISO 13849-1, Edition 2 (2006) and ISO 13849-2, Edition 1 (2003).

As with preceding articles in this series, I’ll be building on concepts discussed in those articles. If you need more information, you should have a look at the previous articles to see if I’ve answered your questions there.

The Definition

The Category 4 definition builds on both Category B and Category 3. As you read, recall that “SRP/CS” stands for “Safety Related Parts of the Control System”. Here is the complete definition:

6.2.7 Category 4
For category 4, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies.
SRP/CS of category 4 shall be designed such that

  • a single fault in any of these safety-related parts does not lead to a loss of the safety function, and
  • the single fault is detected at or before the next demand upon the safety functions, e.g. immediately, at switch on, or at end of a machine operating cycle, but if this detection is not possible, then an accumulation of undetected faults shall not lead to the loss of the safety function.

The diagnostic coverage (DCavg) of the total SRP/CS shall be high, including the accumulation of faults. The MTTFd of each of the redundant channels shall be high. Measures against CCF shall be applied (see Annex F).

NOTE 1 Category 4 system behaviour allows that

  • when a single fault occurs the safety function is always performed,
  • the faults will be detected in time to prevent the loss of the safety function,
  • accumulation of undetected faults is taken into account.

NOTE 2 The difference between category 3 and category 4 is a higher DCavg in category 4 and a required MTTFd of each channel of “high” only.

In practice, the consideration of a fault combination of two faults may be sufficient.


Breaking it down

For category 4, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed.

The first two sentences give the basic requirement for all the categories from 2 through 4. Component selection must be sound, based on the application requirements for voltage, current, switching capability and lifetime. In addition, well-tried safety principles, such as switching the +V rail side of the coil circuit for control components, must be used. If you aren’t sure about what constitutes a “well-tried safety principle”, see the article on Category 2 where this is discussed. Don’t confuse “well-tried safety principles” with “well-tried components”. There is no requirement in Category 4 for the use of well-tried components, although you can use them for additional reliability if the design requirements warrant.

In addition, the following applies.
SRP/CS of category 4 shall be designed such that

  • a single fault in any of these safety-related parts does not lead to a loss of the safety function, and
  • the single fault is detected at or before the next demand upon the safety functions, e.g. immediately, at switch on, or at end of a machine operating cycle, but if this detection is not possible, then an accumulation of undetected faults shall not lead to the loss of the safety function.

This is the big one. This paragraph, and the two bullets that follow it, define the fundamental performance requirements for this category. No single fault can lead to the loss of the safety function in Category 4, and testing is required that can detect failures and prevent an accumulation of faults that could eventually lead to the loss of the safety function. The second bullet is the one that defines the multiple-fault-tolerance requirement for this category. If you go back to the definition of Category 3, you will see that an accumulation of faults may lead to the loss of the safety function in that Category. This is the key difference between the categories in my opinion.
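To make the two bullets a little more concrete, here is a minimal Python sketch of a cross-monitored, two-channel guard input. The names (`ChannelMonitor`, `evaluate`) and the latching behaviour are assumptions made for illustration only; they are not taken from ISO 13849-1, and real SRP/CS logic belongs in a suitably rated safety relay or safety PLC, not in application code.

```python
from dataclasses import dataclass


@dataclass
class ChannelMonitor:
    """Hypothetical cross-monitor for two redundant guard-closed inputs."""
    fault_latched: bool = False

    def evaluate(self, ch1_guard_closed: bool, ch2_guard_closed: bool) -> bool:
        """Return True only when the machine may run."""
        # Disagreement between the channels means one of them has failed.
        # Latch the fault so a later second fault cannot mask it.
        if ch1_guard_closed != ch2_guard_closed:
            self.fault_latched = True
        # Run permission needs no latched fault and both channels reporting
        # "closed", so one stuck channel cannot defeat the safety function.
        return (not self.fault_latched) and ch1_guard_closed and ch2_guard_closed


monitor = ChannelMonitor()
print(monitor.evaluate(True, True))   # channels agree, guard closed
print(monitor.evaluate(True, False))  # channels disagree: fault is latched
print(monitor.evaluate(True, True))   # still blocked until a deliberate reset
```

The point of the latch is the accumulation requirement: once a first fault is seen, the system refuses to run rather than waiting for a second fault to line up with it.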

The diagnostic coverage (DCavg) of the total SRP/CS shall be high, including the accumulation of faults. The MTTFd of each of the redundant channels shall be high. Measures against CCF shall be applied (see Annex F).

These three sentences give the designer the criteria for diagnostic coverage, channel failure rates and common cause failure protection. As you can see, the ability to diagnose failures automatically is a critical part of the design, as is the use of highly reliable components, leading to highly reliable channels. The strongest CCF protection you can include in the design is also needed, although the “passing score” of 65 remains unchanged (see Annex F in ISO 13849-1 for more details on scoring your design).
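The three criteria can be written down as a simple checklist. The sketch below is only an illustration: the class and function names are hypothetical, and the numeric limits (99 % as the start of the "high" DC range, 30 years as the start of the "high" MTTFd range, and the CCF score of 65) should be confirmed against your own copy of ISO 13849-1.

```python
from dataclasses import dataclass

# Range boundaries assumed from ISO 13849-1:2006; verify before relying on them.
DC_HIGH_MIN = 0.99           # DCavg "high" starts at 99 %
MTTFD_HIGH_MIN_YEARS = 30.0  # MTTFd "high" starts at 30 years
CCF_PASSING_SCORE = 65       # minimum Annex F score


@dataclass
class DesignSummary:
    """Hypothetical summary of an SRP/CS design under review."""
    dc_avg: float
    mttfd_ch1_years: float
    mttfd_ch2_years: float
    ccf_score: int


def category4_shortfalls(design):
    """Return a list of the Category 4 criteria the design does not meet."""
    shortfalls = []
    if design.dc_avg < DC_HIGH_MIN:
        shortfalls.append("DCavg is below high (99 %)")
    channels = (("1", design.mttfd_ch1_years), ("2", design.mttfd_ch2_years))
    for name, years in channels:
        if years < MTTFD_HIGH_MIN_YEARS:
            shortfalls.append(f"Channel {name} MTTFd is below high (30 years)")
    if design.ccf_score < CCF_PASSING_SCORE:
        shortfalls.append("CCF score is below 65")
    return shortfalls


design = DesignSummary(dc_avg=0.99, mttfd_ch1_years=45, mttfd_ch2_years=28, ccf_score=70)
print(category4_shortfalls(design))
```

An empty list means only that these three numeric criteria are met; it says nothing about the fault-tolerance behaviour, which still has to be analysed and validated.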

NOTE 1 Category 4 system behaviour allows that

  • when a single fault occurs the safety function is always performed,
  • the faults will be detected in time to prevent the loss of the safety function,
  • accumulation of undetected faults is taken into account.

Note 2: …In practice, the consideration of a fault combination of two faults may be sufficient.

Note 1 expands on the first paragraph in the definition, further clarifying the performance requirements by explicit statements. Notice that nowhere is there a requirement that single faults or accumulation of single faults be prevented, only that they be detected by the diagnostic system. Prevention of single faults is nearly impossible, since components do fail. Understanding first which components are critical to the safety function, and second what kinds of faults each component is likely to have, is fundamental to being able to design a diagnostic system that can detect the faults.

The category relies on redundancy to ensure that the complete loss of one channel will not cause the loss of the safety function, but this is only useful if the common cause failures have been properly dealt with. Otherwise, a single event could wipe out both channels simultaneously, causing the loss of the safety function and possibly resulting in an injury or fatality.

Also notice that multiple single faults are permitted, as long as the accumulation does not result in the loss of the safety function. ISO 13849 allows for “fault exclusion”, a concept that is not used in the North American standards.

The final sentence from Note 2 suggests that consideration of two concurrent faults may be enough, but be careful. You need to look closely at the fault lists to see if there are any groups of high probability faults that are likely to occur concurrently. If there are, you need to assess these combinations of faults, whether there are 5 or 50 to be evaluated.
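To get a feel for how much work a two-fault analysis involves, the following Python sketch lists every pair of faults from a fault list. The fault descriptions are hypothetical examples for a dual-channel contactor circuit, and the function names are my own; only the counting itself is general.

```python
from itertools import combinations
from math import comb


def fault_pairs(faults):
    """Return every unordered pair of distinct faults from the fault list."""
    return list(combinations(faults, 2))


def pair_count(n):
    """Number of two-fault combinations for a list of n faults: n(n-1)/2."""
    return comb(n, 2)


# Hypothetical fault list, for illustration only.
faults = [
    "K1 contact welded",
    "K2 contact welded",
    "Channel 1 wiring shorted to +V",
    "Channel 2 wiring shorted to +V",
]

for first, second in fault_pairs(faults):
    print(f"{first} + {second}")

print(pair_count(5), pair_count(50))  # 10 1225
```

A list of 5 faults produces 10 pairs, while a list of 50 produces 1225, which is why it pays to screen the list for the combinations that are actually likely to occur together.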

Fault Exclusion

Fault exclusion involves assessing the types of faults that can occur in each component in the critical path of the system. The decision to exclude certain kinds of faults is always a technical compromise between the theoretical improbability of the fault, the expertise of the designer(s) and engineers involved and the specific technical requirements of the application. Whenever the decision is made to exclude a particular type of fault, the decision and the process used to make it must be documented in the Reliability Report included in the design file. Section 7.3 of ISO 13849-1 provides guidance on fault exclusion.

In the section discussing Category 1, the standard has this to say about fault exclusion, and the difference between “well-tried components” and “fault exclusion”:

It is important that a clear distinction between “well-tried component” and “fault exclusion” (see Clause 7) be made. The qualification of a component as being well-tried depends on its application. For example, a position switch with positive opening contacts could be considered as being well-tried for a machine tool, while at the same time as being inappropriate for application in a food industry — in the milk industry, for instance, this switch would be destroyed by the milk acid after a few months. A fault exclusion can lead to a very high PL, but the appropriate measures to allow this fault exclusion should be applied during the whole lifetime of the device. In order to ensure this, additional measures outside the control system may be necessary. In the case of a position switch, some examples of these kinds of measures are

  • means to secure the fixing of the switch after its adjustment,
  • means to secure the fixing of the cam,
  • means to ensure the transverse stability of the cam,
  • means to avoid over-travel of the position switch, e.g. adequate mounting strength of the shock absorber and any alignment devices, and
  • means to protect it against damage from outside.

To assist the designer, ISO 13849-2 provides lists of typical faults and the allowable exclusions in Annex D.5. As an example, let’s consider the typical situation where a robust guard interlocking device has been selected. The decision has been made to use redundant electrical circuits to the switching components in the interlock, so electrical faults can be detected. But what about mechanical failures? A fault list is needed:

Interlock Mechanical Fault List

| # | Fault Description | Result | Likelihood |
|---|---|---|---|
| 1 | Key breaks off | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely |
| 2 | Screws mounting key to guard fail | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely |
| 3 | Screws mounting interlock device to guard fail | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely |
| 4 | Key and interlock device misaligned. | Guard cannot close, preventing machine from operating. | Very likely |
| 5 | Key and interlock device misaligned. Key and / or interlock device damaged. | Guard may not close, or the key may jam in the interlock device once closed. Machine is inoperable if the interlock cannot be completed, or the guard cannot be opened if the key jams in the device. | Likely |
| 6 | Screws mounting key to guard removed by user. | Interlock can now be bypassed by fixing the key into the interlocking device. Control system can no longer sense the position of the guard. | Likely |
| 7 | Screws mounting interlock device to guard removed by user | Probably combined with the preceding condition. Control system can no longer sense the position of the guard. | Unlikely, but could happen. |

There may be more failure modes, but for the purpose of this discussion, let’s limit them to this list.

Looking at Fault 1, there are a number of things that could result in a broken key. They include: misalignment of the key and the interlock device, lack of maintenance on the guard and the interlocking hardware, or intentional damage by a user. Unless the hardware is exceptionally robust, including the design of the guard and any alignment features incorporated in the guarding, developing sound rationale for excluding this fault will be very difficult.

Fault 2 considers mechanical failure of the mounting screws for the interlock key. Screws are considered to be well-tried components (see Annex A.5), so you can consider them for fault exclusion. You can improve their reliability by using thread locking adhesives when installing the screws to prevent them from vibrating loose, and “tamper-proof” style screw heads to deter unauthorized removal. Inclusion of these methods will support any decision to exclude these faults. The same measures help to address Faults 3, 6 and 7 as well.

Faults 4 & 5 occur frequently and are often caused by poor device selection (e.g. an interlock device intended for straight-line sliding-gate applications is chosen for a hinged gate), or by poor guard design (e.g. the guard is poorly guided by the retention mechanism and can be closed in a misaligned condition). Rationale for prevention of these faults will need to include discussion of design features that will prevent these conditions.

Excluding any other kind of fault follows the same process: Develop the fault list, assess each fault against the relevant Annex from ISO 13849-2, determine if there are preventative measures that can be designed into the product and whether these provide sufficient risk reduction to allow the exclusion of the fault from consideration.
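One way to keep this process honest is to hold the fault list as structured data and check that every exclusion carries a written rationale for the Reliability Report. The Python sketch below assumes a hypothetical record layout (`FaultEntry`) and uses three rows from the fault list above; the rationale text is an example, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class FaultEntry:
    """Hypothetical fault-list record; field names are illustrative."""
    number: int
    description: str
    likelihood: str
    excluded: bool = False
    rationale: str = ""  # must be filled in whenever excluded is True


def undocumented_exclusions(fault_list):
    """Return the numbers of faults excluded without a written rationale."""
    return [f.number for f in fault_list if f.excluded and not f.rationale.strip()]


fault_list = [
    FaultEntry(1, "Key breaks off", "Unlikely"),
    FaultEntry(2, "Screws mounting key to guard fail", "Unlikely", excluded=True,
               rationale="Thread-locking adhesive and tamper-resistant screws"),
    # Deliberately left without a rationale to show the check working.
    FaultEntry(3, "Screws mounting interlock device to guard fail", "Unlikely",
               excluded=True),
]

print(undocumented_exclusions(fault_list))  # [3]
```

A check like this does not judge whether the rationale is technically sound; it only catches exclusions that were made without being documented.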

DCavg and MTTFd requirements

NOTE 2 The difference between category 3 and category 4 is a higher DCavg in category 4 and a required MTTFd of each channel of “high” only.

The first sentence in Note 2 clarifies the two main differences from a design standpoint, aside from the additional fault tolerance requirements: better diagnostics are required, and the MTTFd requirements for individual components, and therefore for each channel, are much higher.
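To show how component values roll up into the channel figures, here is a simplified Python sketch. It assumes the parts count approach for channel MTTFd (the reciprocals of the component values are summed) and the failure-rate-weighted average for DCavg, as I understand them from the annexes of ISO 13849-1. The component values are hypothetical, and the sketch ignores details such as limits the standard places on the values that can be claimed.

```python
def channel_mttfd(mttfd_years):
    """Parts count estimate: 1/MTTFd_channel = sum(1/MTTFd_i)."""
    return 1.0 / sum(1.0 / m for m in mttfd_years)


def dc_avg(parts):
    """Average DC, weighting each part's DC by its failure rate (1/MTTFd)."""
    numerator = sum(dc / mttfd for mttfd, dc in parts)
    denominator = sum(1.0 / mttfd for mttfd, _ in parts)
    return numerator / denominator


# Hypothetical channel: (MTTFd in years, DC as a fraction) for each part.
parts = [(150.0, 0.99), (100.0, 0.99), (75.0, 0.90)]

mttfd = channel_mttfd([m for m, _ in parts])
dc = dc_avg(parts)
print(f"Channel MTTFd: {mttfd:.1f} years")  # 33.3 years
print(f"DCavg: {dc:.1%}")                   # 95.0%
```

In this example the channel MTTFd lands in the "high" range, but the one component with 90 % coverage pulls DCavg down to 95 %, which is short of "high". The least reliable, least monitored component dominates both numbers.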

The Block Diagram

The block diagram for Category 4 is almost identical to Category 3, and was updated by Corrigendum 1 to the diagram shown below. The text from the corrigendum that accompanies the diagram has this to say about the change:

Replace the drawing showing the designated architecture for category 4 with the following drawing. This corrects the arrowed lines labeled “m” between L1 and O1, and L2 and O2, by changing them from dashed to solid lines, representing higher diagnostic coverage.

I’ve highlighted this area using red ovals on Figure 12 to make it easier to see.

ISO 13849-1 Figure 12 - Category 4 Block Diagram

Here is Figure 11 for comparison. Notice that the “m” lines are solid in Figure 12 and dashed in Figure 11? Subtle, but significant! There are no other differences between the diagrams.

ISO 13849-1 Figure 11

I went looking for a circuit diagram to support the block diagram, but wasn’t able to find one from a commercial source that I could share with you. Considering that the primary differences are in the reliability of the components chosen and in the way the testing is done, this isn’t too surprising. The basic physical construction of the two categories can be virtually identical.

Applications

The following is not from the standards – this is my personal opinion, based on 15 years of practice.

In the past, many manufacturers decided to apply Category 4 architecture without really understanding the design implications, because they believed that it was “the best”. With the change in the harmonization of EN 954-1 and ISO 13849-1 under the EU machinery directive that comes into force on 29-Dec-2011, and considering the great difficulty that many manufacturers had in properly implementing EN 954-1, I can easily imagine manufacturers assuming that, because they already have Category 4 SRP/CS on their systems, they can now state that they have PLe SRP/CS system performance. This is a bad decision for a lot of reasons:

  1. ISO 13849-1 PLe, Category 4 systems should be reserved for very dangerous machinery where the technical effort and expense involved is warranted by the risk assessment. Attempting to apply this level of design to machinery where a PLb performance level is more suitable based on a risk assessment is a waste of design time and effort and a needless expense. The product family standards for these types of machines, such as EN 201 for plastic injection moulding machines, EN 692 for Mechanical Power Presses or EN 693 for Hydraulic Power Presses, will explicitly specify the PL required for these machines.
  2. Manufacturers have frequently claimed EN 954-1 Category 4 performance based on the rating of the safety relay alone, without understanding that the rest of the SRP/CS must be considered, and clearly this is wrong. The SRP/CS must be evaluated as a complete system.

This lack of understanding endangers the users, the maintenance personnel, the owners and the manufacturers. If they continue this approach and an injury occurs, it is my opinion that the courts will have more than enough evidence in the defendant’s published documents to cause some serious legal grief.

As designers involved with the safety of our company’s products or with our co-workers’ safety, I believe that we owe it to everyone who uses our products to be educated and to correctly apply these concepts. The fact that you have read all of the posts leading up to this one is evidence that you are working on getting educated.

Always conduct a risk assessment and use the outcome from that work to guide your selection of safeguarding measures, complementary protective measures and the performance of the SRP/CS that ties those systems together. Choose performance levels that make sense based on the required risk reduction and ensure that the design criteria are met by validating the system once built.

As always, I welcome your comments and questions! Please feel free to comment below. I will respond to all your comments.

Copyright secured by Digiprove © 2011-2012
Acknowledgements: ISO for excerpts from ISO 13849-1 and more...
Some Rights Reserved

Author: Doug Nix

+DougNix is Managing Director and Principal Consultant at Compliance InSight Consulting, Inc. (http://www.complianceinsight.ca) in Kitchener, Ontario, and is Lead Author and Managing Editor of the Machinery Safety 101 blog.

Doug's work includes teaching machinery risk assessment techniques privately and through Conestoga College Institute of Technology and Advanced Learning in Kitchener, Ontario, as well as providing technical services and training programs to clients related to risk assessment, industrial machinery safety, safety-related control system integration and reliability, laser safety and regulatory conformity.
