Interlock Architectures – Pt. 5: Category 4 – Control Reliable

This entry is part 5 of 8 in the series Circuit Architectures Explored

The most reliable of the five system architectures, Category 4 is the only architecture that uses multiple-fault-tolerant techniques to help ensure that component failures do not result in an unacceptable exposure to risk. In this installment on system architectures, I'll delve into the depths of this architecture. The definitions and requirements discussed in this article come from ISO 13849-1, Edition 2 (2006) and ISO 13849-2, Edition 1 (2003).

As with preceding articles in this series, I’ll be building on concepts discussed in those articles. If you need more information, have a look at the previous articles to see if I’ve answered your questions there.

The Definition

The Category 4 definition builds on both Category B and Category 3. As you read, recall that “SRP/CS” stands for “Safety Related Parts of the Control System”. Here is the complete definition:

6.2.7 Category 4
For category 4, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed. In addition, the following applies.
SRP/CS of category 4 shall be designed such that

  • a single fault in any of these safety-related parts does not lead to a loss of the safety function, and
  • the single fault is detected at or before the next demand upon the safety functions, e.g. immediately, at switch on, or at end of a machine operating cycle, but if this detection is not possible, then an accumulation of undetected faults shall not lead to the loss of the safety function.

The diagnostic coverage (DCavg) of the total SRP/CS shall be high, including the accumulation of faults. The MTTFd of each of the redundant channels shall be high. Measures against CCF shall be applied (see Annex F).

NOTE 1 Category 4 system behaviour allows that

  • when a single fault occurs the safety function is always performed,
  • the faults will be detected in time to prevent the loss of the safety function,
  • accumulation of undetected faults is taken into account.

NOTE 2 The difference between category 3 and category 4 is a higher DCavg in category 4 and a required MTTFd of each channel of “high” only.

In practice, the consideration of a fault combination of two faults may be sufficient.


Breaking it down

For category 4, the same requirements as those according to 6.2.3 for category B shall apply. “Well-tried safety principles” according to 6.2.4 shall also be followed.

The first two sentences give the basic requirement for all the categories from 2 through 4. Sound component selection, based on the application requirements for voltage, current, switching capability and lifetime, must be considered. In addition, using well-tried safety principles, such as switching the +V rail side of the coil circuit for control components, is required. If you aren’t sure what constitutes a “well-tried safety principle”, see the article on Category 2, where this is discussed. Don’t confuse “well-tried safety principles” with “well-tried components”. There is no requirement in Category 4 for the use of well-tried components, although you can use them for additional reliability if the design requirements warrant it.

In addition, the following applies.
SRP/CS of category 4 shall be designed such that

  • a single fault in any of these safety-related parts does not lead to a loss of the safety function, and
  • the single fault is detected at or before the next demand upon the safety functions, e.g. immediately, at switch on, or at end of a machine operating cycle, but if this detection is not possible, then an accumulation of undetected faults shall not lead to the loss of the safety function.

This is the big one. This paragraph, and the two bullets that follow it, define the fundamental performance requirements for this category. No single fault can lead to the loss of the safety function in Category 4, and testing is required that can detect failures and prevent an accumulation of faults that could eventually lead to the loss of the safety function. The second bullet is the one that defines the multiple-fault-tolerance requirement for this category. If you go back to the definition of Category 3, you will see that in that category an accumulation of undetected faults may lead to the loss of the safety function. In my opinion, this is the key difference between the two categories.
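
To make the “detected at or before the next demand” idea more concrete, here is a minimal Python sketch of how a two-channel guard input might be cross-monitored. It is a hypothetical model, not certified safety logic: the `DualChannelMonitor` class, the scan-based discrepancy limit and the manual-reset behaviour are assumptions for illustration only. A fault in one channel never prevents the other channel from removing the run permission, and a persistent disagreement between the channels latches a lockout, so the fault cannot quietly accumulate with a second one.

```python
from dataclasses import dataclass, field


@dataclass
class DualChannelMonitor:
    """Hypothetical cross-monitored two-channel guard input (illustration only).

    Assumptions: each channel reads True when the guard is closed, the
    channels must agree within `discrepancy_limit` consecutive scans, and a
    detected fault latches a lockout that only a manual reset can clear.
    """

    discrepancy_limit: int = 3
    locked_out: bool = field(default=False, init=False)
    _mismatch_scans: int = field(default=0, init=False)

    def scan(self, ch1_closed: bool, ch2_closed: bool) -> bool:
        """Process one input scan; return True only if the machine may run."""
        if self.locked_out:
            return False  # a detected fault blocks every later start
        if ch1_closed != ch2_closed:
            self._mismatch_scans += 1
            if self._mismatch_scans >= self.discrepancy_limit:
                self.locked_out = True  # single fault detected and latched
            return False  # one open channel is enough to stop the machine
        self._mismatch_scans = 0
        return ch1_closed  # both channels agree at this point

    def reset(self, ch1_closed: bool, ch2_closed: bool) -> bool:
        """Manual reset after repair; accepted only if the channels agree."""
        if ch1_closed == ch2_closed:
            self.locked_out = False
            self._mismatch_scans = 0
        return not self.locked_out


if __name__ == "__main__":
    monitor = DualChannelMonitor(discrepancy_limit=3)
    print(monitor.scan(True, True))  # True: guard closed, channels agree
    # Channel 2 contact welds closed, then the guard is opened:
    for _ in range(3):
        print(monitor.scan(False, True))  # False: stopped; lockout on 3rd scan
    print(monitor.scan(True, True))  # False: still locked out until reset
```

Note that the welded contact only reveals itself when the guard is opened, that is, at the next demand, which is exactly the timing the definition allows.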

The diagnostic coverage (DCavg) of the total SRP/CS shall be high, including the accumulation of faults. The MTTFd of each of the redundant channels shall be high. Measures against CCF shall be applied (see Annex F).

These three sentences give the designer the criteria for diagnostic coverage, channel MTTFd and common cause failure (CCF) protection. As you can see, the ability to diagnose failures automatically is a critical part of the design, as is the use of highly reliable components, leading to highly reliable channels. You should also include the strongest CCF protection you can in the design, even though the minimum “passing score” of 65 points is the same as for Categories 2 and 3 (see Annex F in ISO 13849-1 for more details on scoring your design).
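
The quantitative part of these three sentences can be checked mechanically. The sketch below is my own illustration, not a tool from the standard: the function names are hypothetical, and the band limits are taken from ISO 13849-1:2006 (Table 5 for MTTFd, Table 6 for DC, and the 65-point CCF threshold from Annex F). It only covers the numbers; it says nothing about whether the single-fault and fault-accumulation requirements are met.

```python
def dc_band(dc_percent: float) -> str:
    """Diagnostic coverage band (ISO 13849-1:2006, Table 6)."""
    if dc_percent >= 99.0:
        return "high"
    if dc_percent >= 90.0:
        return "medium"
    if dc_percent >= 60.0:
        return "low"
    return "none"


def mttfd_band(years: float) -> str:
    """Per-channel MTTFd band (ISO 13849-1:2006, Table 5)."""
    if years >= 30.0:
        return "high"
    if years >= 10.0:
        return "medium"
    if years >= 3.0:
        return "low"
    return "not acceptable"


def category4_shortfalls(dc_avg_percent, channel_mttfds, ccf_score):
    """List the quantitative Category 4 criteria that are NOT met.

    An empty list only means DCavg, channel MTTFd and CCF look adequate;
    single-fault tolerance still has to be shown by analysis.
    """
    shortfalls = []
    if dc_band(dc_avg_percent) != "high":
        shortfalls.append(f"DCavg {dc_avg_percent}% is {dc_band(dc_avg_percent)}, not high")
    for i, years in enumerate(channel_mttfds, start=1):
        if mttfd_band(years) != "high":
            shortfalls.append(f"channel {i} MTTFd {years} y is {mttfd_band(years)}, not high")
    if ccf_score < 65:
        shortfalls.append(f"CCF score {ccf_score} is below 65")
    return shortfalls


print(category4_shortfalls(99.0, [45.0, 45.0], 75))  # []
print(category4_shortfalls(94.5, [45.0, 20.0], 65))  # DCavg and channel 2 fall short
```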

NOTE 1 Category 4 system behaviour allows that

  • when a single fault occurs the safety function is always performed,
  • the faults will be detected in time to prevent the loss of the safety function,
  • accumulation of undetected faults is taken into account.

NOTE 2 … In practice, the consideration of a fault combination of two faults may be sufficient.

Note 1 expands on the first paragraph in the definition, further clarifying the performance requirements through explicit statements. Notice that nowhere is there a requirement that single faults, or an accumulation of single faults, be prevented, only that they be detected by the diagnostic system. Preventing single faults is nearly impossible, since components do fail. Understanding, first, which components are critical to the safety function and, second, what kinds of faults each component is likely to have is fundamental to designing a diagnostic system that can detect those faults.

The category relies on redundancy to ensure that the complete loss of one channel will not cause the loss of the safety function, but this is only useful if common cause failures have been properly dealt with. Otherwise, a single event could wipe out both channels simultaneously, causing the loss of the safety function and possibly resulting in an injury or fatality.

Also notice that multiple single faults are permitted, as long as their accumulation does not result in the loss of the safety function. ISO 13849 also allows for “fault exclusion”, a concept that is not used in the North American standards.

The final sentence from Note 2 suggests that considering two concurrent faults may be enough, but be careful. You need to look closely at the fault lists to see if there are any groups of high-probability faults that are likely to occur concurrently. If there are, you need to assess these combinations of faults, whether there are 5 or 50 to be evaluated.
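
One way to make that review systematic is to enumerate the combinations. Here is a minimal Python sketch, assuming a simple two-channel model in which each fault disables channel A, channel B or both, and either is or is not detected before the next demand. The `Fault` type, the channel model and the example faults are all hypothetical. Raising `max_order` lets you look beyond pairs when your fault list contains groups of likely concurrent faults.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Sequence, Tuple

CHANNELS: FrozenSet[str] = frozenset({"A", "B"})


class Fault(NamedTuple):
    name: str
    disables: FrozenSet[str]  # channels that can no longer stop the hazard
    detected: bool  # True if diagnostics catch it before the next demand


def function_lost(faults: Sequence[Fault]) -> bool:
    """The safety function is lost once every channel is disabled."""
    disabled = frozenset().union(*(f.disables for f in faults))
    return disabled >= CHANNELS


def critical_combinations(faults: Sequence[Fault], max_order: int = 2) -> List[Tuple[str, ...]]:
    """Return fault sets (up to max_order faults) that defeat the safety function.

    Simplifying assumption: a detected fault forces a safe stop before the
    next demand, so only combinations whose faults all go undetected can
    accumulate. A single fault must never defeat the function, detected or not.
    """
    critical = []
    for order in range(1, max_order + 1):
        for combo in combinations(faults, order):
            if not function_lost(combo):
                continue
            if order == 1 or not any(f.detected for f in combo):
                critical.append(tuple(f.name for f in combo))
    return critical


fault_list = [
    Fault("K1 contacts weld", frozenset({"A"}), detected=True),
    Fault("K2 contacts weld", frozenset({"B"}), detected=True),
    Fault("Channel A input short (unmonitored)", frozenset({"A"}), detected=False),
    Fault("Channel B input stuck (unmonitored)", frozenset({"B"}), detected=False),
]
# Flags only the pair of unmonitored faults:
print(critical_combinations(fault_list))
```

The two welded contactors are not flagged because each is detected, so the sketch assumes the system locks out before the second can occur. The pair of unmonitored faults is exactly the kind of accumulation Category 4 does not allow.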

Fault Exclusion

Fault exclusion involves assessing the types of faults that can occur in each component in the critical path of the system. The decision to exclude certain kinds of faults is always a technical compromise between the theoretical improbability of the fault, the expertise of the designer(s) and engineers involved, and the specific technical requirements of the application. Whenever the decision is made to exclude a particular type of fault, the decision and the process used to make it must be documented in the Reliability Report included in the design file. Section 7.3 of ISO 13849-1 provides guidance on fault exclusion.

In the section discussing Category 1, the standard has this to say about fault exclusion, and the difference between “well-tried components” and “fault exclusion”:

It is important that a clear distinction between “well-tried component” and “fault exclusion” (see Clause 7) be made. The qualification of a component as being well-tried depends on its application. For example, a position switch with positive opening contacts could be considered as being well-tried for a machine tool, while at the same time as being inappropriate for application in a food industry – in the milk industry, for instance, this switch would be destroyed by the milk acid after a few months. A fault exclusion can lead to a very high PL, but the appropriate measures to allow this fault exclusion should be applied during the whole lifetime of the device. In order to ensure this, additional measures outside the control system may be necessary. In the case of a position switch, some examples of these kinds of measures are

  • means to secure the fixing of the switch after its adjustment,
  • means to secure the fixing of the cam,
  • means to ensure the transverse stability of the cam,
  • means to avoid over-travel of the position switch, e.g. adequate mounting strength of the shock absorber and any alignment devices, and
  • means to protect it against damage from outside.

To assist the designer, ISO 13849-2 provides lists of typical faults and the allowable exclusions in Annex D.5. As an example, let’s consider the typical situation where a robust guard interlocking device has been selected. The decision has been made to use redundant electrical circuits to the switching components in the interlock, so electrical faults can be detected. But what about mechanical failures? A fault list is needed:

Interlock Mechanical Fault List

| # | Fault Description | Result | Likelihood |
|---|---|---|---|
| 1 | Key breaks off | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely |
| 2 | Screws mounting key to guard fail | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely |
| 3 | Screws mounting interlock device to guard fail | Control system cannot determine guard position. Complete failure of system through a single fault. | Unlikely |
| 4 | Key and interlock device misaligned. | Guard cannot close, preventing machine from operating. | Very likely |
| 5 | Key and interlock device misaligned. Key and/or interlock device damaged. | Guard may not close, or the key may jam in the interlock device once closed. Machine is inoperable if the interlock cannot be completed, or the guard cannot be opened if the key jams in the device. | Likely |
| 6 | Screws mounting key to guard removed by user. | Interlock can now be bypassed by fixing the key into the interlocking device. Control system can no longer sense the position of the guard. | Likely |
| 7 | Screws mounting interlock device to guard removed by user | Probably combined with the preceding condition. Control system can no longer sense the position of the guard. | Unlikely, but could happen. |

There may be more failure modes, but for the purpose of this discussion, let’s limit them to this list.

Looking at Fault 1, a number of things could result in a broken key, including misalignment of the key and the interlock device, lack of maintenance on the guard and the interlocking hardware, or intentional damage by a user. Unless the hardware is exceptionally robust, including the design of the guard and any alignment features incorporated in the guarding, developing a sound rationale for excluding this fault will be very difficult.

Fault 2 considers mechanical failure of the mounting screws for the interlock key. Screws are considered to be well-tried components (see Annex A.5), so you can consider them for fault exclusion. You can improve their reliability by using thread-locking adhesives when installing the screws, to prevent them from vibrating loose, and “tamper-proof” style screw heads, to deter unauthorized removal. Including these methods will support any decision to exclude these faults. The same measures also help address Faults 3, 6 and 7.

Faults 4 and 5 occur frequently and are often caused by poor device selection (e.g. an interlock device intended for straight-line, sliding-gate applications is chosen for a hinged gate) or by poor guard design (e.g. the guard is poorly guided by the retention mechanism and can be closed in a misaligned condition). Any rationale for excluding these faults will need to include a discussion of the design features that prevent these conditions.

Excluding any other kind of fault follows the same process: develop the fault list, assess each fault against the relevant Annex from ISO 13849-2, and determine whether there are preventative measures that can be designed into the product and whether these provide sufficient risk reduction to allow the exclusion of the fault from consideration.
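
If you keep your fault list in a spreadsheet or script, that process can be tracked in a very simple structure. This is a minimal Python sketch with hypothetical field names: each entry records whether the fault defeats the safety function and, if you intend to exclude it, the rationale and the ISO 13849-2 reference you assessed it against. The example entries paraphrase the mechanical fault list above; the rationales are illustrative only and are not accepted exclusions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaultEntry:
    number: int
    description: str
    defeats_safety_function: bool
    exclusion_rationale: Optional[str] = None  # design measures justifying exclusion
    standard_reference: Optional[str] = None  # ISO 13849-2 annex entry assessed against

    @property
    def excluded(self) -> bool:
        # An exclusion only counts when both rationale and reference are documented.
        return bool(self.exclusion_rationale) and bool(self.standard_reference)


def open_faults(fault_list: List[FaultEntry]) -> List[int]:
    """Faults that defeat the safety function and have no documented exclusion."""
    return [f.number for f in fault_list if f.defeats_safety_function and not f.excluded]


fault_list = [
    FaultEntry(1, "Key breaks off", True),
    FaultEntry(2, "Screws mounting key to guard fail", True,
               exclusion_rationale="Well-tried screws with thread-locking adhesive",
               standard_reference="ISO 13849-2, Annex A.5"),
    FaultEntry(4, "Key and interlock device misaligned", False),  # guard cannot close: fails safe
    FaultEntry(6, "Screws mounting key removed by user", True,
               exclusion_rationale="Tamper-resistant screw heads"),  # reference not yet recorded
]
print(open_faults(fault_list))  # [1, 6]
```

Anything left in the open list needs either a design change, diagnostics that detect it, or a properly documented exclusion in the Reliability Report.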

DCavg and MTTFd requirements

NOTE 2 The difference between category 3 and category 4 is a higher DCavg in category 4 and a required MTTFd of each channel of “high” only.

The first sentence in Note 2 clarifies the two main differences from a design standpoint, aside from the additional fault tolerance requirements: better diagnostics are required, and the MTTFd requirements for individual components, and therefore for each channel, are much higher. Category 3 permits a DCavg of “low” or “medium” and a channel MTTFd anywhere from “low” to “high”, while Category 4 requires a DCavg of “high” (99 % or more) and a channel MTTFd of “high” (30 to 100 years).
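
To see how much the diagnostic requirement moves between the two categories, it helps to run the numbers. The Python sketch below uses the DCavg averaging formula from ISO 13849-1:2006 Annex E, the parts-count sum for a channel’s MTTFd, and the Annex D formula for combining two unequal channels. The function names and component values are hypothetical, and both channels are assumed identical so one channel’s blocks are enough. With a 90 % DC on the output block, the average lands in the “medium” band, which could still suit Category 3; raising that block’s DC to 99 % lifts DCavg into the “high” band that Category 4 demands.

```python
from typing import Iterable, Tuple


def channel_mttfd(component_mttfds: Iterable[float]) -> float:
    """Parts-count estimate: 1/MTTFd(channel) = sum of 1/MTTFd(component)."""
    return 1.0 / sum(1.0 / m for m in component_mttfds)


def symmetrized_mttfd(c1: float, c2: float) -> float:
    """Combine two unequal channel MTTFd values into one equivalent value (Annex D)."""
    return (2.0 / 3.0) * (c1 + c2 - 1.0 / (1.0 / c1 + 1.0 / c2))


def dc_avg(blocks: Iterable[Tuple[float, float]]) -> float:
    """DCavg = sum(DC_i / MTTFd_i) / sum(1 / MTTFd_i), with DC as a fraction."""
    blocks = list(blocks)
    numerator = sum(dc / mttfd for dc, mttfd in blocks)
    denominator = sum(1.0 / mttfd for _, mttfd in blocks)
    return numerator / denominator


# Hypothetical (DC, MTTFd in years) for input, logic and output blocks.
weak_output = [(0.99, 100.0), (0.99, 150.0), (0.90, 60.0)]
better_output = [(0.99, 100.0), (0.99, 150.0), (0.99, 60.0)]

print(f"{dc_avg(weak_output):.1%}")    # 94.5%: "medium"
print(f"{dc_avg(better_output):.1%}")  # 99.0%: "high"
print(f"{channel_mttfd([100.0, 150.0, 60.0]):.1f} years")  # 30.0 years
print(f"{symmetrized_mttfd(30.0, 60.0):.1f} years")        # 46.7 years
```

Notice that the same three components give a channel MTTFd of only 30 years, right at the bottom of the “high” band, which shows how quickly component choices erode the Category 4 margin.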

The Block Diagram

The block diagram for Category 4 is almost identical to the one for Category 3, and was updated by Corrigendum 1 to the diagram shown below. The text from the corrigendum that accompanies the diagram has this to say about the change:

Replace the drawing showing the designated architecture for category 4 with the following drawing. This corrects the arrowed lines labeled “m” between L1 and O1, and L2 and O2, by changing them from dashed to solid lines, representing higher diagnostic coverage.

I’ve highlighted this area using red ovals on Figure 12 to make it easier to see.

ISO 13849-1 Figure 12 – Category 4 Block Diagram

Here is Figure 11 for comparison. Notice that the “m” lines are solid in Figure 12 and dashed in Figure 11. Subtle, but significant! There are no other differences between the diagrams.

ISO 13849-1 Figure 11

I went looking for a circuit diagram to support the block diagram, but wasn’t able to find one from a commercial source that I could share with you. Considering that the primary differences are in the reliability of the components chosen and in the way the testing is done, this isn’t too surprising. The basic physical construction of the two categories can be virtually identical.

Applications

The following is not from the standards: it is my personal opinion, based on 15 years of practice.

In the past, many manufacturers decided to apply Category 4 architecture without really understanding the design implications, because they believed that it was “the best”. With the change in the harmonization of EN 954-1 and ISO 13849-1 under the EU Machinery Directive that comes into force on 29-Dec-2011, and considering the great difficulty that many manufacturers had in properly implementing EN 954-1, I can easily imagine manufacturers reasoning that, because they already have Category 4 SRP/CS on their systems, they can now claim PLe SRP/CS system performance. This is a bad decision for a lot of reasons:

  1. ISO 13849-1 PLe, Category 4 systems should be reserved for very dangerous machinery where the technical effort and expense involved are warranted by the risk assessment. Attempting to apply this level of design to machinery where, based on a risk assessment, PLb is more suitable is a waste of design time and effort and a needless expense. The product family standards for these types of machines, such as EN 201 for plastic injection moulding machines, EN 692 for mechanical power presses or EN 693 for hydraulic power presses, will explicitly specify the PL required for these machines.
  2. Manufacturers have frequently claimed EN 954-1 Category 4 performance based on the rating of the safety relay alone, without understanding that the rest of the SRP/CS must be considered, which is clearly wrong. The SRP/CS must be evaluated as a complete system.

This lack of understanding endangers the users, the maintenance personnel, the owners and the manufacturers. If manufacturers continue this approach and an injury occurs, it is my opinion that the courts will find more than enough evidence in the defendant’s published documents to cause some serious legal grief.

I believe that, as designers involved with the safety of our company’s products or with our co-workers’ safety, we owe it to everyone who uses our products to be educated and to apply these concepts correctly. The fact that you have read all of the posts leading up to this one is evidence that you are working on getting educated.

Always conduct a risk assessment and use the outcome from that work to guide your selection of safeguarding measures, complementary protective measures and the performance of the SRP/CS that ties those systems together. Choose performance levels that make sense based on the required risk reduction, and ensure that the design criteria are met by validating the system once it is built.

As always, I welcome your comments and questions! Please feel free to comment below. I will respond to all your comments.

Copyright secured by Digiprove © 2011–2012
Acknowledgements: ISO for excerpts from ISO 13849-1 and more…
Some Rights Reserved

Author: Doug Nix

Doug Nix is Managing Director and Principal Consultant at Compliance InSight Consulting, Inc. (http://www.complianceinsight.ca) in Kitchener, Ontario, and is Lead Author and Managing Editor of the Machinery Safety 101 blog.

Doug's work includes teaching machinery risk assessment techniques privately and through Conestoga College Institute of Technology and Advanced Learning in Kitchener, Ontario, as well as providing technical services and training programs to clients related to risk assessment, industrial machinery safety, safety-related control system integration and reliability, laser safety and regulatory conformity.
