Interlock Architectures – Pt. 4: Category 3 – Control Reliable

This entry is part 4 of 8 in the series Circuit Architectures Explored

Category 3 system architecture is the first category that could be considered similar to the “Control Reliable” circuits or systems defined in the North American standards. It is not the same as Control Reliable, but we’ll get to that in a subsequent post. If you haven’t read the first three posts in this series, you may want to go back and review them, as the concepts in those articles are the basis for the discussion in this post.

So what is “Control Reliable” any­way? This term was coined by the ANSI RIA R15.06 tech­nic­al com­mit­tee when they were devel­op­ing their defin­i­tions for con­trol sys­tem reli­ab­il­ity, first pub­lished in the 1999 edi­tion of the stand­ard. No men­tion of the concept of con­trol reli­ab­il­ity appears in the 1994 edi­tion of CSA Z434 or the pre­ced­ing edi­tion of RIA R15.06.

Essentially, the term “Control Reliable” means that the con­trol sys­tem is designed with some degree of fault tol­er­ance. Depending on the defin­i­tions that you read, this could be single- or multiple-​fault-​tolerance.

There are a num­ber of design tech­niques that can be used to increase the fault tol­er­ance of a con­trol sys­tem. The older approaches, such as those giv­en in ANSI RIA R15.06 – 1999, CSA Z434-​03 or EN 954 – 1:95, rely primar­ily on the struc­ture or archi­tec­ture of the cir­cuit, and the char­ac­ter­ist­ics of the com­pon­ents selec­ted for use. ISO 13849 – 1 uses the same basic archi­tec­tures defined by EN 954 – 1:95, and extends them to include dia­gnost­ic cov­er­age, com­mon cause fail­ure res­ist­ance and an under­stand­ing of the fail­ure rate of the com­pon­ents to determ­ine the degree of fault tol­er­ance and reli­ab­il­ity provided by the design.

OK, enough back­ground for now! Let’s look at the defin­i­tion for Category 3 sys­tems. Remember that “SRP/​CS” means “Safety Related Parts of the Control System”.

Definition

6.2.6 Category 3

For cat­egory 3, the same require­ments as those accord­ing to 6.2.3 for cat­egory B shall apply. “Well-​tried safety prin­ciples” accord­ing to 6.2.4 shall also be fol­lowed. In addi­tion, the fol­low­ing applies. SRP/​CS of cat­egory 3 shall be designed so that a single fault in any of these parts does not lead to the loss of the safety func­tion. Whenever reas­on­ably prac­tic­able, the single fault shall be detec­ted at or before the next demand upon the safety func­tion.

The dia­gnost­ic cov­er­age (DCavg) of the total SRP/​CS includ­ing fault-​detection shall be low. The MTTFd of each of the redund­ant chan­nels shall be low-​to-​high, depend­ing on the PLr. Measures against CCF shall be applied (see Annex F).

NOTE 1 The require­ment of single-​fault detec­tion does not mean that all faults will be detec­ted. Consequently, the accu­mu­la­tion of undetec­ted faults can lead to an unin­ten­ded out­put and a haz­ard­ous situ­ation at the machine. Typical examples of prac­tic­able meas­ures for fault detec­tion are use of the feed­back of mech­an­ic­ally guided relay con­tacts and mon­it­or­ing of redund­ant elec­tric­al out­puts.

NOTE 2 If neces­sary because of tech­no­logy and applic­a­tion, type-​C stand­ard makers need to give fur­ther details on the detec­tion of faults.

NOTE 3 Category 3 sys­tem beha­viour allows that

  • when the single fault occurs the safety func­tion is always per­formed,
  • some but not all faults will be detec­ted,
  • accu­mu­la­tion of undetec­ted faults can lead to the loss of the safety func­tion.

NOTE 4 The tech­no­logy used will influ­ence the pos­sib­il­it­ies for the imple­ment­a­tion of fault detec­tion.


Breaking it down

Let’s take the defin­i­tion apart and look at the com­pon­ents that make it up.

For cat­egory 3, the same require­ments as those accord­ing to 6.2.3 for cat­egory B shall apply. “Well-​tried safety prin­ciples” accord­ing to 6.2.4 shall also be fol­lowed.

The first couple of lines remind the design­er of two key points:

  • The com­pon­ents selec­ted must be suit­able for the applic­a­tion, i.e. cor­rectly spe­cified for voltage, cur­rent, envir­on­ment­al con­di­tions, etc.; and
  • “well-tried safety principles” must be used in the design.

It’s important to note here that we are talking about “well-tried safety principles” and NOT “well-tried components”. The requirement to use components designed for safety applications comes from other standards, like EN 1088 and ISO 13850. The requirements from these standards, such as the use of “direct-drive” contacts, improve the fault tolerance of the component, and so benefit the design in the end. These improvements are generally reflected in the B10d or MTTFd of the component. They are also points that inspectors commonly look for because they are easy to spot in the field: “safety-rated components” often use red or yellow caps to identify them clearly in the control panel.

In addi­tion, the fol­low­ing applies. SRP/​CS of cat­egory 3 shall be designed so that a single fault in any of these parts does not lead to the loss of the safety func­tion.

This sen­tence makes the require­ment for single-​fault tol­er­ance. This means that the fail­ure of any single com­pon­ent in the func­tion­al chan­nel can­not res­ult in the loss of the safety func­tion. To meet this require­ment, redund­ancy is needed. With redund­ant sys­tems, one com­plete chan­nel can fail without los­ing the abil­ity to stop the machinery. It is pos­sible to lose the func­tion of the mon­it­or­ing sys­tem from a single com­pon­ent fail­ure, but as long as the sys­tem con­tin­ues to provide the safety func­tion this may be accept­able. The sys­tem should not per­mit itself to be reset if the mon­it­or­ing sys­tem is not work­ing.

One more “gotcha” from this sentence: In order to meet the requirement that any single component failure can be detected, the design will require two separate sensors to detect the position of a gate, for example. This permits the system to detect a failure in either sensor, including mechanical failures like broken keys or attempts to defeat the safety system. You can clearly see this in both the block diagram, which does not show any monitoring connection to the input devices, and in the circuit diagram. Both of these diagrams are shown later in this post. The only way out of the requirement to have redundant sensors is to select a gate switch that is robust enough that mechanical faults can reasonably be excluded. I’ll get into fault exclusions later in this article.

Whenever reas­on­ably prac­tic­able, the single fault shall be detec­ted at or before the next demand upon the safety func­tion.

This sentence can be a bit sticky. The phrase “Whenever reasonably practicable” means that your design needs to be able to detect single faults unless it would be “unreasonable” to do so. What constitutes an unreasonable degree of effort? This is for you to decide. I will say that if there is a commercial off-the-shelf (COTS) component available that will do the job, and you choose not to use it, you will have a difficult time convincing a court that you took every reasonably practicable means to detect the fault.

Following the comma, the rest of the sen­tence provides the design­er with the basic require­ment for the test sys­tem: it must be able to detect a single com­pon­ent fail­ure at the moment of demand (this is usu­ally how it’s done, since this is typ­ic­ally the simplest way) or before it occurs, which can hap­pen if your test equip­ment has a means to detect a change in some crit­ic­al char­ac­ter­ist­ic of the mon­itored component(s).

The diagnostic coverage (DCavg) of the total SRP/CS including fault-detection shall be low.

This sen­tence tells you that your design must meet the require­ments for LOW Diagnostic Coverage. To get to LOW DCavg, we need to look first at Table 6:

ISO 13849 – 1:06 Table 6

Diagnostic Coverage (DC)

Denotation   Range
None         DC < 60%
Low          60% <= DC < 90%
Medium       90% <= DC < 99%
High         99% <= DC
NOTE 1 For SRP/​CS con­sist­ing of sev­er­al parts an aver­age value DCavg for DC is used in Figure 5, Clause 6 and E.2.

NOTE 2 The choice of the DC ranges is based on the key val­ues 60 %, 90 % and 99 % also estab­lished in oth­er stand­ards (e.g. IEC 61508) deal­ing with dia­gnost­ic cov­er­age of tests. Investigations show that (1 – DC) rather than DC itself is a char­ac­ter­ist­ic meas­ure for the effect­ive­ness of the test. (1 – DC) for the key val­ues 60 %, 90 % and 99 % forms a kind of log­ar­ithmic scale fit­ting to the log­ar­ithmic PL-​scale. A DC-​value less than 60 % has only slight effect on the reli­ab­il­ity of the tested sys­tem and is there­fore called “none”. A DC-​value great­er than 99 % for com­plex sys­tems is very hard to achieve. To be prac­tic­able, the num­ber of ranges was restric­ted to four. The indic­ated bor­ders of this table are assumed with­in an accur­acy of 5 %.

Based on Table 6, the DCavg must be at least 60% and less than 90%, all components considered. To score this, we must go to Annex E and look at Table E1. Using the factors in Table E1, score the design. If you end up in the desired range, you can move on. If not, the design will require modification to bring it into this range.
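To make the scoring step a little more concrete, here is a minimal sketch of the DCavg calculation as I understand it from Annex E: each part's DC is weighted by the reciprocal of its MTTFd, so the least reliable parts count the most. The function names and the three example parts are my own invention for illustration only; take the real DC estimates from Table E1 and the real MTTFd values from your component data.

```python
def dc_avg(parts):
    """Average diagnostic coverage, weighted by 1/MTTFd of each part.

    parts: iterable of (dc, mttfd_years) tuples, with dc as a fraction 0..1.
    """
    parts = list(parts)
    weighted_dc = sum(dc / mttfd for dc, mttfd in parts)
    total_weight = sum(1.0 / mttfd for _, mttfd in parts)
    return weighted_dc / total_weight


def dc_range(dc):
    """Map a DC fraction onto the Table 6 denotations."""
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


# Hypothetical example values, not taken from any data sheet:
# (DC as a fraction, MTTFd in years)
example_parts = [
    (0.00, 100.0),  # an unmonitored part
    (0.99, 50.0),   # a well-monitored part
    (0.90, 50.0),   # a moderately monitored part
]

result = dc_avg(example_parts)
print(f"DCavg = {result:.1%} -> {dc_range(result)}")
```

With these made-up numbers the unmonitored part pulls the average down into the “low” band, even though the other two parts are well covered.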

The MTTFd of each of the redund­ant chan­nels shall be low-​to-​high, depend­ing on the PLr.

This sen­tence reminds you that your com­pon­ent selec­tions mat­ter. Depending on the PLr you are try­ing to achieve, you will need to choose com­pon­ents with suit­able MTTFd rat­ings. Remember that just because you are using a Category 3 archi­tec­ture, you have not auto­mat­ic­ally achieved the highest levels of reli­ab­il­ity. If you refer to Figure 5 in the stand­ard, you can see that a Category 3 archi­tec­ture can meet a range of PL’s, all the way from PLa through PLe!

ISO 13849 – 1 Figure 5

If you want, or need, to know the numer­ic bound­ar­ies of each of the bands in the dia­gram above, look at Annex K of the stand­ard. The full numer­ic rep­res­ent­a­tion of Figure 5 is provided in that Annex.
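For a rough feel for what “low-to-high” means in numbers, the sketch below uses the MTTFd bands as I recall them from the standard (low: 3 to under 10 years, medium: 10 to under 30 years, high: 30 to 100 years, with each channel capped at 100 years) and the symmetrisation formula from Annex D for two channels with different MTTFd values. The function names are my own, and you should check the band limits against your copy of the standard before relying on them.

```python
def mttfd_range(years):
    """Classify a channel MTTFd (in years) into the bands used by the standard."""
    if years < 3:
        return "not acceptable"
    if years < 10:
        return "low"
    if years < 30:
        return "medium"
    return "high"


def capped(years, limit=100.0):
    """Each channel's MTTFd is limited to 100 years for the PL estimate."""
    return min(years, limit)


def symmetrised_mttfd(ch1_years, ch2_years):
    """Single equivalent MTTFd for two redundant channels of unequal MTTFd."""
    c1, c2 = capped(ch1_years), capped(ch2_years)
    return (2.0 / 3.0) * (c1 + c2 - 1.0 / (1.0 / c1 + 1.0 / c2))


# One weak channel (3 years) paired with one strong channel (100 years):
equivalent = symmetrised_mttfd(3.0, 100.0)
print(f"{equivalent:.1f} years -> {mttfd_range(equivalent)}")
```

The point of the example is that the two channels do not have to be identical, but the result is only as good as the data you feed in.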

Measures against CCF shall be applied (see Annex F).

For your design to qualify as Category 3 architecture, CCF measures are required. I’ve discussed Common Cause Failures elsewhere on the blog, but as a reminder, a Common Cause Failure is one where a single event, like a lightning strike on the power line, or a cable being cut, results in the failure of the system. This is not the same as a Common Mode Failure, where similar or different components fail in the same way. For instance, if both output contactors were to weld closed, either simultaneously or at different times, due to overloading because they were undersized, this could be considered to be a Common Mode Failure. If they both weld closed due to a lightning strike, that is a Common Cause Failure.

Annex F provides a check­list that is used to score the CCF of the design. The design must meet at least 65 points to be con­sidered to meet the min­im­um level of CCF pro­tec­tion, and more is bet­ter of course! Score your design and see where you come out. Less than 65 and you need to do more. 65 or more and you are good to go.
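The scoring itself is just addition and a comparison against the 65-point threshold. Here is a minimal sketch, assuming you have already decided which checklist measures your design fully satisfies. The measure names and point values in the example are placeholders I made up for illustration; use the wording and scores from Annex F in your copy of the standard.

```python
CCF_PASS_MARK = 65  # minimum score given in the text above


def ccf_score(awarded_points):
    """Sum the points for the checklist measures the design fully meets."""
    return sum(awarded_points.values())


def ccf_ok(awarded_points):
    """True if the design reaches the minimum CCF score."""
    return ccf_score(awarded_points) >= CCF_PASS_MARK


# Placeholder measures and points, for illustration only:
awarded = {
    "separation of signal paths": 15,
    "overvoltage protection": 15,
    "EMC immunity": 25,
    "other environmental influences": 10,
}

print(ccf_score(awarded), "points:", "pass" if ccf_ok(awarded) else "do more")
```

Dropping any one measure from this example takes the total below 65, which shows how little margin a design that only just passes really has.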

The Notes

The notes giv­en in the defin­i­tion are also import­ant. Note 1 reminds the design­er that not all faults will be detec­ted, and an accu­mu­la­tion of undetec­ted faults can lead to the loss of the safety func­tion. Be aware that it is up to you as the design­er to min­im­ize the kinds of fail­ures that can accu­mu­late undetec­ted.

Note 2 speaks to the pos­sib­il­ity that a Type-​C product stand­ard, like EN 201 for injec­tion mould­ing machines for example, may impose a min­im­um PLr on the design. Make sure that you get a copy of any Type-​C stand­ard that is rel­ev­ant for your product and mar­ket. Note that the des­ig­na­tion “Type-​C” comes from ISO. If you go look­ing for this ter­min­o­logy in ANSI or CSA stand­ards, you won’t find it used because the concept doesn’t exist in the same way in these National stand­ards.

Note 3 gives you the basic per­form­ance para­met­ers for the design. If your design can do these things, then you’re halfway there.

Finally, Note 4 is a remind­er that dif­fer­ent kinds of tech­no­logy have great­er or less­er cap­ab­il­ity to detect fail­ures. More soph­ist­ic­ated tech­no­logy may be required to achieve the PL level you need.

The Block Diagram

Let’s have a look at the func­tion­al block dia­gram for this Category.

ISO 13849-1 Figure 11

By looking at the diagram you can see clearly the two independent channels and the cross-monitoring connection between the channels. Input devices are not monitored, but output devices are monitored. This is another significant reason requiring the use of two physically separate input devices to sense the guard position or whatever other safeguarding device is integrated into the system. The only way that a failure in the input devices can be detected is if one channel changes state and one does not.
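The discrepancy idea in that last sentence can be sketched in a few lines. This is only an illustration of the logic, not a description of how any particular safety relay works inside: the names `GateState` and `evaluate_gate` are my own, and a real device also allows a short discrepancy time between the two channels, which I have left out.

```python
from enum import Enum


class GateState(Enum):
    SAFE_TO_RUN = "both channels closed"
    STOP = "both channels open"
    FAULT = "channels disagree: stop and block reset"


def evaluate_gate(ch1_closed: bool, ch2_closed: bool) -> GateState:
    """Compare two independent gate-switch channels.

    A disagreement is the only evidence that one input device has failed,
    so it is treated as a fault rather than as a normal stop.
    """
    if ch1_closed and ch2_closed:
        return GateState.SAFE_TO_RUN
    if not ch1_closed and not ch2_closed:
        return GateState.STOP
    return GateState.FAULT


# Gate opened, but the switch on channel 2 is stuck closed:
print(evaluate_gate(ch1_closed=False, ch2_closed=True).value)
```

Notice that the stuck switch only shows up when the gate is actually opened, which is what “detected at or before the next demand” looks like in practice.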

If you want to learn more about applying the block diagramming method to your design, there is a good explanation of the method in the SISTEMA Cookbook 1, published by the IFA in Germany. You can download the English version from the link above, or get the document directly from the IFA web site.

Circuit Diagram

By now you prob­ably get the idea that there are as many ways to con­fig­ure a Category 3 cir­cuit as there are applic­a­tions. Below is a typ­ic­al cir­cuit dia­gram bor­rowed from Rockwell Allen-​Bradley, show­ing the applic­a­tion of typ­ic­al safety relays in a com­plete sys­tem that includes the emer­gency stop sys­tem, a gate inter­lock and a safety mat. You can meet the require­ments for Category 3 archi­tec­ture in oth­er ways, so don’t feel that you must use a COTS safety relay. It just may be the most straight­for­ward way in many cases.

This is not a plug for A-​B products. Neither Machinery Safety 101, nor I, have any rela­tion­ship with Rockwell Allen-​Bradley.

From Rockwell Automation pub­lic­a­tion SAFETY-​WD001A-​EN-​P – June 2011, p.6.

If you’re inter­ested in obtain­ing the source doc­u­ment con­tain­ing this dia­gram, you can down­load it dir­ectly from the Rockwell Automation web site.

Emergency Stop Subsystem

The emergency stop circuit uses the 440R-S12R2 relay on the left side of the diagram. This particular system uses Category 3 architecture in the e-stop system, which may be more than is required. A risk assessment and a start-stop analysis are required to determine what performance level is needed for this subsystem. Get more information on emergency stop.

Gate Interlock Subsystem

The gate inter­lock cir­cuit is loc­ated in the cen­ter of the dia­gram, and uses the 440R-​D22R2 relay. As you can see, there are two phys­ic­ally sep­ar­ate gate inter­lock switches. Only one con­tact from each switch is used, so one switch is con­nec­ted to Channel 1, and the oth­er to Channel 2. Notice that there is no oth­er mon­it­or­ing of these devices (i.e. no second con­nec­tion to either switch). The sec­ond­ary con­tacts on these switches could be con­nec­ted to the PLC for annun­ci­ation pur­poses. This would allow the PLC to dis­play the open/​closed status of the gate on the machine HMI.

The out­put con­tact­ors, K3 and K4, are mon­itored by the reset loop con­nec­ted to S34 and the +V rail.
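To show why the reset loop matters, here is a small sketch of the logic. I am assuming the usual arrangement in which a normally-closed feedback contact from each contactor is wired in series in the reset loop, so a welded contactor holds its feedback contact open. That wiring detail and the function name `reset_permitted` are my assumptions for illustration; check the actual diagram and the relay manual for the real behaviour.

```python
def reset_permitted(k3_feedback_closed: bool,
                    k4_feedback_closed: bool,
                    inputs_ok: bool) -> bool:
    """Allow a reset only if both contactors have dropped out.

    Each feedback contact is assumed to be closed only when its contactor
    is de-energized and its main contacts have actually opened.
    """
    return inputs_ok and k3_feedback_closed and k4_feedback_closed


# K3 has welded: its feedback contact stays open, so the reset is refused
# even though the gate is closed again and K4 is healthy.
print(reset_permitted(k3_feedback_closed=False,
                      k4_feedback_closed=True,
                      inputs_ok=True))
```

The machine still stopped on the demand because K4 opened, but the fault is caught before the next start, so it cannot sit there waiting for a second failure.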

One more inter­est­ing point – did you notice that there is a “zone e-​stop” included in the gate inter­lock? If you look imme­di­ately below the cent­ral safety relay and a little to the left you will find an emer­gency stop device. This device is wired in series with the gate inter­lock, so activ­at­ing it will drop out K3 and K4 but not dis­turb the oper­a­tion of the rest of the machine. The safety relay can’t dis­tin­guish between the e-​stop but­ton and the gate inter­locks, so if annun­ci­ation is needed, you may want to use a third con­tact on the e-​stop device to con­nect to a PLC input for this pur­pose.

Safety Mat Subsystem

The safety mat subsystem is located on the right side of the diagram and uses a second 440R-D22R2 relay. Safety mats can be either single or dual channel in design. The mat shown in this drawing is a dual-channel type. Stepping on the mat causes the conductive layers in the mat to touch, shorting Channel 1 to Channel 2. This creates an input fault that will be detected by the 440R relay. The fault condition will cause the output of the relay to open, stopping the machine.

Safety mats can be dam­aged reas­on­ably eas­ily, and the cir­cuit design shown will detect shorts or opens with­in the mat and will pre­vent the haz­ard­ous motion from start­ing or con­tinu­ing.

The output contactors, K5 and K6, are monitored by the relay reset loop connected to S34 and the +V rail.

This cir­cuit also includes a con­ven­tion­al start-​stop cir­cuit that doesn’t rely on the safety relay.

One more thing – just like the gate inter­lock cir­cuit, this cir­cuit also includes a “zone e-​stop”. Look below and to the left of the safety mat relay. As with the gate inter­lock, press­ing this but­ton will drop out K5 and K6, stop­ping the same motions pro­tec­ted by the safety mat. Since the relay can’t tell the dif­fer­ence between the e-​stop but­ton and the mat being activ­ated, you may want to use the same approach and add a third con­tact to the e-​stop but­ton, con­nect­ing it to the PLC for annun­ci­ation.

Component Selection

The com­pon­ents used in the cir­cuit are crit­ic­al to the final PL rat­ing of the design. The final PL of the design depends on the MTTFd of the com­pon­ents used in each chan­nel. No know­ledge of the intern­al con­struc­tion of the safety relays is needed, because the relays come with a PL rat­ing from the man­u­fac­turer. They can be treated as a sub­sys­tem unto them­selves. The selec­tion of the input and out­put devices is then the sig­ni­fic­ant factor. Component data sheets can be down­loaded from the Rockwell site if you want to dig a bit deep­er.

What did you think about this art­icle? What ques­tions came to mind that weren’t answered for you? I look for­ward to hear­ing your thoughts and ques­tions!

Copyright secured by Digiprove © 2011 – 2014
Acknowledgements: ISO for excerpts from ISO 13849 – 1 and more…
Some Rights Reserved

Missing MTTFd data

Dealing with the huge inform­a­tion void that exists while try­ing to com­plete reas­on­able con­trol reli­ab­il­ity assess­ments is a major chal­lenge for every engin­eer or tech­no­lo­gist tasked with this activ­ity. Here are a few thoughts on what to do now, and where things may be going…

What the heck is MTTFd???

When you first start to work through ISO 13849 – 1, the first thing that will smack you in the head are all the new acronyms. The first one you’ll run into is ‘PL’, of course, since the entire pur­pose of the stand­ard is to aid the design­er in determ­in­ing the reli­ab­il­ity Performance Level of the con­trol sys­tem. Shortly after that you’ll find your­self face to face with MTTFd.

MTTFd, or the Mean Time To Failure (dangerous), is the name given to the average time, expressed in years, that a component used in the system being analyzed is expected to operate before it fails dangerously. This figure differs from the straight MTTF for the component because it counts only the failures that result in a dangerous failure mode, or that may lead to a hazard.

So how do you get this data?

Obtaining MTTFd data for a component should be easy for a designer. Component manufacturers who market components intended for safety applications should provide this data in the component specifications, but there are thousands, perhaps millions, of different components being marketed today for use in safety systems. Most of the major manufacturers are already providing this figure, or a figure such as B10d that can be used to derive it, but for many components this data is simply not available.

Here are some ran­domly chosen examples of manufacturer’s spe­cific­a­tion sheets that give this data:

Allen-​Bradley Trojan™ T15 Interlock Switch

Pilz PNOZ X2 (pdf data sheet)

Preventa XPS MC Catalog Safety Controller (pdf 2015 Catalog)

B10d is the number of cycles until 10% of the components being tested fail in a dangerous way. If the data sheet gives only B10, it is possible to estimate B10d from it, and then to derive MTTFd and T10d (the application-dependent operating time until 10% of the components fail dangerously) from B10d and the number of operations per year. Check out Annex C of the standard if you want to see how this can be done.
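Here is a minimal sketch of the Annex C arithmetic for a component that wears out with use, such as a contactor or a gate switch: the number of operations per year comes from the operating days, operating hours and cycle time, and MTTFd is B10d divided by one tenth of that number. The function names are my own, and the B10d value and duty cycle in the example are hypothetical, not taken from any data sheet.

```python
def operations_per_year(days_per_year, hours_per_day, cycle_time_s):
    """Mean number of operations per year (n_op) for the application."""
    return days_per_year * hours_per_day * 3600.0 / cycle_time_s


def mttfd_from_b10d(b10d_cycles, n_op):
    """MTTFd in years for a component with a known B10d."""
    return b10d_cycles / (0.1 * n_op)


def t10d(b10d_cycles, n_op):
    """Operating time in years until 10% of components fail dangerously."""
    return b10d_cycles / n_op


# Hypothetical application: 220 days/year, 16 hours/day, one cycle per minute,
# and a component with an assumed B10d of 2,000,000 cycles.
n_op = operations_per_year(220, 16, 60)
print(f"n_op  = {n_op:.0f} cycles/year")
print(f"MTTFd = {mttfd_from_b10d(2_000_000, n_op):.1f} years")
print(f"T10d  = {t10d(2_000_000, n_op):.1f} years")
```

The same component gives a very different MTTFd in a machine that cycles every few seconds, which is why the data sheet figure on its own is never the whole answer.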

But what do you do if the manufacturer of your favourite contactor doesn’t provide ANY failure data? Some major manufacturers still don’t provide any failure rate data at all, while others provide expected lifetimes under specific operating conditions. Some provide only EN 954-1:95 data. I think this last case is one of the reasons for the EC Machinery Working Group’s decision late last year to extend the transition period to ISO 13849-1:07. Need to know more about that decision?

Now what?

Unless you work for a large organ­iz­a­tion, insti­tut­ing a life test­ing pro­gram is not likely to be an option, since you either need a pro­trac­ted peri­od of time with a few com­pon­ents in test, or thou­sands of samples for a short time.

The stand­ard provides the option to use 10 years as a default where no oth­er data is avail­able. 10 years sounds like a long time at first blush, par­tic­u­larly if the planned life­time of the sys­tem involved is 20 years. Typical MTTFd val­ues for high-​reliability com­pon­ents are in the hun­dreds of years, so by com­par­is­on, 10 years is almost noth­ing. Tables are also provided for some kinds of com­pon­ents, but the tables are neces­sar­ily lim­ited in size, so not every com­pon­ent will be lis­ted.
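To see how much damage a single default value does, here is a small sketch using the parts count approach, where the reciprocal of the channel MTTFd is the sum of the reciprocals of the component values. The component values are invented for the example, and `channel_mttfd` is my own helper name; a component with no data is passed as `None` and gets the 10-year default.

```python
DEFAULT_MTTFD_YEARS = 10.0  # fallback when no other data is available


def channel_mttfd(component_mttfd_years):
    """Parts count estimate of one channel's MTTFd in years.

    Components with no published data (None) get the 10-year default.
    """
    values = [DEFAULT_MTTFD_YEARS if v is None else v
              for v in component_mttfd_years]
    return 1.0 / sum(1.0 / v for v in values)


# Hypothetical channel: switch, safety relay output, contactor.
with_data = channel_mttfd([150.0, 200.0, 150.0])
one_unknown = channel_mttfd([150.0, 200.0, None])

print(f"all data available: {with_data:.1f} years")
print(f"contactor unknown:  {one_unknown:.1f} years")
```

With these made-up numbers, one component at the default drags the whole channel from about 55 years down to under 9 years, which is the over-design pressure discussed in the rest of this post.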

Your only options are to use the data in the standard, or to pick up some of the other publications that include component failure data, like MIL-HDBK-217F, IEC/TR 62380 (based on UTE 80810 & RDF 2000), NPRD 95 or IEC 61709 (based on Siemens SN 29500 documents). Some of these documents may be difficult or impossible to obtain.

The results of this lack of objective data from the component manufacturers are:

  • Conservative results based on the minimum default MTTFd;
  • Potential over-design of safety related controls; and
  • Increased manufacturing costs for machine builders.

The reasons for this situation vary by manufacturer, but ultimately it comes down to the cost of life testing components multiplied by the number of components built by each manufacturer. Typical life tests require load simulators and switching for thousands of components, as well as data logging to trap failures and record relevant data. In the case of fluid power components (pneumatics and hydraulics), this becomes increasingly complex. For many component manufacturers, the cost of the life testing is prohibitive, even though this data is badly needed by their users.

Will we see an improve­ment in the future? The largest con­trols com­pon­ent man­u­fac­tur­ers are very likely to provide this data as they have it avail­able, mean­ing as they com­plete test­ing. New designs are much more likely to come with this data ini­tially, while it may be a long time before some of the old stand­ard com­pon­ents get time in the life test cell. Until then, lots of com­pon­ents will be assigned ’10 years’.

A big thank you to Wouter Leusden for the idea for this post!

Have a thought to share on this top­ic? Correct an error in the art­icle? Sound off? Leave a com­ment!