ISO 13849-1 Analysis — Part 6: CCF — Common Cause Failures

This entry is part 6 of 6 in the series How to do a 13849-1 analysis

What is a Common Cause Failure?

There are two similar-sounding terms that people often get confused: Common Cause Failure (CCF) and Common Mode Failure. While these two types of failures sound similar, they are different. A Common Cause Failure is a failure in a system where two or more portions of the system fail at the same time from a single common cause. An example could be a lightning strike that causes a contactor to weld and simultaneously takes out the safety relay processor that controls the contactor. Common cause failures are therefore two different manners of failure in two different components, but with a single cause.

Common Mode Failure is where two components or portions of a system fail in the same way, at the same time. For example, two interposing relays both fail with welded contacts at the same time. The failures could be caused by the same cause or from different causes, but the way the components fail is the same.

Common-cause failure includes common mode failure, since a common cause can result in a common manner of failure in identical devices used in a system.

Here are the formal definitions of these terms:

3.1.6 common cause failure CCF

failures of different items, resulting from a single event, where these failures are not consequences of each other

Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050-191-am1:1999, 04-23.] [1]

 

3.36 common mode failures

failures of items characterized by the same fault mode

NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [IEV 191-04-24] [3]

The “common mode” failure definition uses the phrase “fault mode”, so let’s look at that as well:

failure mode
DEPRECATED: fault mode
manner in which failure occurs

Note 1 to entry: A failure mode may be defined by the function lost or other state transition that occurred. [IEV 192-03-17] [17]

As you can see, “fault mode” is no longer used, in favour of the more common “failure mode”, so it is possible to re-write the common-mode failure definition to read, “failures of items characterised by the same manner of failure.”

Random, Systematic and Common Cause Failures

Why do we need to care about this? There are three manners in which failures occur: random failures, systematic failures, and common cause failures. When developing safety related controls, we need to consider all three and mitigate them as much as possible.

Random failures do not follow any pattern, occurring randomly over time, and are often brought on by over-stressing the component, or from manufacturing flaws. Random failures can increase due to environmental or process-related stresses, like corrosion, EMI, normal wear-and-tear, or other over-stressing of the component or subsystem. Random failures are often mitigated through selection of high-reliability components [18].

Systematic failures include common-cause failures, and occur because a human error was made that procedural checks did not catch. These failures are due to design, specification, operating, maintenance, and installation errors. When we look at systematic errors, we are looking for things like training of the system designers, or quality assurance procedures used to validate the way the system operates. Systematic failures are non-random and complex, making them difficult to analyse statistically. Systematic errors are a significant source of common-cause failures because they can affect redundant devices, and because they are often deterministic, occurring whenever a particular set of circumstances exists.

Systematic failures include many types of errors, such as:

  • Manufacturing defects, e.g., software and hardware errors built into the device by the manufacturer.
  • Specification mistakes, e.g. incorrect design basis and inaccurate software specification.
  • Implementation errors, e.g., improper installation, incorrect programming, interface problems, and not following the safety manual for the devices used to realise the safety function.
  • Operation and maintenance, e.g., poor inspection, incomplete testing and improper bypassing [18].

Diverse redundancy is commonly used to mitigate systematic failures, since differences in component or subsystem design tend to create non-overlapping systematic failures, reducing the likelihood of a common error creating a common-mode failure. Errors in specification, implementation, operation and maintenance are not affected by diversity.

Fig 1 below shows the results of a small study done by the UK’s Health and Safety Executive in 1994 [19] that supports the idea that systematic failures are a significant contributor to safety system failures. The study included only 34 systems (n=34), so the results cannot be considered conclusive. However, there were some startling results. As you can see, errors in the specification of the safety functions (Safety Requirement Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significant source of failures.

Pie chart illustrating the proportion of failures in each phase of the life cycle of a machine, based on data taken from HSE Report HSG238.
Figure 1 – HSG 238 Primary Causes of Failure by Life Cycle Stage

Handling CCF in ISO 13849-1

Now that we understand WHAT Common-Cause Failure is, and WHY it’s important, we can talk about HOW it is handled in ISO 13849-1. Since ISO 13849-1 is intended to be a simplified functional safety standard, CCF analysis is limited to a checklist in Annex F, Table F.1. Note that Annex F is informative, meaning that it is guidance material to help you apply the standard. Since this is the case, you could use any other means suitable for assessing CCF mitigation, like those in IEC 61508, or in other standards.

Table F.1 is set up with a series of mitigation measures which are grouped together in related categories. Each group is provided with a score that can be claimed if you have implemented the mitigations in that group. ALL OF THE MEASURES in each group must be fulfilled in order to claim the points for that category. Here’s an example:

A portion of ISO 13849-1 Table F.1.
ISO 13849-1:2015, Table F.1 Excerpt

In order to claim the 15 points available for the use of separation or segregation in the system design, there must be a separation between the signal paths. Several examples of this are given for clarity.

Table F.1 lists six groups of mitigation measures. In order to claim adequate CCF mitigation, a minimum score of 65 points must be achieved. Only Category 2, 3 and 4 architectures are required to meet the CCF requirements in order to claim the PL, but without meeting the CCF requirement you cannot claim the PL, regardless of whether the design meets the other criteria or not.
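The all-or-nothing scoring described above can be sketched in a few lines of Python. This is only a minimal illustration: the group names and point values in the example are placeholders that I have assumed for demonstration, not values quoted from Table F.1, and `MeasureGroup`, `ccf_score`, and `ccf_adequate` are hypothetical names. Only the 65-point minimum and the rule that every measure in a group must be fulfilled come from the text.

```python
from dataclasses import dataclass
from typing import List

# Minimum total score needed to claim adequate CCF mitigation (from the article).
CCF_PASS_SCORE = 65


@dataclass
class MeasureGroup:
    name: str
    points: int                # placeholder value; take the real one from Table F.1
    measures_met: List[bool]   # one entry per measure listed in the group

    def score(self) -> int:
        # All-or-nothing: points are claimed only if every measure is fulfilled.
        if self.measures_met and all(self.measures_met):
            return self.points
        return 0


def ccf_score(groups: List[MeasureGroup]) -> int:
    """Total points claimed across all groups."""
    return sum(group.score() for group in groups)


def ccf_adequate(groups: List[MeasureGroup]) -> bool:
    """True if the total reaches the minimum score."""
    return ccf_score(groups) >= CCF_PASS_SCORE


# Illustrative assessment with assumed (hypothetical) point values.
example_groups = [
    MeasureGroup("group 1", 15, [True]),
    MeasureGroup("group 2", 20, [True, False]),   # one measure missing: 0 points
    MeasureGroup("group 3", 20, [True, True]),
    MeasureGroup("group 4", 5, [True]),
    MeasureGroup("group 5", 5, [True]),
    MeasureGroup("group 6", 35, [False, True]),   # one measure missing: 0 points
]

if __name__ == "__main__":
    print(ccf_score(example_groups), ccf_adequate(example_groups))
```

In this made-up assessment, two groups are only partially fulfilled, so neither contributes any points and the design falls short of the minimum even though most individual measures are in place.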

One final note on CCF: If you are trying to review an existing control system, say in an existing machine, or in a machine designed by a third party where you have no way to determine the experience and training of the designers or the capability of the company’s change management process, then you cannot adequately assess CCF [8]. This fact is recognised in CSA Z432-16 [20], chapter 8. [20] allows the reviewer to simply verify that the architectural requirements, exclusive of any probabilistic requirements, have been met. This is particularly useful for engineers reviewing machinery under Ontario’s Pre-Start Health and Safety requirements [21], who are frequently working with less-than-complete design documentation.

In case you missed the first part of the series, you can read it here. In the next article in this series, I’m going to review the process flow for system analysis as currently outlined in ISO 13849-1. Watch for it!

Book List

Here are some books that I think you may find helpful on this journey:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[2]     Safety of machinery — Safety-related parts of control systems — Part 2: Validation. 2nd Edition. ISO Standard 13849-2. 2012.

[3]      Safety of machinery — General principles for design — Risk assessment and risk reduction. ISO Standard 12100. 2010.

[8]     S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncertainties in the validation of an existing safety-related control circuit with the ISO 13849-1:2006 design standard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.

[17]      “failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18]      M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331–338, 2006.

[19]     Out of Control—Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

[20]     Safeguarding of Machinery. 3rd Edition. CSA Standard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Canada, 1990.

Testing Emergency Stop Systems

This entry is part 11 of 11 in the series Emergency Stop

I’ve had a number of questions from readers regarding testing of emergency stop systems, particularly about the frequency of testing. I addressed the types of tests that might be needed in another article covering Checking Emergency Stop Systems. This article will focus on the frequency of testing rather than the types of tests.

The Problem

Emergency stop systems are considered to be “complementary protective measures” in key machinery safety standards like ISO 12100 [1], and CSA Z432 [2]; this makes emergency stop systems the backup to the primary safeguards. Complementary protective measures are intended to permit “avoiding or limiting the harm” that may result from an emergent situation. By definition, this is a situation that has not been foreseen by the machine builder, or is the result of another failure. This could be a failure of another safeguarding system, or a failure in the machine that is not controlled by other means, e.g., a workpiece shatters due to a material flaw, and the broken pieces damage the machine, creating new, uncontrolled, failure conditions in the machine.

Emergency stop systems are manually triggered, and usually infrequently used. The lack of use means that functional testing of the system doesn’t happen in the normal course of operation of the machinery. Some types of faults may occur and remain undetected until the system is actually used, e.g., contact blocks falling off the back of the operator device. Failure at that point may be catastrophic, since by implication the primary safeguards have already failed, and thus the failure of the backup eliminates the possibility of avoiding or limiting harm.

To understand the testing requirements, it’s important to understand the risk and reliability requirements that drive the design of emergency stop systems, and then get into the test frequency question.

Requirements

In the past, there were no explicit requirements for emergency stop system reliability. Details like the colour of the operator device, or the way the stop function worked were defined in ISO 13850 [3], NFPA 79 [4], and IEC 60204-1 [5]. In the soon-to-be published 3rd edition of ISO 13850, a new provision requiring emergency stop systems to meet at least PLc will be added [6], but until publication, it is up to the designer to determine the safety integrity level, either PL or SIL, required. To determine the requirements for any safety function, the key is to start at the risk assessment. The risk assessment process requires that the designer understand the stage in the life cycle of the machine, the task(s) that will be done, and the specific hazards that a worker may be exposed to while conducting the task. This can become quite complex when considering maintenance and service tasks, and also applies to foreseeable failure modes of the machinery or the process. The scoring or ranking of risk can be accomplished using any suitable risk scoring tool that meets the minimum requirements in [1]. There are some good examples given in ISO/TR 14121-2 [7] if you are looking for some guidance. There are many good engineering textbooks available as well. Have a look at our Book List for some suggestions if you want a deeper dive.

Reliability

Once the initial unmitigated risk is understood, risk control measures can be specified. Wherever the control system is used as part of the risk control measure, a safety function must be specified. Specification of the safety function includes the Performance Level (PL), architectural category (B, 1-4), Mean Time to Dangerous Failure (MTTFd), and Diagnostic Coverage (DC) [6], or Safety Integrity Level (SIL), and Hardware Fault Tolerance (HFT), as described in IEC 62061 [8], as a minimum. If you are unfamiliar with these terms, see the definitions at the end of the article.

Referring to Figure 1, the “Risk Graph” [6, Annex A], we can reasonably state that for most machinery, a failure mode or emergent condition is likely to create conditions where the severity of injury is likely to require more than basic first aid, so selecting “S2” is the first step. In these situations, and particularly where the failure modes are not well understood, the highest level of severity of injury, S2, is selected because we don’t have enough information to expect that the injuries would only be minor. As soon as we make this selection, it is no longer possible to select any combination of Frequency or Probability parameters that will result in anything lower than PLc.

It’s important to understand that Figure 1 is not a risk assessment tool, but rather a decision tree used to select an appropriate PL based on the relevant risk parameters. Those parameters are:

Table 1 – Risk Parameters
Severity of injury | Frequency and/or exposure to hazard | Possibility of avoiding hazard or limiting harm
S1 – slight (normally reversible injury) | F1 – seldom-to-less-often and/or exposure time is short | P1 – possible under specific conditions
S2 – serious (normally irreversible injury or death) | F2 – frequent-to-continuous and/or exposure time is long | P2 – scarcely possible
Decision tree used to determine PL based on risk parameters.
Figure 1 – “Risk Graph” for determining PL
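Because the risk graph is a simple decision tree, it can be written as a lookup table. The sketch below is a minimal illustration: the mapping is my reading of the risk graph in [6, Annex A] and is not spelled out in this article, so check it against the standard before relying on it. The names `RISK_GRAPH` and `required_pl` are hypothetical.

```python
# Required performance level (PLr) for each combination of risk parameters.
# Assumption: this mapping reproduces the risk graph in ISO 13849-1, Annex A.
RISK_GRAPH = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}


def required_pl(severity: str, frequency: str, avoidance: str) -> str:
    """Return the required PL for the given S, F and P parameters."""
    key = (severity.upper(), frequency.upper(), avoidance.upper())
    if key not in RISK_GRAPH:
        raise ValueError(f"unknown risk parameters: {key}")
    return RISK_GRAPH[key]


if __name__ == "__main__":
    # With S2 selected, no choice of F and P gives anything lower than PLc.
    for f in ("F1", "F2"):
        for p in ("P1", "P2"):
            print("S2", f, p, "->", required_pl("S2", f, p))
```

Running the loop makes the point from the text visible: once S2 is selected, every branch ends at PLc or higher.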

PLc can be accomplished using any of three architectures: Category 1, 2, or 3. If you are unsure about what these architectures represent, have a look at my series covering this topic.

Category 1 is single channel, and does not include any diagnostics. A single fault can cause the loss of the safety function (i.e., the machine still runs even though the e-stop button is pressed). Using Category 1, the reliability of the design is based on the use of highly reliable components and well-tried safety principles. This approach can fail to danger.

Category 2 adds some diagnostic capability to the basic single channel configuration, and does not require the use of “well-tried” components. This approach can also fail to danger.

Category 3 architecture adds a redundant channel, and includes diagnostic coverage. Category 3 is not subject to failure due to single faults and is called “single-fault tolerant”. This approach is less likely to fail to danger, but still can in the presence of multiple, undetected, faults.

A key concept in reliability is the “fault”. This can be any kind of defect in hardware or software that results in unwanted behaviour or a failure. Faults are further broken down into dangerous and safe faults, meaning those that result in a dangerous outcome, and those that do not. Finally, each of these classes is broken down into detectable and undetectable faults. I’m not going to get into the mathematical treatment of these classes, but my point is this: there are undetectable dangerous faults. These are faults that cannot be detected by built-in diagnostics. As designers, we try to design the control system so that the undetectable dangerous faults are extremely rare; ideally, they should occur much less than once in the lifetime of the machine.

What is the lifetime of the machine? The standards writers have settled on a default lifetime of 20 years, thus the answer is that undetectable dangerous failures should happen much less than once in twenty years of 24/7/365 operation. So why does this matter? Each architectural category has different requirements for testing. The test rates are driven by the “Demand Rate”. The Demand Rate is defined in [6]. “SRP/CS” stands for “Safety Related Part of the Control System” in the definition:

3.1.30
demand rate (rd) – frequency of demands for a safety-related action of the SRP/CS

Each time the emergency stop button is pressed, a “demand” is put on the system. Looking at the “Simplified Procedure for estimating PL”, [6, 4.5.4], we find that the standard makes the following assumptions:

  • mission time, 20 years (see Clause 10);
  • constant failure rates within the mission time;
  • for category 2, demand rate <= 1/100 test rate;
  • for category 2, MTTFd,TE larger than half of MTTFd,L.

NOTE When blocks of each channel cannot be separated, the following can be applied: MTTFd of the summarized test channel (TE, OTE) larger than half MTTFd of the summarized functional channel (I, L, O).

So what does all that mean? The 20-year mission time is the assumed lifetime of the machinery. This number underpins the rest of the calculations in the standard, and is based on the idea that few modern control systems last longer than 20 years without being replaced or rebuilt. The constant failure rate points at the idea that systems used in the field will have components and controls that are not subject to infant mortality, nor are they old enough to start to fail due to age, but rather that the system is operating in the flat portion of the standardized failure rate “bathtub curve”, [9]. See Figure 2. Components that are subject to infant mortality failed at the factory and were removed from the supply chain. Those failing from “wear-out” are expected to reach that point after 20 years. If this is not the case, then the maintenance instructions for the system should include preventative maintenance tasks that require replacing critical components before they reach the predicted MTTFd.

Diagram of a standardized bathtub-shaped failure rate curve.
Figure 2 – Weibull Bathtub Curve [9]

For systems using Category 2 architecture, the automatic diagnostic test rate must be at least 100 times the demand rate. Keep in mind that this test rate is normally accomplished automatically in the design of the controls, and is only related to the detectable safe or dangerous faults. Undetectable faults must have a probability of less than once in 20 years, and should be detected by the “proof test”. More on that a bit later.

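Finally, the MTTFd of the test channel (TE) must be greater than half the MTTFd of the functional channel (L). In other words, the diagnostics cannot be much less reliable than the function they are checking.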
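The two Category 2 assumptions from the simplified procedure can be checked with a couple of comparisons. This is a minimal sketch: the function name, parameter names, and example figures are hypothetical, and the rates only need to share the same time unit (per year here).

```python
def category2_assumptions_met(
    demand_rate: float,      # demands on the safety function per year
    test_rate: float,        # automatic diagnostic tests per year
    mttfd_te_years: float,   # MTTFd of the test equipment channel (TE)
    mttfd_l_years: float,    # MTTFd of the functional channel (L)
) -> bool:
    """Check the two Category 2 assumptions quoted from [6, 4.5.4]."""
    # Demand rate must be no more than 1/100 of the test rate.
    demand_ok = demand_rate <= test_rate / 100
    # MTTFd of the test channel must be larger than half that of the functional channel.
    te_ok = mttfd_te_years > mttfd_l_years / 2
    return demand_ok and te_ok


if __name__ == "__main__":
    # Hypothetical e-stop pressed about 12 times a year.
    # Tested only at power-up, roughly 250 times a year: too few tests.
    print(category2_assumptions_met(12, 250, 40, 60))
    # Tested automatically once an hour (8760 times a year): ratio satisfied.
    print(category2_assumptions_met(12, 8760, 40, 60))
```

The example shows why the test is normally automated: even an e-stop that is pressed only once a month needs well over a thousand tests a year to satisfy the 100:1 ratio.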

Category 1 has no diagnostics, so there is no guidance in [6] to help us out with these systems. Category 3 is single fault tolerant, so as long as we don’t have multiple undetected faults we can count on the system to function and to alert us when a single fault occurs; remember that the automatic tests may not be able to detect every fault. This is where the “proof test” comes in. What is a proof test? To find a definition for proof test, we have to look at IEC 61508-4 [10]:

3.8.5
proof test
periodic test performed to detect failures in a safety-related system so that, if necessary, the system can be restored to an “as new” condition or as close as practical to this condition

NOTE – The effectiveness of the proof test will be dependent upon how close to the “as new” condition the system is restored. For the proof test to be fully effective, it will be necessary to detect 100 % of all dangerous failures. Although in practice 100 % is not easily achieved for other than low-complexity E/E/PE safety-related systems, this should be the target. As a minimum, all the safety functions which are executed are checked according to the E/E/PES safety requirements specification. If separate channels are used, these tests are done for each channel separately.

The 20-year life cycle assumption used in the standards also applies to proof testing. Machine controls are assumed to get at least one proof test in their life time. The proof test should be designed to detect faults that the automatic diagnostics cannot detect. Proof tests are also conducted after major rebuilds and repairs to ensure that the system operates correctly.

If you know the architecture of the emergency stop control system, you can determine the test rate based on the demand rate. It would be considerably easier if the standards just gave us some minimum test rates for the various architectures. One standard, ISO 14119 [11] on interlocks, does just that. Admittedly, this standard does not include emergency stop functions within its scope, as its focus is on interlocks, but since interlocking systems are more critical than the complementary protective measures that back them up, it would be reasonable to apply these same rules. Looking at the clause on Assessment of Faults, [11, 8.2], we find this guidance:

For applications using interlocking devices with automatic monitoring to achieve the necessary diagnostic coverage for the required safety performance, a functional test (see IEC 60204-1:2005, 9.4.2.4) can be carried out every time the device changes its state, e.g. at every access. If, in such a case, there is only infrequent access, the interlocking device shall be used with additional measures, because between consecutive functional tests the probability of occurrence of an undetected fault is increased.

When a manual functional test is necessary to detect a possible accumulation of faults, it shall be made within the following test intervals:

  • at least every month for PL e with Category 3 or Category 4 (according to ISO 13849-1) or SIL 3 with HFT (hardware fault tolerance) = 1 (according to IEC 62061);
  • at least every 12 months for PL d with Category 3 (according to ISO 13849-1) or SIL 2 with HFT (hardware fault tolerance) = 1 (according to IEC 62061).

NOTE It is recommended that the control system of a machine demands these tests at the required intervals e.g. by visual display unit or signal lamp. The control system should monitor the tests and stop the machine if the test is omitted or fails.

In the preceding, HFT=1 is equivalent to saying that the system is single-fault tolerant.
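The two intervals quoted above fit naturally into a small lookup table. The sketch below covers only the PL and Category combinations named in the quotation; the table and function names are hypothetical, and applying these intervals to emergency stop functions is my suggestion rather than a requirement of ISO 14119.

```python
from typing import Optional

# Maximum interval between manual functional tests, in months,
# keyed by (performance level, category) as quoted from ISO 14119, 8.2.
MANUAL_TEST_INTERVAL_MONTHS = {
    ("e", 3): 1,
    ("e", 4): 1,
    ("d", 3): 12,
}


def max_manual_test_interval_months(pl: str, category: int) -> Optional[int]:
    """Return the maximum test interval in months, or None if the quoted
    clause gives no interval for this PL and Category combination."""
    return MANUAL_TEST_INTERVAL_MONTHS.get((pl.lower(), category))


if __name__ == "__main__":
    print(max_manual_test_interval_months("e", 4))
    print(max_manual_test_interval_months("d", 3))
    print(max_manual_test_interval_months("c", 1))
```

A result of `None` means the quoted clause is silent on that combination, not that no testing is needed.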

This leaves us then with recommended test frequencies for Category 2 and 3 architectures in PLc, PLd, and PLe, or for SIL 2 and 3 with HFT=1. We still don’t have a test frequency for PLc, Category 1 systems. There is no explicit guidance for these systems in the standards. How can we determine a test rate for these systems?

My approach would be to start by examining the MTTFd values for all of the subsystems and components. [6] requires that the system have a HIGH MTTFd value, meaning 30 years <= MTTFd <= 100 years [6, Table 5]. If this is the case, then the once-in-20-years proof test is theoretically enough. If the system is constructed, for example, as shown in Figure 3 below, then each of the four components in the channel (PB1, PB2, MCR, and MOV) would have to have an MTTFd > 120 years for the channel as a whole to reach 30 years. See [6, Annex C] for this calculation.

Basic Stop/Start Circuit
Figure 3 – Basic Stop/Start Circuit

PB1 – Emergency Stop Button

PB2 – Power “ON” Button

MCR – Master Control Relay

MOV – Surge Suppressor on MCR Coil

M1 – Machine prime-mover (motor)

Note that the fuses are not included, since they can only fail to safety, and assuming that they were specified correctly in the original design, are not subject to the same cyclical aging effects as the other components.

M1 is not included, since it is the controlled portion of the machine and is not part of the control system.
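The 120-year figure for the basic stop/start circuit follows from the usual series rule for a single channel: the failure rates of the components add, so the reciprocal of the channel MTTFd is the sum of the reciprocals of the component MTTFd values. Here is a minimal sketch of that arithmetic, assuming the four counted components (PB1, PB2, MCR, MOV) all have the same MTTFd. The function names are hypothetical.

```python
from typing import Iterable


def series_mttfd(component_mttfd_years: Iterable[float]) -> float:
    """MTTFd of a single channel whose components are all in series:
    1 / MTTFd = sum of 1 / MTTFd_i."""
    return 1.0 / sum(1.0 / m for m in component_mttfd_years)


def required_component_mttfd(target_channel_years: float, n_components: int) -> float:
    """MTTFd each of n identical series components needs so that the
    channel reaches the target MTTFd."""
    return target_channel_years * n_components


if __name__ == "__main__":
    # Four components at 120 years each give a channel MTTFd of about 30 years.
    print(series_mttfd([120, 120, 120, 120]))
    # Working backwards from the 30-year lower bound for HIGH MTTFd.
    print(required_component_mttfd(30, 4))
```

In a real circuit the components will not have identical values, so the weakest component dominates the result; `series_mttfd` handles that case directly.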

If a review of the components in the system shows that any single component falls below the target MTTFd, then I would consider replacing the system with a higher category design. Since most of these components will be unlikely to have MTTFd values on the spec sheet, you will likely have to convert from total life values (B10). This is outside the scope of this article, but you can find guidance in [6, Annex C]. More frequent testing, i.e., more than once in 20 years, is always acceptable.
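As a rough illustration of that conversion, the relationship I understand [6, Annex C] to use is MTTFd = B10d / (0.1 × nop), where nop is the mean number of operations per year. The sketch below assumes you already have a B10d value (the dangerous-failure version of B10) and treats every number in the example as hypothetical; the function and parameter names are mine, not the standard's.

```python
def annual_operations(days_per_year: float, hours_per_day: float, cycle_time_s: float) -> float:
    """Mean number of operations per year (nop):
    operating days x operating hours x 3600 s/h / mean time between cycles."""
    return days_per_year * hours_per_day * 3600.0 / cycle_time_s


def mttfd_from_b10d(b10d_cycles: float, n_op_per_year: float) -> float:
    """MTTFd in years from B10d (cycles until 10 % of components fail
    dangerously) and the mean number of operations per year."""
    return b10d_cycles / (0.1 * n_op_per_year)


if __name__ == "__main__":
    # Hypothetical e-stop button: B10d of 100 000 cycles, machine runs
    # 250 days a year, 8 hours a day, button pressed once per 8-hour shift.
    n_op = annual_operations(250, 8, 8 * 3600)
    print(n_op, mttfd_from_b10d(100_000, n_op))
```

Because emergency stop devices are operated so rarely, the calculated MTTFd is usually very large; the same calculation for a frequently cycled component, such as a contactor, gives a much lower number.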

Where manual testing is required as part of the design for any category of system, and particularly in Category 1 or 2 systems, the control system should alert the user to the requirement and not permit the machine to operate until the test is completed. This will help to ensure that the requisite tests are properly completed.

Need more information? Leave a comment below, or send me an email with the details of your application!

Definitions

3.1.9 [10]
functional safety

part of the overall safety relating to the EUC and the EUC control system which depends on the correct functioning of the E/E/PE safety-related systems, other technology safety-related systems and external risk reduction facilities

3.2.6 [10]
electrical/electronic/programmable electronic (E/E/PE)
based on electrical (E) and/or electronic (E) and/or programmable electronic (PE) technology

NOTE – The term is intended to cover any and all devices or systems operating on electrical principles.
EXAMPLE Electrical/electronic/programmable electronic devices include

  • electromechanical devices (electrical);
  • solid-state non-programmable electronic devices (electronic);
  • electronic devices based on computer technology (programmable electronic); see 3.2.5

3.5.1 [10]
safety function
function to be implemented by an E/E/PE safety-related system, other technology safety related system or external risk reduction facilities, which is intended to achieve or maintain a safe state for the EUC, in respect of a specific hazardous event (see 3.4.1)

3.5.2 [10]
safety integrity
probability of a safety-related system satisfactorily performing the required safety functions under all the stated conditions within a stated period of time

NOTE 1 – The higher the level of safety integrity of the safety-related systems, the lower the probability that the safety-related systems will fail to carry out the required safety functions.
NOTE 2 – There are four levels of safety integrity for systems (see 3.5.6).

3.5.6 [10]
safety integrity level (SIL)
discrete level (one out of a possible four) for specifying the safety integrity requirements of the safety functions to be allocated to the E/E/PE safety-related systems, where safety integrity level 4 has the highest level of safety integrity and safety integrity level 1 has the lowest

NOTE – The target failure measures (see 3.5.13) for the four safety integrity levels are specified in tables 2 and 3 of IEC 61508-1.

3.6.3 [10]
fault tolerance
ability of a functional unit to continue to perform a required function in the presence of faults or errors

NOTE – The definition in IEV 191-15-05 refers only to sub-item faults. See the note for the term fault in 3.6.1.
[ISO/IEC 2382-14-04-061]

3.1.1 [6]
safety–related part of a control system (SRP/CS)
part of a control system that responds to safety-related input signals and generates safety-related output signals

NOTE 1 The combined safety-related parts of a control system start at the point where the safety-related input signals are initiated (including, for example, the actuating cam and the roller of the position switch) and end at the output of the power control elements (including, for example, the main contacts of a contactor).
NOTE 2 If monitoring systems are used for diagnostics, they are also considered as SRP/CS.

3.1.2 [6]
category
classification of the safety-related parts of a control system in respect of their resistance to faults and their subsequent behaviour in the fault condition, and which is achieved by the structural arrangement of the parts, fault detection and/or by their reliability

3.1.3 [6]
fault
state of an item characterized by the inability to perform a required function, excluding the inability during preventive maintenance or other planned actions, or due to lack of external resources

NOTE 1 A fault is often the result of a failure of the item itself, but may exist without prior failure.
[IEC 60050-191:1990, 05-01]
NOTE 2 In this part of ISO 13849, “fault” means random fault.

3.1.4 [6]
failure
termination of the ability of an item to perform a required function

NOTE 1 After a failure, the item has a fault.
NOTE 2 “Failure” is an event, as distinguished from “fault”, which is a state.
NOTE 3 The concept as defined does not apply to items consisting of software only.
[IEC 60050–191:1990, 04-01]
NOTE 4 Failures which only affect the availability of the process under control are outside of the scope of this part of ISO 13849.

3.1.5 [6]
dangerous failure
failure which has the potential to put the SRP/CS in a hazardous or fail-to-function state

NOTE 1 Whether or not the potential is realized can depend on the channel architecture of the system; in redundant systems a dangerous hardware failure is less likely to lead to the overall dangerous or fail-to-function state.
NOTE 2 Adapted from IEC 61508-4:1998, definition 3.6.7.

3.1.20 [6]
safety function
function of the machine whose failure can result in an immediate increase of the risk(s)
[ISO 12100-1:2003, 3.28]

3.1.21 [6]
monitoring
safety function which ensures that a protective measure is initiated if the ability of a component or an element to perform its function is diminished or if the process conditions are changed in such a way that a decrease of the amount of risk reduction is generated

3.1.22 [6]
programmable electronic system (PES)
system for control, protection or monitoring dependent for its operation on one or more programmable electronic devices, including all elements of the system such as power supplies, sensors and other input devices, contactors and other output devices

NOTE Adapted from IEC 61508-4:1998, definition 3.3.2.

3.1.23 [6]
performance level (PL)
discrete level used to specify the ability of safety-related parts of control systems to perform a safety function under foreseeable conditions

NOTE See 4.5.1.

3.1.25 [6]
mean time to dangerous failure (MTTFd)
expectation of the mean time to dangerous failure

NOTE Adapted from IEC 62061:2005, definition 3.2.34.

3.1.26 [6]
diagnostic coverage (DC)
measure of the effectiveness of diagnostics, which may be determined as the ratio between the failure rate of detected dangerous failures and the failure rate of total dangerous failures

NOTE 1 Diagnostic coverage can exist for the whole or parts of a safety-related system. For example, diagnostic coverage could exist for sensors and/or logic system and/or final elements.
NOTE 2 Adapted from IEC 61508-4:1998, definition 3.8.6.

3.1.33 [6]
safety integrity level (SIL)
discrete level (one out of a possible four) for specifying the safety integrity requirements of the safety functions to be allocated to the E/E/PE safety-related systems, where safety integrity level 4 has the highest level of safety integrity and safety integrity level 1 has the lowest

[IEC 61508-4:1998, 3.5.6]

Acknowledgements

Thanks to my colleagues Derek Jones and Jonathan Johnson, both from Rockwell Automation, and members of ISO TC199. Their suggestion to reference ISO 14119 clause 8.2 was the seed for this article.

I’d also like to acknowledge Ronald Sykes, Howard Touski, Mirela Moga, Michael Roland, and Grant Rider for asking the questions that led to this article.

References

[1]     Safety of machinery — General principles for design — Risk assessment and risk reduction. ISO 12100. International Organization for Standardization (ISO). Geneva 2010.

[2]    Safeguarding of Machinery. CSA Z432. Canadian Standards Association. Toronto. 2004.

[3]    Safety of machinery – Emergency stop – Principles for design. ISO 13850. International Organization for Standardization (ISO). Geneva 2006.

[4]    Electrical Standard for Industrial Machinery. NFPA 79. National Fire Protection Association (NFPA). Batterymarch Park. 2015.

[5]    Safety of machinery – Electrical equipment of machines – Part 1: General requirements. IEC 60204-1. International Electrotechnical Commission (IEC). Geneva. 2009.

[6]    Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design.  ISO 13849-1. International Organization for Standardization (ISO). Geneva. 2006.

[7]    Safety of machinery — Risk assessment — Part 2: Practical guidance and examples of methods. ISO/TR 14121-2. International Organization for Standardization (ISO). Geneva. 2012.

[8]   Safety of machinery – Functional safety of safety-related electrical, electronic and programmable electronic control systems. IEC 62061. International Electrotechnical Commission (IEC). Geneva. 2005.

[9]    D. J. Wilkins (2002, November). “The Bathtub Curve and Product Failure Behavior. Part One – The Bathtub Curve, Infant Mortality and Burn-in”. Reliability Hotline [Online]. Available: http://www.weibull.com/hotwire/issue21/hottopics21.htm. [Accessed: 26-Apr-2015].

[10] Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 4: Definitions and abbreviations. IEC 61508-4. International Electrotechnical Commission (IEC). Geneva. 1998.

[11] Safety of machinery — Interlocking devices associated with guards — Principles for design and selection. ISO 14119. International Organization for Standardization (ISO). Geneva. 2013.

Sources for Standards

CANADA

Canadian Standards Association sells CSA, ISO and IEC standards to the Canadian Market.

USA

NSSN: National Standards Search Engine powered by ANSI offers standards from most US Standards Development Organizations. They also sell ISO and IEC standards into the US market.

International

International Organization for Standardization (ISO).

International Electrotechnical Commission (IEC).

CSA Z432 Safeguarding of Machinery – 3rd Edition

If you build machinery for the Canadian market, or if you modify equipment in Canadian workplaces, you will be familiar with CSA Z432, Safeguarding of Machinery. This standard has been around since 1992, with the last major revision published in 2004. CSA has reconvened the Technical Committee responsible for this important standard to revise the document to reflect the current practices in the machinery market, and to bring in new ideas that are developing internationally that affect what Canadian machine builders are doing.

If you have interest in this standard and would like to have your thoughts and concerns communicated to the Technical Committee, please feel free to contact me with your suggestions. Work starts on 28-Jan-14. Your input is welcomed!