ISO 13849-1 Analysis — Part 6: CCF — Common Cause Failures

This entry is part 6 of 6 in the series How to do a 13849-1 analysis

What is a Common Cause Failure?

There are two similar-sounding terms that people often confuse: Common Cause Failure (CCF) and Common Mode Failure. While these two types of failures sound similar, they are different. A Common Cause Failure is a failure in a system where two or more portions of the system fail at the same time from a single common cause. An example could be a lightning strike that causes a contactor to weld and simultaneously takes out the safety relay processor that controls the contactor. In this example, a single cause produces two different manners of failure in two different components.

Common Mode Failure is where two components or portions of a system fail in the same way, at the same time. For example, two interposing relays both fail with welded contacts at the same time. The failures may have the same cause or different causes, but the way the components fail is the same.

Common-cause failures can include common-mode failures, since a single common cause can produce the same manner of failure in identical devices used in a system. The reverse does not hold: the same failure mode can also arise from unrelated causes.

Here are the formal definitions of these terms:

3.1.6 common cause failure CCF

failures of different items, resulting from a single event, where these failures are not consequences of each other

Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050-191-am1:1999, 04-23.] [1]


3.36 common mode failures

failures of items characterized by the same fault mode

NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [IEV 191-04-24] [3]

The “common mode” failure definition uses the phrase “fault mode”, so let’s look at that as well:

failure mode
DEPRECATED: fault mode
manner in which failure occurs

Note 1 to entry: A failure mode may be defined by the function lost or other state transition that occurred. [IEV 192-03-17] [17]

As you can see, “fault mode” is deprecated in favour of the more common “failure mode”, so it is possible to rewrite the common-mode failure definition to read, “failures of items characterised by the same manner of failure.”
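The two definitions can be expressed as a small decision rule. The sketch below is a hypothetical Python model, not something taken from ISO 13849-1 or ISO 12100: each failure is recorded with its item, cause, failure mode and, optionally, the item whose failure caused it. Causes are compared as plain strings, which is a large simplification of real root-cause analysis.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Failure:
    """A single item failure (hypothetical record format, for illustration)."""
    item: str                             # which item failed
    cause: str                            # the initiating event
    mode: str                             # manner in which the failure occurred
    consequence_of: Optional[str] = None  # item whose failure caused this one, if any


def classify(a: Failure, b: Failure) -> set[str]:
    """Label a pair of failures using the CCF and common-mode definitions."""
    labels: set[str] = set()
    if a.item == b.item:
        return labels  # both definitions concern failures of *different* items
    # CCF: one event, and neither failure is a consequence of the other.
    independent = a.consequence_of != b.item and b.consequence_of != a.item
    if a.cause == b.cause and independent:
        labels.add("common cause")
    # Common mode: same failure mode, whatever the cause(s).
    if a.mode == b.mode:
        labels.add("common mode")
    return labels


# The lightning-strike example: one cause, two different failure modes.
contactor = Failure("contactor K1", "lightning strike", "welded contacts")
relay_cpu = Failure("safety relay", "lightning strike", "processor failure")
print(classify(contactor, relay_cpu))  # {'common cause'}

# The interposing-relay example, assuming two unrelated (hypothetical) causes.
relay_a = Failure("relay A", "overcurrent", "welded contacts")
relay_b = Failure("relay B", "end of life", "welded contacts")
print(classify(relay_a, relay_b))  # {'common mode'}
```

The `consequence_of` field captures the "not consequences of each other" clause: a cascading failure, such as a fuse opening because the contactor failed, is not counted as a CCF even if both trace back to the same event.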

Random, Systematic and Common Cause Failures

Why do we need to care about this? There are three manners in which failures occur: random failures, systematic failures, and common cause failures. When developing safety related controls, we need to consider all three and mitigate them as much as possible.

Random failures do not follow any pattern, occurring randomly over time, and are often brought on by over-stressing the component or by manufacturing flaws. The rate of random failures can increase due to environmental or process-related stresses, like corrosion, EMI, normal wear-and-tear, or other over-stressing of the component or subsystem. Random failures are often mitigated through selection of high-reliability components [18].
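Random hardware failures are commonly modelled with a constant failure rate, so the probability that a component has failed by time t is 1 - exp(-t/MTTF). The sketch below assumes that exponential model and a 20-year mission time; the MTTF values are hypothetical and chosen only to show why selecting higher-reliability components reduces random failures. The constant-rate model does not capture wear-out, so treat it as an approximation.

```python
import math


def probability_of_failure(mttf_years: float, mission_time_years: float) -> float:
    """Probability of at least one failure within the mission time.

    Assumes a constant failure rate (exponential model): lambda = 1 / MTTF.
    """
    failure_rate = 1.0 / mttf_years
    return 1.0 - math.exp(-failure_rate * mission_time_years)


MISSION_TIME_YEARS = 20  # assumed mission time for this illustration

for mttf in (30, 100):  # hypothetical MTTF values, in years
    p = probability_of_failure(mttf, MISSION_TIME_YEARS)
    print(f"MTTF {mttf} years: {p:.1%} chance of failure in {MISSION_TIME_YEARS} years")
```

With these assumed numbers, the 30-year component has roughly a 49% chance of failing within 20 years, versus roughly 18% for the 100-year component.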

Systematic failures include common-cause failures, and occur because of human errors that were not caught by procedural means. These failures are due to design, specification, operating, maintenance, and installation errors. When we look at systematic errors, we are looking at things like the training of the system designers, or the quality assurance procedures used to validate the way the system operates. Systematic failures are non-random and complex, making them difficult to analyse statistically. Systematic errors are a significant source of common-cause failures because they can affect redundant devices, and because they are often deterministic, occurring whenever a particular set of circumstances exists.

Systematic failures include many types of errors, such as:

  • Manufacturing defects, e.g., software and hardware errors built into the device by the manufacturer.
  • Specification mistakes, e.g., incorrect design basis and inaccurate software specification.
  • Implementation errors, e.g., improper installation, incorrect programming, interface problems, and not following the safety manual for the devices used to realise the safety function.
  • Operation and maintenance, e.g., poor inspection, incomplete testing and improper bypassing [18].

Diverse redundancy is commonly used to mitigate systematic failures, since differences in component or subsystem design tend to create non-overlapping systematic failures, reducing the likelihood that a common error creates a common-mode failure. Errors in specification, implementation, operation and maintenance are not mitigated by diversity.
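One way to picture why diversity helps is to model each channel's systematic flaws as the set of conditions that trigger them; only conditions present in both sets defeat both channels at once. The sketch below is a hypothetical set-based model with invented flaw labels. It also carries a shared specification error into both channels, reflecting the point that diversity does not mitigate specification errors.

```python
def common_systematic_faults(channel_a, channel_b, shared_errors=frozenset()):
    """Return the triggering conditions that defeat BOTH channels at once.

    channel_a, channel_b: conditions that trigger a design flaw in that channel.
    shared_errors: errors common to both channels regardless of their design,
    e.g. a wrong safety requirement specification.
    """
    return (set(channel_a) & set(channel_b)) | set(shared_errors)


spec_error = {"stopping time specified too short"}          # hypothetical SRS error
flaws_plc_x = {"timer rollover", "input filter too slow"}   # hypothetical design flaws

# Homogeneous redundancy: two identical channels carry identical flaws.
homogeneous = common_systematic_faults(flaws_plc_x, flaws_plc_x, spec_error)

# Diverse redundancy: the second channel uses a different design with its own flaws.
flaws_relay_y = {"contact bounce misread"}
diverse = common_systematic_faults(flaws_plc_x, flaws_relay_y, spec_error)

print(sorted(homogeneous))
# ['input filter too slow', 'stopping time specified too short', 'timer rollover']
print(sorted(diverse))
# ['stopping time specified too short']
```

Diversity removes the overlapping design flaws, but the specification error survives in both cases.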

Figure 1 below shows the results of a small study done by the UK’s Health and Safety Executive in 1994 [19] that supports the idea that systematic failures are a significant contributor to safety system failures. The study included only 34 systems (n=34), so the results cannot be considered conclusive. However, there were some startling results. As you can see, errors in the specification of the safety functions (Safety Requirement Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significant source of failures.

Figure 1 – HSG 238 Primary Causes of Failure by Life Cycle Stage (pie chart of the proportion of failures in each phase of the machine life cycle, based on data from HSE Report HSG238)
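To see why a sample of 34 cannot be conclusive, you can put a confidence interval around the 44% figure. The sketch below uses the Wilson score interval, a standard method for binomial proportions that is not used in the HSE report, and assumes that “about 44%” corresponds to 15 of the 34 systems, since 15/34 is about 44.1%.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumption: 15 of the 34 failures were traced to the specification.
low, high = wilson_interval(15, 34)
print(f"{low:.2f} to {high:.2f}")  # 0.29 to 0.61
```

Under that assumption, the interval runs from roughly 29% to 61%: wide, but even the lower bound suggests that specification errors account for a substantial share of the failures.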

Handling CCF in ISO 13849-1

Now that we understand WHAT Common-Cause Failure is, and WHY it’s important, we can talk about HOW it is handled in ISO 13849-1. Since ISO 13849-1 is intended to be a simplified functional safety standard, CCF analysis is limited to a checklist in Annex F, Table F.1. Note that Annex F is informative, meaning that it is guidance material to help you apply the standard. Because of this, you could use any other suitable means of assessing CCF mitigation, such as those in IEC 61508 or in other standards.
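As an example of a more quantitative approach, IEC 61508-6 uses a beta-factor model, in which a fraction β of each channel’s dangerous failures is assumed to affect both channels at once. The sketch below illustrates only that idea, not the IEC 61508 scoring procedure: it uses rare-event approximations, ignores diagnostics and repair, and the failure rate, time interval and β values are all hypothetical.

```python
def two_channel_loss(rate_per_hour: float, hours: float, beta: float):
    """Rough probability that both channels of a redundant pair fail within `hours`.

    Rare-event approximation (rate_per_hour * hours << 1); no diagnostics, no repair.
    Returns (independent double failure, common cause failure).
    """
    x = rate_per_hour * hours
    independent = ((1.0 - beta) * x) ** 2  # each channel fails on its own
    common_cause = beta * x                # one event takes out both channels
    return independent, common_cause


RATE = 1e-6   # hypothetical dangerous failure rate per channel, per hour
HOURS = 1000  # hypothetical time interval

for beta in (0.0, 0.02, 0.10):
    ind, ccf = two_channel_loss(RATE, HOURS, beta)
    print(f"beta={beta:.2f}: independent={ind:.1e}, common cause={ccf:.1e}")
```

With these numbers, even β = 0.02 makes the common cause term about twenty times larger than the independent double-failure term, which is why redundancy alone is not enough without measures against CCF.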

Table F.1 is set up with a series of mitigation measures grouped into related categories. Each group is assigned a score that can be claimed if you have implemented the mitigations in that group. ALL OF THE MEASURES in each group must be fulfilled in order to claim the points for that group. Here’s an example:

ISO 13849-1:2015, Table F.1 Excerpt

In order to claim the 20 points available for the use of separation or segregation in the system design, there must be a separation between the signal paths. Several examples of this are given for clarity.
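The all-or-nothing scoring rule can be sketched in a few lines. This is a hypothetical Python helper, not part of the standard: each group maps to its available points and a record of whether each of its measures is fulfilled, and a group contributes its points only if every measure is fulfilled. The measure names and the second group and its points are placeholders; take the real groups, measures and scores from the edition of Table F.1 you are using.

```python
def ccf_score(groups: dict) -> int:
    """Table F.1-style total: a group's points count only if ALL its measures are met.

    groups maps a group name to (points, {measure name: fulfilled?}).
    """
    total = 0
    for points, measures in groups.values():
        if measures and all(measures.values()):  # partial fulfilment earns nothing
            total += points
    return total


assessment = {
    # Separation/segregation, using the 20 points quoted in the text.
    "separation/segregation": (20, {
        "separate wiring for each channel": True,
        "short/open circuit detection": True,
    }),
    # Placeholder group and points: one measure is missing, so no points.
    "example group": (15, {"measure 1": True, "measure 2": False}),
}

print(ccf_score(assessment))  # 20
```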

Table F.1 lists six groups of mitigation measures. To claim adequate CCF mitigation, a minimum score of 65 points must be achieved. Only Category 2, 3 and 4 architectures are required to meet the CCF requirements, but for those categories, if the CCF requirement is not met, you cannot claim the PL, regardless of whether the design meets the other criteria.
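The threshold and the category rule can be combined into one check. This is a hypothetical helper: it only answers whether the CCF requirement blocks the PL claim. Passing it does not by itself establish a PL, since the other criteria covered earlier in this series (category architecture, MTTFd and DC) still apply.

```python
CCF_MIN_SCORE = 65                        # minimum Table F.1 score
CATEGORIES_NEEDING_CCF = {"2", "3", "4"}  # Categories B and 1 have no CCF score requirement


def ccf_allows_pl_claim(category: str, score: int) -> bool:
    """Return False if the CCF requirement prevents claiming the PL for this category."""
    if category not in CATEGORIES_NEEDING_CCF:
        return True
    return score >= CCF_MIN_SCORE


print(ccf_allows_pl_claim("3", 60))  # False: below 65 points
print(ccf_allows_pl_claim("3", 65))  # True
print(ccf_allows_pl_claim("B", 0))   # True: CCF score not required
```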

One final note on CCF: If you are trying to review an existing control system, say in an existing machine, or in a machine designed by a third party where you have no way to determine the experience and training of the designers or the capability of the company’s change management process, then you cannot adequately assess CCF [8]. This fact is recognised in CSA Z432-16 [20], chapter 8, which allows the reviewer to simply verify that the architectural requirements, exclusive of any probabilistic requirements, have been met. This is particularly useful for engineers reviewing machinery under Ontario’s Pre-Start Health and Safety requirements [21], who are frequently working with less-than-complete design documentation.

In case you missed the first part of the series, you can read it here. In the next article in this series, I’m going to review the process flow for system analysis as currently outlined in ISO 13849-1. Watch for it!

Book List

Here are some books that I think you may find helpful on this journey:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[2]     Safety of machinery — Safety-related parts of control systems — Part 2: Validation. 2nd Edition. ISO Standard 13849-2. 2012.

[3]      Safety of machinery — General principles for design — Risk assessment and risk reduction. ISO Standard 12100. 2010.

[8]     S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncertainties in the validation of an existing safety-related control circuit with the ISO 13849-1:2006 design standard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.

[17]      “failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18]      M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331–338, 2006.

[19]     Out of Control—Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

[20]     Safeguarding of Machinery. 3rd Edition. CSA Standard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Canada, 1990.


Author: Doug Nix

Doug Nix is Managing Director and Principal Consultant at Compliance InSight Consulting, Inc. (http://www.complianceinsight.ca) in Kitchener, Ontario, and is Lead Author and Managing Editor of the Machinery Safety 101 blog.

Doug's work includes teaching machinery risk assessment techniques privately and through Conestoga College Institute of Technology and Advanced Learning in Kitchener, Ontario, as well as providing technical services and training programs to clients related to risk assessment, industrial machinery safety, safety-related control system integration and reliability, laser safety and regulatory conformity.
