ISO 13849-1 Analysis — Part 6: CCF — Common Cause Failures

Post updated 2019-07-24. Ed.

What is a “Common Cause Failure”?

There are two similar-sounding terms that people often get confused: Common Cause Failure (CCF) and Common Mode Failure. While these two types of failures sound similar, they are different. A Common Cause Failure is a failure in a system where two or more portions of the system fail at the same time from a single common cause. An example could be a lightning strike that causes a contactor to weld and simultaneously takes out the safety relay processor that controls the contactor. Common cause failures are two different manners of failure in two different components but with a single cause.

Common Mode Failure is where two components or portions of a system fail in the same way, at the same time. For example, two interposing relays both fail with welded contacts at the same time. The failures could be caused by the same cause or from different causes, but the way the components fail is the same.

Common-cause failure includes common mode failure since a common cause can result in a common manner of failure in identical components or sub-systems used in a system.

Here are the formal definitions of these terms:

3.1.6 common cause failure CCF

failures of different items, resulting from a single event, where these failures are not consequences of each other

Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050?191-am1:1999, 04-23.] [1]

3.36 common mode failures

failures of items characterized by the same fault mode

NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [lEV 191-04-24] [3]

The “common mode” failure definition uses the phrase “fault mode,” so let’s look at that as well:

failure mode
DEPRECATED: fault mode
manner in which failure occurs

Note 1 to entry: A failure mode may be defined by the function lost or other state transition that occurred. [IEV 192-03-17] [17]

As you can see, “fault mode” is no longer used in favour of the more common “failure mode,” so it is “possible” to re-write the common-mode failure definition to read, “failures of items characterized by the same manner of failure.”

Random, Systematic and Common Cause Failures

Why do we need to care about this? There are three ways in which failures occur: random, systematic, and common cause failures. When developing safety-related controls, we need to consider all three and mitigate them as much as possible.

Random failures do not follow any pattern, occurring randomly over time, and are often brought on by over-stressing the component or manufacturing flaws. Random failures can increase due to environmental or process-related stresses, like corrosion, EMI, normal wear-and-tear, or other over-stressing of the component or subsystem.?Random failures are often mitigated by selecting high-reliability components [18].

Systematic failures include common-cause failures. They occur because some human behaviour was not caught by procedural means. These failures are due to design, specification, operating, maintenance, and installation errors. When we look at systematic errors, we are looking for things like training of the system designers or quality assurance procedures used to validate how the system operates. Systematic failures are non-random and complex, making them difficult to analyze statistically. Systematic errors are a significant source of common-cause failures because they can affect redundant devices and are often deterministic, occurring whenever a set of circumstances exist.

Systematic failures include many types of errors, such as:

  • Manufacturing defects, e.g., software and hardware errors built into the device by the manufacturer.
  • Specification mistakes, e.g. incorrect design basis and inaccurate software specification.
  • Implementation errors, e.g., improper installation, incorrect programming, interface problems, and not following the safety manual for the devices used to realize the safety function.
  • Operation and maintenance, e.g., poor inspection, incomplete testing and improper bypassing [18].

Diverse redundancy is commonly used to mitigate systematic failures since differences in component or subsystem design tend to create non-overlapping systematic failures, reducing the likelihood of a common error creating a common-mode failure. Diversity does not affect errors in specification, implementation, operation and maintenance.

Fig 1 below shows the results of a small study by the UK’s Health and Safety Executive in 1994 [19] that supports the idea that systematic failures significantly contribute to safety system failures. The study included only 34 systems (n=34), so the results cannot be considered conclusive. However, there were some startling results. As you can see, errors in the specification of the safety functions (Safety Requirement Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significate source of failures.

Pie chart illustrating the proportion of failures in each phase of the life cycle of a machine, based on data taken from HSE Report HSG238.
Figure 1 – HSG 238 Primary Causes of Failure by Life Cycle Stage

Handling CCF in ISO 13849-1

Now that we understand WHAT Common-Cause Failure is and WHY it’s important, we can talk about HOW it is handled in ISO 13849-1. Since ISO 13849-1 is intended to be a simplified functional safety standard, CCF analysis is limited to a checklist in Annex F, Table F.1. Note that Annex F is informative, meaning that it is guidance material to help you apply the standard. Since this is the case, you could use other means suitable for assessing CCF mitigation, like those in IEC 61508 or other standards.

I should point out that the CCF scoring used in ISO 13849-1 is a simplified implementation of the β values used in IEC 61508. The values in [1, Table F.1] were not pulled out of thin air.

Table F.1 is set up with a series of mitigation measures grouped together in related categories. Each group is given a score that can be claimed if you have implemented the mitigation in that group. The measures listed in each group are examples of measures that could be used to claim the points for that category. You can use other methods to achieve the same results as the examples in the table.

Here’s an excerpt:

A portion of ISO 13849-1 Table F.1.
Excerpt from ISO 13849-1:2015, Table F.1

To claim the 15 points available for separation or segregation in the system design, you can use any of the measures in the bulleted list or other means that result in signal separation or segregation. The bulleted list in section 1 are examples, not mandatory requirements.

Table F.1 lists six groups of mitigation measures. A minimum score of 65 points must be achieved to claim adequate CCF mitigation. Only Category 2, 3 and 4 architectures are required to meet the CCF requirements to claim the PL, but without meeting the CCF requirement, you cannot claim the PL, regardless of whether the design meets the other criteria or not.

One final note on CCF: If you are trying to review an existing control system, say in an existing machine or in a machine designed by a third party where you have no way to determine the experience and training of the designers or the capability of the company’s change management process, then you cannot adequately assess CCF [8]. This fact is recognized in CSA Z432-16 [20], chapter 8. [20] allows the reviewer to verify that the architectural requirements, exclusive of any probabilistic requirements, have been met. This is particularly useful for engineers reviewing machinery under Ontario’s Pre-Start Health and Safety Review requirements [21], who frequently work with less-than-complete design documentation.

If you missed the first part of the series, you can read it here. In the next article in this series, I review the process flow for system analysis as currently outlined in ISO 13849-1.


Book List

Here are some books that I think you may find helpful on this journey:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3] Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: Overview of techniques and measures related to EMC for Functional Safety, 2013.

[0.4] “Code of practice for electromagnetic resilience, 1st ed. Stevenage, UK: IET Standards TC4.3 EMC, 2017.

[0.5] “Code of Practice: Competence for Safety Related Systems Practitioners, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2016.


References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[2]     Safety of machinery — Safety-related parts of control systems — Part 2: Validation. 2nd Edition. ISO Standard 13849-2. 2012.

[3]      Safety of machinery — General principles for design — Risk assessment and risk reduction. ISO Standard 12100. 2010.

[8]     S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncertainties in the validation of an existing safety-related control circuit with the ISO 13849-1:2006 design standard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104-112, Jan. 2014.

[17]      “failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18]      M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them”, Process Saf. Prog., vol. 25, no. 4, pp. 331?338, 2006.

[19]     Out of Control—Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

[20]     Safeguarding of Machinery. 3rd Edition. CSA Standard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Canada, 1990.

© 2017 – 2022, Compliance inSight Consulting Inc. Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.