ISO 13849-1 Analysis — Part 6: CCF — Common Cause Failures

This entry is part 6 of 6 in the series How to do a 13849-1 analysis

What is a Common Cause Failure?

There are two similar-sounding terms that people often confuse: Common Cause Failure (CCF) and Common Mode Failure. While these two types of failures sound similar, they are different. A Common Cause Failure is a failure in a system where two or more portions of the system fail at the same time from a single common cause. An example could be a lightning strike that causes a contactor to weld and simultaneously takes out the safety relay processor that controls the contactor. In this example, two different components fail in two different manners, but from a single cause.

Common Mode Failure is where two components or portions of a system fail in the same way, at the same time. For example, two interposing relays both fail with welded contacts at the same time. The failures could result from the same cause or from different causes, but the way the components fail is the same.

Common-cause failure includes common mode failure, since a common cause can result in a common manner of failure in identical devices used in a system.

Here are the formal definitions of these terms:

3.1.6 common cause failure CCF

failures of different items, resulting from a single event, where these failures are not consequences of each other

Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050-191-am1:1999, 04-23.] [1]

 

3.36 common mode failures

failures of items characterized by the same fault mode

NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [IEV 191-04-24] [3]

The “common mode” failure definition uses the phrase “fault mode”, so let’s look at that as well:

failure mode
DEPRECATED: fault mode
manner in which failure occurs

Note 1 to entry: A failure mode may be defined by the function lost or other state transition that occurred. [IEV 192-03-17] [17]

As you can see, “fault mode” is no longer used, in favour of the more common “failure mode”, so it is possible to re-write the common-mode failure definition to read, “failures of items characterised by the same manner of failure.”
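The difference between the two definitions can be made concrete with a minimal sketch. The `Failure` record, its field names, and the two helper functions below are illustrative assumptions, not part of any standard. The sketch simply encodes the definitions quoted above: a common cause failure needs different items, a single event, and failures that are not consequences of each other, while a common mode failure only needs different items with the same failure mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Failure:
    """Hypothetical record of one observed failure (illustrative only)."""
    item: str                             # the component that failed
    mode: str                             # manner in which the failure occurred
    event: str                            # the initiating event (the cause)
    consequence_of: Optional[str] = None  # item whose failure caused this one, if any


def is_common_cause(a: Failure, b: Failure) -> bool:
    """Different items, one single event, neither failure a consequence of the other."""
    independent = a.consequence_of != b.item and b.consequence_of != a.item
    return a.item != b.item and a.event == b.event and independent


def is_common_mode(a: Failure, b: Failure) -> bool:
    """Different items that fail in the same manner, whatever the cause."""
    return a.item != b.item and a.mode == b.mode


# Lightning strike example from the text: one cause, two different failure modes.
contactor = Failure("contactor", "welded contacts", "lightning strike")
relay_cpu = Failure("safety relay processor", "processor destroyed", "lightning strike")

# Interposing relay example: same failure mode, here with assumed different causes.
relay_1 = Failure("interposing relay 1", "welded contacts", "overload")
relay_2 = Failure("interposing relay 2", "welded contacts", "contact wear")

print(is_common_cause(contactor, relay_cpu), is_common_mode(contactor, relay_cpu))
print(is_common_cause(relay_1, relay_2), is_common_mode(relay_1, relay_2))
```

If both relays had welded because of the same overload event, the pair would satisfy both checks, which is the overlap between the two terms described earlier.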

Random, Systematic and Common Cause Failures

Why do we need to care about this? There are three manners in which failures occur: random failures, systematic failures, and common cause failures. When developing safety related controls, we need to consider all three and mitigate them as much as possible.

Random failures do not follow any pattern, occurring randomly over time, and are often brought on by over-stressing the component, or from manufacturing flaws. Random failures can increase due to environmental or process-related stresses, like corrosion, EMI, normal wear-and-tear, or other over-stressing of the component or subsystem. Random failures are often mitigated through selection of high-reliability components [18].

Systematic failures include common-cause failures, and occur because of human errors that were not caught by procedural means. These failures are due to design, specification, operating, maintenance, and installation errors. When we look at systematic errors, we are looking for things like training of the system designers, or quality assurance procedures used to validate the way the system operates. Systematic failures are non-random and complex, making them difficult to analyse statistically. Systematic errors are a significant source of common-cause failures because they can affect redundant devices, and because they are often deterministic, occurring whenever a particular set of circumstances exists.

Systematic failures include many types of errors, such as:

  • Manufacturing defects, e.g., software and hardware errors built into the device by the manufacturer.
  • Specification mistakes, e.g., incorrect design basis and inaccurate software specification.
  • Implementation errors, e.g., improper installation, incorrect programming, interface problems, and not following the safety manual for the devices used to realise the safety function.
  • Operation and maintenance, e.g., poor inspection, incomplete testing and improper bypassing [18].

Diverse redundancy is commonly used to mitigate systematic failures, since differences in component or subsystem design tend to create non-overlapping systematic failures, reducing the likelihood of a common error creating a common-mode failure. Errors in specification, implementation, operation and maintenance are not affected by diversity.

Fig. 1 below shows the results of a small study done by the UK’s Health and Safety Executive in 1994 [19] that supports the idea that systematic failures are a significant contributor to safety system failures. The study included only 34 systems (n=34), so the results cannot be considered conclusive. However, there were some startling results. As you can see, errors in the specification of the safety functions (Safety Requirement Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significant source of failures.

Pie chart illustrating the proportion of failures in each phase of the life cycle of a machine, based on data taken from HSE Report HSG238.
Figure 1 – HSG 238 Primary Causes of Failure by Life Cycle Stage

Handling CCF in ISO 13849-1

Now that we understand WHAT Common-Cause Failure is, and WHY it’s important, we can talk about HOW it is handled in ISO 13849-1. Since ISO 13849-1 is intended to be a simplified functional safety standard, CCF analysis is limited to a checklist in Annex F, Table F.1. Note that Annex F is informative, meaning that it is guidance material to help you apply the standard. Since this is the case, you could use any other means suitable for assessing CCF mitigation, like those in IEC 61508, or in other standards.

Table F.1 is set up with a series of mitigation measures which are grouped together in related categories. Each group is provided with a score that can be claimed if you have implemented the mitigations in that group. ALL OF THE MEASURES in each group must be fulfilled in order to claim the points for that category. Here’s an example:

A portion of ISO 13849-1 Table F.1.
ISO 13849-1:2015, Table F.1 Excerpt

In order to claim the 20 points available for the use of separation or segregation in the system design, there must be a separation between the signal paths. Several examples of this are given for clarity.

Table F.1 lists six groups of mitigation measures. To claim adequate CCF mitigation, a minimum score of 65 points must be achieved. The CCF requirement applies only to Category 2, 3 and 4 architectures. For those architectures, you cannot claim the PL without meeting the CCF requirement, regardless of whether the design meets the other criteria.
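The scoring rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the group names, the individual measures, and most of the point values below are placeholders, and the real entries must be taken from Table F.1 of the standard. Only the all-or-nothing rule per group, the 65-point minimum, and the restriction to Categories 2, 3 and 4 come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

CCF_MINIMUM_SCORE = 65             # minimum score stated in the text
CCF_CATEGORIES = {"2", "3", "4"}   # architectures that must meet the CCF requirement


@dataclass
class MeasureGroup:
    """One group of Table F.1 measures (names and points here are placeholders)."""
    name: str
    points: int
    measures: Dict[str, bool]      # measure description -> implemented?

    def claimed_points(self) -> int:
        # All measures in the group must be fulfilled, otherwise nothing is claimed.
        if self.measures and all(self.measures.values()):
            return self.points
        return 0


def ccf_score(groups: List[MeasureGroup]) -> int:
    """Sum the points of every group whose measures are all implemented."""
    return sum(group.claimed_points() for group in groups)


def ccf_requirement_met(category: str, groups: List[MeasureGroup]) -> bool:
    """True if no CCF requirement applies, or if the score reaches the minimum."""
    if category not in CCF_CATEGORIES:
        return True
    return ccf_score(groups) >= CCF_MINIMUM_SCORE


# Hypothetical assessment; point values are placeholders, not the Table F.1 values.
groups = [
    MeasureGroup("Separation/segregation", 20, {"signal paths separated": True}),
    MeasureGroup("Diversity", 20, {"different technologies": True,
                                   "different manufacturers": False}),
    MeasureGroup("Design/application/experience", 20, {"overvoltage protection": True}),
    MeasureGroup("Environmental", 35, {"EMC immunity verified": True}),
]

print(ccf_score(groups))                  # Diversity contributes 0 points here
print(ccf_requirement_met("3", groups))
```

Keeping each measure as its own boolean, rather than a single flag per group, leaves a record of why a group's points were or were not claimed, which is useful when the assessment is reviewed later.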

One final note on CCF: If you are trying to review an existing control system, say in an existing machine, or in a machine designed by a third party where you have no way to determine the experience and training of the designers or the capability of the company’s change management process, then you cannot adequately assess CCF [8]. This fact is recognised in CSA Z432-16 [20], chapter 8, which allows the reviewer to simply verify that the architectural requirements, exclusive of any probabilistic requirements, have been met. This is particularly useful for engineers reviewing machinery under Ontario’s Pre-Start Health and Safety requirements [21], who are frequently working with less-than-complete design documentation.

In case you missed the first part of the series, you can read it here. In the next article in this series, I’m going to review the process flow for system analysis as currently outlined in ISO 13849-1. Watch for it!

Book List

Here are some books that I think you may find helpful on this journey:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3]  Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[2]     Safety of machinery — Safety-related parts of control systems — Part 2: Validation. 2nd Edition. ISO Standard 13849-2. 2012.

[3]      Safety of machinery — General principles for design — Risk assessment and risk reduction. ISO Standard 12100. 2010.

[8]     S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncertainties in the validation of an existing safety-related control circuit with the ISO 13849-1:2006 design standard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.

[17]      “failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18]      M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331–338, 2006.

[19]     Out of Control—Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

[20]     Safeguarding of Machinery. 3rd Edition. CSA Standard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Canada, 1990.

CSA Z432 Third Edition Open for Public Review!

CSA Z432, Safeguarding of Machinery, is the basic standard for Canada when it comes to most types of machinery. Only Power Presses and Press Brakes, and Industrial Robots are covered separately in their own standards. CSA Z432 provides guidance on important topics, like:

  • Risk Assessment
  • Risk reduction through the Hierarchy of Controls
  • Guard design requirements
  • Safeguarding device application requirements, and
  • Instructions and information for use

This standard should be used by everyone in Canada responsible for the safe design of machinery used in Canadian workplaces, and for the safety of workers who use machinery in their daily tasks.

CSA has just opened public review on CSA Z432, Safeguarding of Machinery, third edition. If you are a user, a builder of machinery, or an evaluator of machinery, this is your opportunity to see the draft of this important standard, and to make comments to help the Technical Committee improve the standard on your behalf.

To access the public review copy, you must register on CSA’s Public Review system. Registration is free and allows you to get read-only access to the drafts of all new standards that CSA is preparing to publish. The time you take to read and comment on new standards is very valuable to the Technical Committees, as it helps us to correct areas where misunderstandings or confusion may exist, and to add material where it is needed.

See the Draft

Review closes 2-Jan-2016, so don’t delay!

If you need more information, please contact Jill Collins at CSA Group.

 

Canada Adopts ISO 13857 – Safety Distances

Safety Distances

ISO 13857 2008, Figure 2 - Safety Distance for reaching over a protective structure
ISO 13857 2008, Figure 2 – Reaching Over Protective Structure

As part of the work on the 3rd Edition of CSA Z432, Canada has decided to adopt ISO 13857 as CSA Z13857. The standard is to be adopted without technical deviations.

Why ISO 13857?

CSA Z432 has long had portions of the information in ISO 13857 in its annexes. Annex C has tables for reaching through openings and reaching over structures, much like the one above, that users have found useful over the years. Unfortunately, these tables have also proved a bit confusing, as they are somewhat different from CSA Z432 Table 3. While neither set of safe-distance values is less safe, the values in Table 3 are very similar to those used in the USA, which was the original source for that information.

When Z432 was first being developed in the late 1980s, most machinery was coming in from the US, so harmonisation with US OSHA guidelines was more important than harmonising internationally. Today, import of machinery from the EU is common, and Canadian export of machinery around the world is part of doing business. CSA’s Safety of Machinery Technical Committee decided to help manufacturers and importers by harmonising Canada’s standards with the International Standards by adopting ISO 13857 as a Canadian Standard.

Public Review

If you are interested in reviewing and commenting on this adoption, please visit the CSA Public Review Page for the standard. Comments close 13/07/2015.

Details:

Identifier: Z13857

Title: Safety of machinery — Safety distances to prevent hazard zones being reached by upper and lower limbs (Adoption without deviations) (New Standard)

Expiry date: 13/07/2015

This International Standard establishes values for safety distances in both industrial and non-industrial environments to prevent machinery hazard zones being reached. The safety distances are appropriate for protective structures. It also gives information about distances to impede free access by the lower limbs (see 4.3).