ISO 13849-1 Analysis — Part 8: Fault Exclusion

Post updated 2019-07-24. Ed.

Fault Consideration & Fault Exclusion

ISO 13849-1, Clause 7 [1, 7] discusses the need for fault consideration and fault exclusion. Fault consideration is the process of examining the components and sub-systems used in the safety-related part of the control system (SRP/CS) and making a list of all the faults that could occur in each one. This is definitely a non-trivial exercise!

Thinking back to some of the earlier articles in this series where I mentioned the different types of faults, you may recall that there are detectable and undetectable faults, and there are safe and dangerous faults, leading us to four kinds of fault:

  • Safe undetectable faults
  • Dangerous undetectable faults
  • Safe detectable faults
  • Dangerous detectable faults
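Each fault is asked two questions: is it dangerous, and can it be detected? The answers combine into the four classes above. Here is a minimal Python sketch of that mapping. The names `FaultClass` and `classify` are hypothetical and do not come from any standard or tool.

```python
from enum import Enum


class FaultClass(Enum):
    """The four fault classes: safe/dangerous x detectable/undetectable."""
    SAFE_UNDETECTED = "safe undetectable"
    DANGEROUS_UNDETECTED = "dangerous undetectable"
    SAFE_DETECTED = "safe detectable"
    DANGEROUS_DETECTED = "dangerous detectable"


def classify(dangerous: bool, detectable: bool) -> FaultClass:
    """Map the two yes/no answers for a fault onto one of the four classes."""
    if dangerous:
        return FaultClass.DANGEROUS_DETECTED if detectable else FaultClass.DANGEROUS_UNDETECTED
    return FaultClass.SAFE_DETECTED if detectable else FaultClass.SAFE_UNDETECTED


# Hypothetical fault judged dangerous and covered by a diagnostic test.
print(classify(dangerous=True, detectable=True))  # FaultClass.DANGEROUS_DETECTED
```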

For systems where no diagnostics are used, i.e., Categories B and 1, faults need to be eliminated using inherently safe design techniques. Take care when deciding whether to classify a component as “well-tried” or to rely on a fault exclusion: a component that would normally be considered “well-tried” might not meet those requirements in every application. [2, Annex A], Validation tools for mechanical systems, discusses the concepts of “Basic Safety Principles”, “Well-Tried Safety Principles”, and “Well-tried components”, and also provides examples of faults and relevant fault exclusion criteria. Similar Annexes cover pneumatic systems [2, Annex B], hydraulic systems [2, Annex C], and electrical systems [2, Annex D].

For systems where diagnostics are part of the design, i.e., Categories 2, 3, and 4, the fault lists are used to evaluate the diagnostic coverage (DC) of the test systems. Depending on the architecture, certain levels of DC are required to meet the relevant PL; see [1, Fig. 5]. The fault lists are the starting point for determining DC, and they are an input to the hardware and software designs. The diagnostics must be designed to detect the dangerous faults they are intended to cover, and the resulting DC must be high enough to meet the PLr for the safety function.
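As a simplified sketch, assume the dangerous failure rates from the fault list have already been split into a detected part (λDD) and an undetected part (λDU). DC is then the ratio λDD / λD, where λD = λDD + λDU. The result can be mapped onto the DC bands used in ISO 13849-1: none below 60 %, low from 60 % to below 90 %, medium from 90 % to below 99 %, and high at 99 % or more. The function names and failure rates are hypothetical. The sketch also does not attempt the DCavg averaging across blocks that ISO 13849-1 uses.

```python
def diagnostic_coverage(lambda_dd: float, lambda_du: float) -> float:
    """DC = lambda_DD / lambda_D, where lambda_D = lambda_DD + lambda_DU."""
    lambda_d = lambda_dd + lambda_du
    if lambda_d <= 0:
        raise ValueError("total dangerous failure rate must be positive")
    return lambda_dd / lambda_d


def dc_level(dc: float) -> str:
    """Map a DC fraction (0..1) onto the none/low/medium/high bands."""
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


# Hypothetical dangerous failure rates (per hour) summed from a fault list.
dc = diagnostic_coverage(lambda_dd=9.5e-7, lambda_du=0.5e-7)
print(f"DC = {dc:.0%}, level = {dc_level(dc)}")  # DC = 95%, level = medium
```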

The fault lists and fault exclusions are used in the Validation portion of this process as well. At the start of the Validation process flowchart [2, Fig. 1], you can see how the fault lists and the criteria used for fault exclusion are used as inputs to the validation plan.

Start of ISO 13849-2 Fig. 1: the first few stages in the ISO 13849-2 validation process.

Faults that can be excluded do not need to be validated, which saves time and effort during system verification and validation (V & V). How is this done?

Fault Consideration

The first step is to develop a list of potential faults that could occur, based on the components and subsystems included in the SRP/CS. ISO 13849-2 [2] includes lists of typical faults for various technologies. For example, [2, Table A.4] is the fault list for mechanical components.

Mechanical fault list from ISO 13849-2
Table A.4 — Faults and fault exclusions — Mechanical devices, components and elements
(e.g. cam, follower, chain, clutch, brake, shaft, screw, pin, guide, bearing)

[2] contains tables similar to Table A.4 for:

  • Pressure-coil springs
  • Directional control valves
  • Stop (shut-off) valves/non-return (check) valves/quick-action venting valves/shuttle valves, etc.
  • Flow valves
  • Pressure valves
  • Pipework
  • Hose assemblies
  • Connectors
  • Pressure transmitters and pressure medium transducers
  • Compressed air treatment — Filters
  • Compressed-air treatment — Oilers
  • Compressed air treatment — Silencers
  • Accumulators and pressure vessels
  • Sensors
  • Fluidic Information processing — Logical elements
  • etc.

As you can see, there are many different types of faults that need to be considered. Keep in mind that I did not give you all of the different fault lists – this post would be a mile long if I did that! The point is that you need to develop a fault list for your system, and then consider the impact of each fault on the operation of the system. If you have components or subsystems that are not listed in the tables, then you need to develop your own fault lists for those items. Failure Modes and Effects Analysis (FMEA) is usually the best approach for developing fault lists for these components [23], [24].
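One way to track where each entry in a fault list came from is sketched below in Python. It assumes a hypothetical lookup table that stands in for the ISO 13849-2 tables. The fault wording is illustrative and not quoted from the standard. The sketch also flags any component that has neither a table entry nor FMEA results, so that a fault list can be developed for it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FaultEntry:
    component: str
    fault: str
    source: str  # where the fault came from: a standard table or an FMEA


# Hypothetical stand-in for the ISO 13849-2 tables (illustrative wording only).
TABLE_FAULTS: Dict[str, List[str]] = {
    "directional control valve": ["fails to switch", "leakage"],
    "hose assembly": ["bursting", "leakage"],
}


def build_fault_list(
    components: List[str],
    fmea_results: Dict[str, List[str]],
) -> Tuple[List[FaultEntry], List[str]]:
    """Combine table-based faults with FMEA results.

    Returns the fault list and the components that still have no fault list.
    """
    entries: List[FaultEntry] = []
    missing: List[str] = []
    for comp in components:
        if comp in TABLE_FAULTS:
            entries += [FaultEntry(comp, f, "standard table") for f in TABLE_FAULTS[comp]]
        elif comp in fmea_results:
            entries += [FaultEntry(comp, f, "FMEA") for f in fmea_results[comp]]
        else:
            missing.append(comp)  # develop a fault list for this item first
    return entries, missing


# Hypothetical system: one tabled component, one FMEA'd, one not yet covered.
faults, todo = build_fault_list(
    ["directional control valve", "custom encoder", "gearbox"],
    {"custom encoder": ["loss of signal", "output frozen at last value"]},
)
print(len(faults), todo)  # 4 ['gearbox']
```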

When deciding which faults to include in the list, there are a few things to keep in mind [1, 7.2]:

  • if other faults develop as a consequence of a first fault, then you can group the first fault and all of the following faults together as a single fault
  • two or more separate faults with a common cause can be considered as a single fault
  • multiple faults with different causes occurring simultaneously are considered improbable and do not need to be considered
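The first two grouping rules amount to merging connected faults, which can be sketched with a union-find structure. In this hypothetical Python sketch, each fault records the fault that caused it, if any. It can also carry an optional label for a shared cause. Each resulting group is treated as one single fault. Faults left in different groups have separate causes, so under the third rule they are not combined.

```python
from typing import Dict, List, Optional, Set


def group_faults(
    caused_by: Dict[str, Optional[str]],
    common_cause: Dict[str, str],
) -> List[Set[str]]:
    """Collapse a fault list into the single faults to be analysed.

    caused_by:    fault -> fault that triggered it, or None for a first fault
                  (every cause must itself be a key of caused_by)
    common_cause: fault -> label of a shared cause (omit faults with none)
    """
    parent = {f: f for f in caused_by}

    def find(f: str) -> str:
        # Walk up to the representative of f's group, halving the path as we go.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Rule 1: a first fault and the faults it causes count as one fault.
    for fault, cause in caused_by.items():
        if cause is not None:
            union(fault, cause)

    # Rule 2: separate faults with a common cause count as one fault.
    by_label: Dict[str, List[str]] = {}
    for fault, label in common_cause.items():
        by_label.setdefault(label, []).append(fault)
    for members in by_label.values():
        for other in members[1:]:
            union(members[0], other)

    # Rule 3: faults left in different groups have separate causes,
    # so no group is combined with another.
    groups: Dict[str, Set[str]] = {}
    for fault in caused_by:
        groups.setdefault(find(fault), set()).add(fault)
    return list(groups.values())


# Example #1: the failed regulator's over-voltage causes both sensor failures.
groups = group_faults(
    {
        "regulator fails high": None,
        "sensor 1 fails": "regulator fails high",
        "sensor 2 fails": "regulator fails high",
    },
    common_cause={},
)
print(len(groups))  # 1
```

Consider the lubricator and dump valve faults from Example #3. The same function returns two groups if they are listed with no common cause. It returns one group if both are given the same hypothetical label, such as "poor maintenance".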

Examples

#1 – Voltage Regulator

A voltage regulator fails in a system power supply so that the 24 Vdc output rises to an unregulated 36 Vdc (the internal power supply bus voltage), and after some time has passed, two sensors fail. All three failures can be grouped and considered as a single fault because they originate in a single failure in the voltage regulator.

#2 – Lightning Strike

If a lightning strike occurs on the power line and the resulting surge voltage on the 400 V mains causes an interposing contactor and the motor drive it controls to fail to danger, then these failures may be grouped and considered as one. Again, a single event causes all of the subsequent failures.

#3 – Pneumatic System Lubrication

3a – A pneumatic lubricator runs out of lubricant and is not refilled, depriving downstream pneumatic components of lubrication.

3b – The spool on the system dump valve sticks open because it is not cycled often enough.

These two failures do not share a cause, so there is no need to consider them as occurring simultaneously: the probability of both happening concurrently is extremely small. One caution: these two faults MAY have a common cause, namely poor maintenance. If that is true and you decide to treat them as two faults with a common cause, they could then be grouped as a single fault.
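A little arithmetic shows why the common-cause caution matters. This uses a deliberately simplified model, an assumption for illustration rather than a method from ISO 13849-1. In the model, a shared cause occurs with probability `p_common` and produces both faults. Otherwise, the two faults occur independently. The probabilities below are hypothetical.

```python
def p_both_present(p_a: float, p_b: float, p_common: float = 0.0) -> float:
    """Probability that faults A and B are both present in the same interval.

    Simplified illustrative model (an assumption):
      * with probability p_common a shared cause occurs and produces both faults;
      * otherwise A and B occur independently with probabilities p_a and p_b.
    """
    return p_common + (1.0 - p_common) * p_a * p_b


# Hypothetical: each fault has a 1 % chance of being present in a given interval.
print(f"{p_both_present(0.01, 0.01):.1e}")         # independent causes only
print(f"{p_both_present(0.01, 0.01, 0.005):.1e}")  # plus a 0.5 % shared cause
```

With independent causes, these numbers give about 1 × 10⁻⁴ for both faults being present. Adding the 0.5 % shared cause raises that to about 5.1 × 10⁻³, so the common cause dominates.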

Fault Exclusion

Once you have your well-considered fault lists together, the next question is “Can any of the listed faults be excluded?” This is a tricky question! There are a few points to consider:

  • Does the system architecture allow for fault exclusion?
  • Is the fault technically improbable, even if it is possible?
  • Does experience show that the fault is unlikely to occur?*
  • Are there technical requirements related to the application and the hazard that might support fault exclusion?

* BE CAREFUL with this one!

Whenever faults are excluded, a detailed justification for the exclusion needs to be included in the system design documentation. Simply deciding that the fault can be excluded is NOT ENOUGH! Consider the risk a person will be exposed to in the event the fault occurs. If the severity is very high, i.e., severe permanent injury or death, you may not want to exclude the fault even if you think you could. Careful consideration of the resulting injury scenario is needed.

Basing a fault exclusion on personal experience is seldom considered adequate, which is why I added the asterisk (*) above. Look for good statistical data to support any decision to use a fault exclusion.
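The documentation points above can be turned into a simple review checklist. The sketch below is hypothetical. The `FaultExclusion` record and the `review` rules are just one possible way to flag four issues: a missing justification, missing evidence, reliance on personal experience alone, or a severe-consequence scenario. They are no substitute for engineering judgement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FaultExclusion:
    """One proposed fault exclusion as it might be recorded in the design file."""
    fault: str
    justification: str  # why the fault is considered technically improbable
    evidence: List[str] = field(default_factory=list)  # e.g. standard tables, field data
    severe_consequence: bool = False  # severe permanent injury or death if it occurs


def review(exclusion: FaultExclusion) -> List[str]:
    """Return review findings; an empty list means nothing was flagged."""
    findings: List[str] = []
    if not exclusion.justification.strip():
        findings.append("no documented justification")
    if not exclusion.evidence:
        findings.append("no supporting evidence cited")
    elif all(e == "personal experience" for e in exclusion.evidence):
        findings.append("personal experience only: look for statistical data")
    if exclusion.severe_consequence:
        findings.append("severe consequence: reconsider whether to exclude this fault")
    return findings


# Hypothetical proposal that relies on experience alone for a severe hazard.
proposal = FaultExclusion(
    fault="shaft breakage",
    justification="never seen it happen",
    evidence=["personal experience"],
    severe_consequence=True,
)
for finding in review(proposal):
    print("-", finding)
```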

IEC 61508-2 [14] contains much more information on the subject of fault exclusion, and some of the books listed below [0.1], [0.2], and [0.3] also cover it well. If you know of additional resources you would like to share, please post the information in the comments!

Definitions

3.1.3
fault
state of an item characterized by the inability to perform a required function, excluding the inability during preventive maintenance or other planned actions, or due to lack of external resources
Note 1 to entry: A fault is often the result of a failure of the item itself, but may exist without prior failure.
Note 2 to entry: In this part of ISO 13849, “fault” means random fault. [SOURCE: IEC 60050-191:1990, 05-01.]

Book List

Here are some books that I think you may find helpful on this journey:

[0]     B. Main, Risk Assessment: Basics and Benchmarks, 1st ed. Ann Arbor, MI USA: DSE, 2004.

[0.1]  D. Smith and K. Simpson, Safety critical systems handbook. Amsterdam: Elsevier/Butterworth-Heinemann, 2011.

[0.2]  Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.

[0.3] Overview of techniques and measures related to EMC for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2013.

[0.4] Code of practice for electromagnetic resilience, 1st ed. Stevenage, UK: IET Standards TC4.3 EMC, 2017.

[0.5] Code of Practice: Competence for Safety Related Systems Practitioners, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2016.

References

Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.

[1]     Safety of machinery — Safety-related parts of control systems — Part 1: General principles for design. 3rd Edition. ISO Standard 13849-1. 2015.

[2]     Safety of machinery — Safety-related parts of control systems — Part 2: Validation. 2nd Edition. ISO Standard 13849-2. 2012.

[3]      Safety of machinery — General principles for design — Risk assessment and risk reduction. ISO Standard 12100. 2010.

[4]     Safeguarding of Machinery. 2nd Edition. CSA Standard Z432. 2004.

[5]     Risk Assessment and Risk Reduction – A Guideline to Estimate, Evaluate and Reduce Risks Associated with Machine Tools. ANSI Technical Report B11.TR3. 2000.

[6]    Safety of machinery — Emergency stop function — Principles for design. ISO Standard 13850. 2015.

[7]     Functional safety of electrical/electronic/programmable electronic safety-related systems. 7 parts. IEC Standard 61508. Edition 2. 2010.

[8]     S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncertainties in the validation of an existing safety-related control circuit with the ISO 13849-1:2006 design standard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.

[9]    Guidance on the application of ISO 13849-1 and IEC 62061 in the design of safety-related control systems for machinery. IEC Technical Report TR 62061-1. 2010.

[10]     Safety of machinery – Functional safety of safety-related electrical, electronic and programmable electronic control systems. IEC Standard 62061. 2005.

[11]    Guidance on the application of ISO 13849-1 and IEC 62061 in the design of safety-related control systems for machinery. IEC Technical Report 62061-1. 2010.

[12]    D. S. G. Nix, Y. Chinniah, F. Dosio, M. Fessler, F. Eng, and F. Schrever, “Linking Risk and Reliability—Mapping the output of risk assessment tools to functional safety requirements for safety related control systems,” 2015.

[13]    Safety of machinery. Safety related parts of control systems. General principles for design. CEN Standard EN 954-1. 1996.

[14]   Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. IEC Standard 61508-2. 2010.

[15]     Reliability Prediction of Electronic Equipment. Military Handbook MIL-HDBK-217F. 1991.

[16]     “IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].

[17]      “failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.

[18]      M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331–338, 2006.

[19]     Out of Control—Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.

[20]     Safeguarding of Machinery. 3rd Edition. CSA Standard Z432. 2016.

[21]     O. Reg. 851, INDUSTRIAL ESTABLISHMENTS. Ontario, Canada, 1990.

[22]     “Field-programmable gate array”, En.wikipedia.org, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Field-programmable_gate_array. [Accessed: 16-Jun-2017].

[23]     Analysis techniques for system reliability – Procedure for failure mode and effects analysis (FMEA). 2nd Ed. IEC Standard 60812. 2006.

[24]     “Failure mode and effects analysis”, En.wikipedia.org, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Failure_mode_and_effects_analysis. [Accessed: 16-Jun-2017].

2 thoughts on “ISO 13849-1 Analysis — Part 8: Fault Exclusion

  1. I have always been very uncomfortable about fault exclusion.

    Either you are saying that you totally trust another company’s QA, and the comprehensive knowledge and discipline of all the designers and testers, or you are saying you think you know so much about something that there is nothing at all you don’t know.

    When put in that context, I have always felt that any gains from fault exclusion struggle to outweigh the certainty of the lack of unknowns.

    I feel it is far easier, even if only for your own sleep at night, to avoid exclusion wherever practical, if not possible.

    1. Hi Gareth,
      I think your position is more black-and-white than I would take; however, it does take significant effort to justify a fault exclusion. If the component you are considering for fault exclusion is described in the tables in ISO 13849-2, then the justification is reasonably straightforward. If, however, you want to justify fault exclusion in a component not listed in Part 2, then you need to do your homework. I’d start with an FMEA, and then back that up with a fault-tree analysis. Once you are done with those two steps, you will have either convinced yourself that the fault exclusion is justifiable, or you will have realized that you can’t adequately justify it. In either case, you will have documented the basis for your decision, which is always important. If after all that you still don’t want to use fault exclusion, there’s nothing wrong with that. As a control systems designer, it is always your right to make the decision you feel most comfortable with.
