Fault Consideration & Fault Exclusion
ISO 13849-1, Chapter 7 [1, 7] discusses the need for fault consideration and fault exclusion. Fault consideration is the process of examining the components and subsystems used in the safety-related part of the control system (SRP/CS) and making a list of all the faults that could occur in each one. This is definitely a non-trivial exercise!
Thinking back to some of the earlier articles in this series where I mentioned the different types of faults, you may recall that there are detectable and undetectable faults, and there are safe and dangerous faults, leading us to four kinds of fault:
- Safe undetectable faults
- Dangerous undetectable faults
- Safe detectable faults
- Dangerous detectable faults
For systems where no diagnostics are used, i.e., Categories B and 1, faults need to be eliminated using inherently safe design techniques. Take care when deciding whether to classify a component as “well-tried” or to rely on a fault exclusion. A component that would normally be considered “well-tried” might not meet those requirements in every application.
For systems where diagnostics are part of the design, i.e., Categories 2, 3, and 4, the fault lists are used to evaluate the diagnostic coverage (DC) of the test systems. Depending on the architecture, certain levels of DC are required to meet the relevant PL; see [1, Fig. 5]. The fault lists are the starting point for determining DC, and they are an input to the hardware and software designs. All of the dangerous detectable faults must be covered by the diagnostics, and the DC must be high enough to meet the PLr for the safety function.
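The link between a fault list and DC can be made concrete with a short calculation. The Python sketch below assumes that each fault in the list has been marked as dangerous or safe and as detected or undetected, and has been given an estimated failure rate. The components, faults, and rates are all hypothetical. The sketch uses the ISO 13849-1 definition of DC, the detected dangerous failure rate divided by the total dangerous failure rate. It then maps the result onto the standard's DC bands: none (below 60 %), low (60 % to below 90 %), medium (90 % to below 99 %), and high (99 % and above).

```python
from dataclasses import dataclass


@dataclass
class Fault:
    component: str
    description: str
    dangerous: bool        # True if the fault can defeat the safety function
    detected: bool         # True if the diagnostics reveal the fault
    rate_per_hour: float   # estimated failure rate of this fault mode (1/h)


def diagnostic_coverage(faults):
    """DC = detected dangerous failure rate / total dangerous failure rate."""
    dangerous = [f for f in faults if f.dangerous]
    total = sum(f.rate_per_hour for f in dangerous)
    if total == 0:
        raise ValueError("the list contains no dangerous faults with a non-zero rate")
    detected = sum(f.rate_per_hour for f in dangerous if f.detected)
    return detected / total


def dc_level(dc):
    """Map a DC value between 0 and 1 onto the ISO 13849-1 DC bands."""
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


# Hypothetical fault list with made-up failure rates.
faults = [
    Fault("contactor K1", "main contacts welded", True, True, 2.0e-7),
    Fault("contactor K1", "coil open circuit", False, True, 1.0e-7),  # safe: K1 drops out
    Fault("interlock switch S1", "contacts fail to open", True, True, 5.0e-8),
    Fault("interlock switch S1", "actuator broken off in head", True, False, 1.0e-8),
]

dc = diagnostic_coverage(faults)
print(f"DC = {dc:.1%} ({dc_level(dc)})")
```

With these made-up numbers the script prints `DC = 96.2% (medium)`. The safe fault does not enter the calculation, and the single undetected dangerous fault keeps DC below the high band. In practice, ISO 13849-1 also gives tabulated DC estimates for common diagnostic measures and a formula for averaging DC across blocks. Treat this sketch as an illustration of the ratio, not a replacement for that method.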
The fault lists and fault exclusions are used in the Validation portion of this process as well. At the start of the Validation process flow chart [2, Fig. 1], you can see how the fault lists and the criteria used for fault exclusion are used as inputs to the validation plan.
Faults that can be excluded do not need to be validated, saving time and effort during system verification and validation (V & V). How is this done?
The first step is to develop a list of potential faults that could occur, based on the components and subsystems included in the SRP/CS. ISO 13849-2 includes lists of typical faults for various technologies. For example, [2, Table A.4] is the fault list for mechanical components.
ISO 13849-2 also contains tables similar to Table A.4 for:
- Pressure-coil springs
- Directional control valves
- Stop (shut-off) valves/non-return (check) valves/quick-action venting valves/shuttle valves, etc.
- Flow valves
- Pressure valves
- Hose assemblies
- Pressure transmitters and pressure medium transducers
- Compressed-air treatment: filters
- Compressed-air treatment: oilers
- Compressed-air treatment: silencers
- Accumulators and pressure vessels
- Fluidic information processing: logical elements
As you can see, there are many different types of faults that need to be considered. Keep in mind that I did not give you all of the different fault lists – this post would be a mile long if I did that! The point is that you need to develop a fault list for your system, and then consider the impact of each fault on the operation of the system. If you have components or subsystems that are not listed in the tables, then you need to develop your own fault lists for those items. Failure Modes and Effects Analysis (FMEA) techniques are usually the best approach for these components.
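Assembling the fault list described above is largely bookkeeping. You take the typical faults for each component type from the standard's tables and flag every component that has no table, so that you can develop its faults separately, for example with an FMEA. The Python sketch below shows that bookkeeping. The library contents are illustrative placeholders, not text quoted from ISO 13849-2, and the component tags are hypothetical.

```python
# Illustrative fault-list library keyed by component type.
# Entries are placeholders, NOT copied from the ISO 13849-2 tables.
STANDARD_FAULT_LISTS = {
    "directional control valve": ["fails to switch", "sticks in intermediate position"],
    "hose assembly": ["bursts", "leaks"],
    "pressure valve": ["setting drifts"],
}


def build_fault_list(components):
    """Build a system fault list from (tag, component_type) pairs.

    Returns (fault_list, needs_fmea): fault_list holds (tag, fault) rows
    taken from the library; needs_fmea holds tags whose component type has
    no library entry, so their faults must be developed separately.
    """
    fault_list = []
    needs_fmea = []
    for tag, component_type in components:
        faults = STANDARD_FAULT_LISTS.get(component_type)
        if faults is None:
            needs_fmea.append(tag)
            continue
        fault_list.extend((tag, fault) for fault in faults)
    return fault_list, needs_fmea


system = [
    ("V1", "directional control valve"),
    ("H1", "hose assembly"),
    ("PLC1", "safety PLC"),  # not covered by the illustrative library
]

faults, todo = build_fault_list(system)
for tag, fault in faults:
    print(tag, "-", fault)
print("Develop own fault list (e.g. FMEA) for:", todo)
```

Each resulting row still needs its effect on the safety function assessed before the list can feed the DC evaluation or the fault exclusion review.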
When considering the faults to be included in the list there are a few things that should be considered [1, 7.2]:
- if, after the first fault occurs, other faults develop as a consequence of it, then you can group the first fault and the faults that follow it together as a single fault
- two or more single faults with a common cause can be considered as a single fault
- multiple faults that have different causes but occur simultaneously are considered improbable and do not need to be considered
Suppose a voltage regulator fails in a system power supply, so that the 24 Vdc output rises to an unregulated 36 Vdc (the internal power supply bus voltage). Some time later, two sensors fail as a result. In that case, all three failures can be grouped and considered as a single fault.
If a lightning strike occurs on the power line and the resulting surge voltage on the 400 V mains causes an interposing contactor and the motor drive it controls to fail to danger, then these failures may be grouped and considered as one.
A pneumatic lubricator runs out of lubricant and is not refilled, depriving downstream pneumatic components of lubrication. The spool in the system dump valve sticks open because the valve is not cycled often enough. These two failures do not share a cause, so there is no need to consider them as occurring simultaneously; the probability of both happening at the same time is extremely small. One caution: these two faults MAY have a common cause, poor maintenance. If that is the case and you decide to treat them as two faults with a common cause, they can then be grouped and considered as a single fault.
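The three grouping rules can be applied mechanically once each fault records what caused it. The Python sketch below encodes the three examples above using hypothetical fault IDs. A fault's cause is either another listed fault (a consequential fault) or an external cause that is not itself in the list. Grouping by root cause then merges consequential faults (first rule) and common-cause faults (second rule). Faults in different groups are never combined (third rule). This is a simplification for illustration, not a procedure taken from the standard.

```python
from collections import defaultdict

# Hypothetical fault IDs modelled on the three examples above. Each entry maps
# a fault to its cause: either another listed fault (a consequential fault)
# or an external cause that is not itself in the list.
fault_causes = {
    "PSU-REG": "regulator component failure",
    "SENSOR-A": "PSU-REG",      # damaged by the 36 Vdc overvoltage
    "SENSOR-B": "PSU-REG",
    "K2-FAIL": "lightning surge on 400 V mains",
    "DRIVE-FAIL": "lightning surge on 400 V mains",
    "LUB-EMPTY": "lubricator not refilled",
    "DUMP-STUCK": "dump valve not cycled often enough",
}


def root_cause(fault_id, causes):
    """Follow a chain of consequential faults back to the originating cause."""
    seen = {fault_id}
    cause = causes[fault_id]
    while cause in causes:          # the cause is itself a listed fault
        if cause in seen:
            raise ValueError(f"circular cause chain involving {cause}")
        seen.add(cause)
        cause = causes[cause]
    return cause


def group_single_faults(causes):
    """Group faults by root cause; each group is treated as one single fault."""
    groups = defaultdict(list)
    for fault_id in causes:
        groups[root_cause(fault_id, causes)].append(fault_id)
    return dict(groups)


for cause, members in group_single_faults(fault_causes).items():
    print(f"{cause}: {', '.join(members)}")
```

The power supply and lightning examples each collapse into one group. The lubricator and the dump valve stay in separate groups, and their simultaneous occurrence is not considered. If you conclude, as the caution above suggests, that poor maintenance is the real common cause, change both entries to that cause and they will merge into one group.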
Once you have your well-considered fault lists together, the next question is “Can any of the listed faults be excluded?” This is a tricky question! There are a few points to consider:
- Does the system architecture allow for fault exclusion?
- Is the fault technically improbable, even if it is possible?
- Does experience show that the fault is unlikely to occur?*
- Are there technical requirements related to the application and the hazard that might support fault exclusion?
* BE CAREFUL with this one!
Whenever faults are excluded, a detailed justification for the exclusion needs to be included in the system design documentation. Simply deciding that the fault can be excluded is NOT ENOUGH! Consider the risk a person will be exposed to in the event the fault occurs. If the severity is very high, i.e., severe permanent injury or death, you may not want to exclude the fault even if you think you could. Careful consideration of the resulting injury scenario is needed.
Basing a fault exclusion on personal experience is seldom considered adequate, which is why I added the asterisk (*) above. Look for good statistical data to support any decision to use a fault exclusion.
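A simple fault exclusion register can help keep these documentation points from slipping. The Python sketch below is a hypothetical checklist, not a requirement of ISO 13849. It flags three things:

- an exclusion with no written justification
- an exclusion whose only stated basis is experience
- an exclusion where the severity is S2

S2 is the ISO 13849-1 risk graph parameter for serious, normally irreversible injury, including death. The field names and basis labels are assumptions made for illustration. An empty result means only that none of these flags were raised; it does not make the exclusion valid.

```python
from dataclasses import dataclass, field


@dataclass
class FaultExclusion:
    fault: str
    justification: str                               # rationale for the design documentation
    basis: set[str] = field(default_factory=set)     # hypothetical labels, e.g. {"experience"}
    severity: str = "S1"                             # S1 = slight/reversible, S2 = serious/irreversible or death


def review(exclusion):
    """Return review findings for one exclusion; an empty list means no flags raised."""
    findings = []
    if not exclusion.justification.strip():
        findings.append("no documented justification")
    if not exclusion.basis:
        findings.append("no basis for the exclusion recorded")
    elif exclusion.basis == {"experience"}:
        findings.append("rests on experience alone; look for statistical data")
    if exclusion.severity == "S2":
        findings.append("severity S2: reconsider whether the fault should be excluded at all")
    return findings


candidate = FaultExclusion(
    fault="shaft key shears",  # hypothetical example
    justification="Key sized with a large safety factor for the maximum motor torque.",
    basis={"technically improbable"},
    severity="S2",
)
for finding in review(candidate):
    print("-", finding)
```

Here the justification and basis pass the checks, but the S2 severity still raises a flag. This reflects the advice above to think carefully before excluding a fault that could lead to severe permanent injury or death.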
There is much more information available in IEC 61508-2 on the subject of fault exclusion, and there is good information in some of the books mentioned below [0.2], [0.3], and [0.4]. If you know of additional resources you would like to share, please post the information in the comments!
- 3.1.3 fault
- state of an item characterized by the inability to perform a required function, excluding the inability during preventive maintenance or other planned actions, or due to lack of external resources
- Note 1 to entry: A fault is often the result of a failure of the item itself, but may exist without prior failure.
- Note 2 to entry: In this part of ISO 13849, “fault” means random fault. [SOURCE: IEC 60050-191:1990, 05-01.]
Here are some books that I think you may find helpful on this journey:
[0.2] Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.
Note: This reference list starts in Part 1 of the series, so “missing” references may show up in other parts of the series. The complete reference list is included in the last post of the series.
S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncertainties in the validation of an existing safety-related control circuit with the ISO 13849-1:2006 design standard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104–112, Jan. 2014.
D. S. G. Nix, Y. Chinniah, F. Dosio, M. Fessler, F. Eng, and F. Schrever, “Linking Risk and Reliability: Mapping the output of risk assessment tools to functional safety requirements for safety related control systems,” 2015.
Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 2: Requirements for electrical/electronic/programmable electronic safety-related systems. IEC Standard 61508-2. 2010.
“IFA – Practical aids: Software-Assistent SISTEMA: Safety Integrity – Software Tool for the Evaluation of Machine Applications”, Dguv.de, 2017. [Online]. Available: http://www.dguv.de/ifa/praxishilfen/practical-solutions-machine-safety/software-sistema/index.jsp. [Accessed: 30-Jan-2017].
“failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.
M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331–338, 2006.
Out of Control: Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.
“Field-programmable gate array”, En.wikipedia.org, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Field-programmable_gate_array. [Accessed: 16-Jun-2017].
Analysis techniques for system reliability – Procedure for failure mode and effects analysis (FMEA). 2nd Ed. IEC Standard 60812. 2006.
“Failure mode and effects analysis”, En.wikipedia.org, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Failure_mode_and_effects_analysis. [Accessed: 16-Jun-2017].