When it comes to emergency stop devices, there is no doubt that the red mushroom-head push button is the most common; they seem to be everywhere. The second most common emergency stop device is the pull-cord, and much like the light curtain in safeguarding, the pull-cord is probably the most misapplied emergency stop device.
I am always looking for interesting examples of machinery safety problems to share on MS101. Recently I was scrolling Reddit's r/OSHA and found these three real-world examples.
Broken Emergency Stop Buttons
The first and most obvious kind of failure results from wear-out of, or damage to, emergency stop devices like e-stop buttons or pull-cords. Here's a great example:
The operator device in this picture has two problems:
1) the button operator has failed and
2) the e-stop is incorrectly marked.
The correct marking would be a yellow background in place of the red/silver legend plate, like the example below. The yellow background could have the words “emergency stop” on it, but this is not necessary as the colour combination is enough.
There is an ISO/IEC symbol for an emergency stop that could also be used.
I wonder how the contact block(s) inside the enclosure are doing? Contact blocks have been known to fall off the back of emergency stop operator buttons, leaving you with a button that does nothing when pressed. Contact blocks secured with screws are most vulnerable to this kind of failure, which happens most often in high-vibration conditions. I have run across this in real life while doing inspections on client sites.
There are contact blocks made to detect this kind of failure, like Allen-Bradley's self-monitoring contact block, 800TC-XD4S, or the similar Siemens product, 3SB34. Most control component manufacturers are likely to offer similar components.
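To illustrate why a self-monitoring contact block fails to safety, here is a toy model (not vendor code; the function name and logic are mine for illustration only). The normally-closed contact is held closed by the block's mechanical engagement with the button operator, so a detached block opens the safety circuit just as a pressed button would:

```python
# Toy model of a self-monitoring e-stop contact block. The NC contact is
# held closed only while the block is mechanically engaged with the
# button operator, so a detached block opens the safety circuit and the
# machine stops, rather than leaving a dead button in the circuit.

def safety_circuit_closed(button_pressed: bool, block_attached: bool) -> bool:
    """True while the NC contact is closed and the machine may keep running."""
    return block_attached and not button_pressed

# Normal running: block attached, button not pressed -> circuit closed
assert safety_circuit_closed(button_pressed=False, block_attached=True)

# Button pressed -> circuit opens, machine stops
assert not safety_circuit_closed(button_pressed=True, block_attached=True)

# Block vibrated off the operator -> circuit also opens (fails to safety)
assert not safety_circuit_closed(button_pressed=False, block_attached=False)
```

A conventional block that falls off behaves like the opposite: the circuit stays closed with a dead button, which is exactly the undetected dangerous failure the self-monitoring design is meant to eliminate.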
Here’s another example from a machine inspection I did a while ago. Note the wire “keeper” that prevents the button from getting lost!
Here is an example of poor planning when installing new barrier guards: the emergency stop button is now out of reach. The original poster does not explain why the emergency stop for the machine they were operating was mounted on a different machine.
No Emergency Stop at all
Finally, possibly the worst example of all: an improvised emergency stop using a set of wire cutters. No further comment required.
If you have any examples you would like to share, feel free to add them in the comments below. References to particular employers or manufacturers will be deleted before posts are approved.
“IEC 60417-5638, Emergency Stop”, ISO.org, 2017. [Online]. Available: https://www.iso.org/obp/ui/#iec:grs:60417:5638. [Accessed: 27-Jun-2017].
What is a Common Cause Failure?
There are two similar-sounding terms that people often confuse: Common Cause Failure (CCF) and Common Mode Failure. Although they sound alike, they are different. A Common Cause Failure is a failure in a system where two or more portions of the system fail at the same time from a single common cause. An example could be a lightning strike that welds a contactor and simultaneously takes out the safety relay processor that controls the contactor. Common cause failures are therefore two different manners of failure in two different components, but with a single cause.
Common Mode Failure is where two components or portions of a system fail in the same way, at the same time. For example, two interposing relays both fail with welded contacts at the same time. The failures could be caused by the same cause or from different causes, but the way the components fail is the same.
Common-cause failure includes common mode failure, since a common cause can result in a common manner of failure in identical devices used in a system.
Here are the formal definitions of these terms:
3.1.6 common cause failure CCF
failures of different items, resulting from a single event, where these failures are not consequences of each other
Note 1 to entry: Common cause failures should not be confused with common mode failures (see ISO 12100:2010, 3.36). [SOURCE: IEC 60050-191-am1:1999, 191-04-23]
3.36 common mode failures
failures of items characterized by the same fault mode
NOTE Common mode failures should not be confused with common cause failures, as the common mode failures can result from different causes. [IEV 191-04-24]
The “common mode” failure definition uses the phrase “fault mode”, so let’s look at that as well:
DEPRECATED: fault mode
manner in which failure occurs
Note 1 to entry: A failure mode may be defined by the function lost or other state transition that occurred. [IEV 192-03-17]
As you can see, “fault mode” is no longer used, in favour of the more common “failure mode”, so it is possible to re-write the common-mode failure definition to read, “failures of items characterised by the same manner of failure.”
Random, Systematic and Common Cause Failures
Why do we need to care about this? Failures occur in three ways: random failures, systematic failures, and common cause failures. When developing safety-related controls, we need to consider all three and mitigate them as much as possible.
Random failures do not follow any pattern, occurring randomly over time, and are often brought on by over-stressing the component or by manufacturing flaws. Their rate can increase due to environmental or process-related stresses, like corrosion, EMI, normal wear-and-tear, or other over-stressing of the component or subsystem. Random failures are often mitigated through the selection of high-reliability components.
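For wear-out components like contactors and interlock switches, ISO 13849-1 (Annex C) estimates the random dangerous-failure behaviour from the component's B10d value (the number of cycles at which 10% of a test sample have failed dangerously). A minimal sketch of that calculation, using hypothetical duty-cycle figures:

```python
# Sketch of the ISO 13849-1 Annex C estimate of MTTFd from B10d.
# The duty-cycle numbers in the example are hypothetical.

def n_op(d_op_days: float, h_op_hours: float, t_cycle_s: float) -> float:
    """Mean number of operations per year:
    n_op = d_op x h_op x 3600 / t_cycle"""
    return d_op_days * h_op_hours * 3600.0 / t_cycle_s

def mttfd_years(b10d_cycles: float, n_op_per_year: float) -> float:
    """Mean time to dangerous failure: MTTFd = B10d / (0.1 x n_op)"""
    return b10d_cycles / (0.1 * n_op_per_year)

# Example: 220 working days/yr, 16 h/day, one cycle every 30 s
ops = n_op(d_op_days=220, h_op_hours=16, t_cycle_s=30)   # 422,400 cycles/yr
mttfd = mttfd_years(b10d_cycles=2_000_000, n_op_per_year=ops)  # ~47 years
```

Selecting a component with a higher B10d, or reducing the cycle rate, directly increases the MTTFd and so reduces the contribution of random failures.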
Systematic failures include common-cause failures, and occur because of human errors that were not caught by procedural means. These failures arise from design, specification, operating, maintenance, and installation errors. When we look for systematic errors, we examine things like the training of the system designers, or the quality assurance procedures used to validate the way the system operates. Systematic failures are non-random and complex, making them difficult to analyse statistically. They are a significant source of common-cause failures because they can affect redundant devices, and because they are often deterministic, occurring whenever a given set of circumstances exists.
Systematic failures include many types of errors, such as:
- Manufacturing defects, e.g., software and hardware errors built into the device by the manufacturer.
- Specification mistakes, e.g., incorrect design basis and inaccurate software specification.
- Implementation errors, e.g., improper installation, incorrect programming, interface problems, and not following the safety manual for the devices used to realise the safety function.
- Operation and maintenance errors, e.g., poor inspection, incomplete testing, and improper bypassing.
Diverse redundancy is commonly used to mitigate systematic failures, since differences in component or subsystem design tend to create non-overlapping systematic failures, reducing the likelihood of a common error creating a common-mode failure. Errors in specification, implementation, operation and maintenance are not affected by diversity.
Figure 1 below shows the results of a small study done by the UK's Health and Safety Executive in 1994 that supports the idea that systematic failures are a significant contributor to safety system failures. The study included only 34 systems (n=34), so the results cannot be considered conclusive; however, there were some startling findings. As you can see, errors in the specification of the safety functions (the Safety Requirements Specification) resulted in about 44% of the system failures in the study. Based on this small sample, systematic failures appear to be a significant source of failures.
Handling CCF in ISO 13849-1
Now that we understand WHAT Common-Cause Failure is, and WHY it's important, we can talk about HOW it is handled in ISO 13849-1. Since ISO 13849-1 is intended to be a simplified functional safety standard, CCF analysis is limited to a checklist in Annex F, Table F.1. Note that Annex F is informative, meaning that it is guidance material to help you apply the standard. Since this is the case, you could use any other suitable means of assessing CCF mitigation, like those in IEC 61508 or in other standards.
Table F.1 is set up as a series of mitigation measures grouped into related categories. Each group is assigned a score that can be claimed if you have implemented the mitigations in that group. ALL OF THE MEASURES in each group must be fulfilled in order to claim the points for that category. Here's an example:
In order to claim the 20 points available for the use of separation or segregation in the system design, there must be a separation between the signal paths. Several examples of this are given for clarity.
Table F.1 lists six groups of mitigation measures. To claim adequate CCF mitigation, a minimum score of 65 points must be achieved. Only Category 2, 3, and 4 architectures are required to meet the CCF requirements, but for those architectures, without meeting the CCF requirement you cannot claim the PL, regardless of whether the design meets the other criteria.
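The scoring mechanics can be sketched in a few lines of code. The group names and point values below are placeholders for illustration only; a real assessment must use the measures and scores printed in Table F.1 of the standard:

```python
# Sketch of the ISO 13849-1 Annex F (Table F.1) CCF scoring process.
# Group names and point values here are ILLUSTRATIVE placeholders;
# use the actual measures and scores from Table F.1 in a real assessment.

THRESHOLD = 65  # minimum total score for adequate CCF mitigation

GROUP_POINTS = {
    "separation/segregation": 20,
    "diversity": 20,
    "design/application/experience": 20,
    "assessment/analysis": 5,
    "competence/training": 5,
    "environmental": 30,
}

def group_score(group: str, measures_fulfilled: list) -> int:
    """A group's points may be claimed only if ALL of its measures are met."""
    if measures_fulfilled and all(measures_fulfilled):
        return GROUP_POINTS[group]
    return 0  # partial fulfilment earns nothing for the group

def ccf_assessment(fulfilment: dict):
    """Return (total score, whether CCF mitigation is adequate)."""
    total = sum(group_score(g, m) for g, m in fulfilment.items())
    return total, total >= THRESHOLD

# Example: every group fully met except diversity, where one measure failed
score, ok = ccf_assessment({
    "separation/segregation": [True, True],
    "diversity": [True, False],   # one unmet measure -> 0 points for the group
    "design/application/experience": [True],
    "assessment/analysis": [True],
    "competence/training": [True],
    "environmental": [True, True],
})
# score = 80, ok = True (80 >= 65)
```

Note how the all-or-nothing rule works: missing a single measure in the diversity group forfeits all 20 of its points, yet the design can still pass on the strength of the other groups.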
One final note on CCF: if you are reviewing an existing control system, say in an existing machine, or in a machine designed by a third party where you have no way to determine the experience and training of the designers or the capability of the company's change management process, then you cannot adequately assess CCF. This fact is recognised in CSA Z432-16, Chapter 8, which allows the reviewer to simply verify that the architectural requirements, exclusive of any probabilistic requirements, have been met. This is particularly useful for engineers reviewing machinery under Ontario's Pre-Start Health and Safety requirements, who are frequently working with less-than-complete design documentation.
In case you missed the first part of the series, you can read it here. In the next article in this series, I'm going to review the process flow for system analysis as currently outlined in ISO 13849-1. Watch for it!
Here are some books that I think you may find helpful on this journey:
[0.2] Electromagnetic Compatibility for Functional Safety, 1st ed. Stevenage, UK: The Institution of Engineering and Technology, 2008.
Note: This reference list starts in Part 1 of the series, so “missing” references may show in other parts of the series. The complete reference list is included in the last post of the series.
S. Jocelyn, J. Baudoin, Y. Chinniah, and P. Charpentier, “Feasibility study and uncertainties in the validation of an existing safety-related control circuit with the ISO 13849-1:2006 design standard,” Reliab. Eng. Syst. Saf., vol. 121, pp. 104-112, Jan. 2014.
“failure mode”, 192-03-17, International Electrotechnical Vocabulary. IEC International Electrotechnical Commission, Geneva, 2015.
M. Gentile and A. E. Summers, “Common Cause Failure: How Do You Manage Them?,” Process Saf. Prog., vol. 25, no. 4, pp. 331-338, 2006.
Out of Control: Why control systems go wrong and how to prevent failure, 2nd ed. Richmond, Surrey, UK: HSE Health and Safety Executive, 2003.