The Safety Resilience Viewpoint
Director Steve Mash investigates further why safety and security are intrinsically linked
Following the previous article on Mass Transport and Safety Resilience, it is clear that systems have two key properties that are intertwined, these being their security and their safety.
Both of these properties are assessed and controlled using risk management processes which share common techniques and methodologies. In an ever more interconnected world, the actions of threat actors can have both security and safety consequences.
A control system for a safety critical process, i.e. a process whose failure can cause injury or loss of life and which is interconnected to other systems, may be vulnerable to interference from accidental or malicious commands received through these interconnections.
For example, a stop signal on a mass transport system being changed to the incorrect state may lead to a vehicle, be it train, tram or bus, proceeding when it is unsafe to do so. However, it is not normal practice for a safety critical process to be vulnerable to a single event such as this incorrect stop signal; systems are designed with controls in depth so that only a sequence of improbable independent failures or events would have a resultant effect that could cause injury or loss of life. This strength in depth is the resilience of the system.
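The effect of controls in depth can be illustrated with a minimal sketch. It assumes that the layers fail independently of one another, and the layer names and per-demand failure probabilities are hypothetical figures chosen purely for illustration; they are not taken from any real signalling system.

```python
from math import prod


def probability_all_layers_fail(layer_failure_probabilities):
    """Probability that every layer fails on the same demand.

    Valid only if the layers fail independently of one another.
    """
    return prod(layer_failure_probabilities)


# Hypothetical per-demand failure probabilities, for illustration only.
layers = {
    "signal shows the incorrect state": 1e-4,
    "automatic protection fails to brake": 1e-3,
    "driver does not intervene": 1e-2,
}

combined = probability_all_layers_fail(layers.values())
print(f"Probability that all layers fail together: {combined:.1e}")
```

Under these assumptions, each additional independent layer multiplies the combined probability down, which is why a single incorrect signal should not on its own lead to harm.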
The resilience of the system is its intrinsic ability to maintain safe operation before, during, and after any change or disturbance, under both expected and unexpected conditions. To be fully resilient, the system needs to have both proactive and reactive behavioural properties, managing unexpected changes to boundary conditions, combinations of external events or challenges to underlying assumptions.
A key aspect of a system's resilience is that its properties are temporal in nature. As components of the system and the environment that the system operates in change over time, the resilience of the system changes.
Resilience can decrease if external events that were originally improbable become increasingly probable, but conversely resilience can increase if external events become less probable. Similarly, resilience can decrease if the reliability of safety controls decreases over time, and conversely resilience can increase if controls are replaced with more reliable controls. Where control systems include human intervention as part of the control process, resilience can decrease or increase with changes to the training, experience, practices and culture of the personnel involved in the process.
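The temporal nature of resilience can be sketched by recalculating the same simple estimate as conditions change. The function name and all of the probabilities below are hypothetical, and the sketch assumes that the initiating event and the controls fail independently.

```python
def combined_failure_probability(event_probability, control_failure_probabilities):
    """Probability that the event occurs and every control fails to stop it.

    Assumes the event and all controls are independent of one another.
    """
    probability = event_probability
    for control_failure in control_failure_probabilities:
        probability *= control_failure
    return probability


# Hypothetical baseline: one external event and two safety controls.
baseline = combined_failure_probability(1e-4, [1e-3, 1e-2])

# The external event becomes ten times more probable: resilience decreases.
event_more_probable = combined_failure_probability(1e-3, [1e-3, 1e-2])

# One control is replaced with a more reliable one: resilience increases.
control_replaced = combined_failure_probability(1e-4, [1e-4, 1e-2])

print(f"Baseline:            {baseline:.1e}")
print(f"Event more probable: {event_more_probable:.1e}")
print(f"Control replaced:    {control_replaced:.1e}")
```

A lower combined probability corresponds to a more resilient system, so an assessment made at design time only remains valid for as long as the inputs behind it do.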
In March 2015 it was reported that a steam train approached a junction where two main lines merged. The steam train triggered an automatic speed restriction control which was ignored by the train driver as this speed restriction was greater than the maximum speed the steam train was permitted to operate at.
As a result of the train driver not acknowledging the automatic speed restriction control, an automatic brake application was initiated which was then manually cancelled by the train driver. As a consequence of this manual cancellation, the Train Protection and Warning System was prevented from automatically applying the brakes.
When the train driver then passed a signal displaying an amber caution indication without reducing speed, the Automatic Warning System was triggered but the brakes were not applied. When the steam train then approached a signal displaying a red stop indication, the only control available to halt the train was the manual application of the brakes by the train driver. The reaction and braking time for this manual operation was such that the steam train passed the red signal and came to a halt across the junction, astride both main lines. In this case, the train on the other main line, which was the reason for the red stop signal, had cleared the junction before the arrival of the steam train and so no collision occurred.
In this example the actions of a single human operator were such that all automatic safety controls were inadvertently disabled and the safety of the system was dependent upon the actions of this human operator.
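The consequence of a single action disabling every automatic control can be sketched numerically. The figures below are hypothetical and are not derived from the incident; the sketch assumes that an operator override removes all automatic protection at once, and that without an override the automatic controls fail independently of one another.

```python
def probability_no_automatic_protection(override_probability, control_failure_probabilities):
    """Probability that no automatic control is available to stop the train.

    Either the operator overrides the controls (a single common cause),
    or there is no override and every control independently fails.
    """
    all_controls_fail = 1.0
    for control_failure in control_failure_probabilities:
        all_controls_fail *= control_failure
    return override_probability + (1.0 - override_probability) * all_controls_fail


# Hypothetical values, for illustration only.
automatic_controls = [1e-3, 1e-3]

without_override = probability_no_automatic_protection(0.0, automatic_controls)
with_override = probability_no_automatic_protection(1e-3, automatic_controls)

print(f"No override possible: {without_override:.1e}")
print(f"Override possible:    {with_override:.1e}")
```

Under these assumptions the override dominates the result: the system is only about as reliable as the one operator, however many automatic controls sit behind that operator.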
It is easy to blame this incident on human error, but the resilience of the system as a whole needs to be examined. How resilient is a system that is dependent upon the training and experience of a single operator? Did the processes and procedures for train operations consider the implications of the system including trains whose maximum operating speed is lower than imposed speed restrictions? How would the processes and procedures manage any unexpected events such as signals showing incorrect information as a result of inadvertent or malicious actions?
If signals were externally controlled over a network, could a malicious action interfere such that red stop signals were changed to green proceed signals? This last question is a scenario where a potential security weakness could lead to a safety critical consequence.
Current safety methodologies consider the consequences of events such as signals in the incorrect state due to technical failures and, to some extent, human errors. However, malicious actions, both within and external to the boundaries of the system, have traditionally not been considered relevant, as such actions were not possible. With the trend towards using networks for the control and monitoring of systems, vulnerabilities in these networks have now made such actions not only possible but arguably likely, in a world where state-sponsored and terrorist cyber attacks look to inflict casualties wherever and whenever possible.
Whilst technologically there are no issues with connecting any device to a network, be it a railway signalling system or a domestic smoke alarm, the safety implications of making such a connection must be considered during system development, when sufficient controls can be implemented to reduce risks to an acceptable level, and not after system implementation.
These controls can be security controls, safety controls or a combination of the two. Integrating security and safety risk assessment and management makes it easier to implement a single set of controls which are proportionate and which result in residual risks that are as low as reasonably practicable. Treating security and safety separately may result in duplication of effort or, more seriously, undetected gaps in the controls where each discipline believed that the other was responsible for that control. Amethyst have specialists in both disciplines and are therefore well placed to assist any of our clients who may wish to take a serious look at integrating these disciplines.