[SystemSafety] Proposed rewrite of IEC 61508 "proven in use" assessment conditions for SW

Loebl, Andy loeblas at ornl.gov
Mon Jul 8 16:49:30 CEST 2013


Beginning in 2008 and continuing through 2010 I was one of a group of people working for NRC on the issue of "Proven in Use".  The topic I was charged with was evaluating an assertion by a U.S. Company and its partner that their proposal to use software in New Production Power Reactors in the U.S. was appropriate because their software had been operating for decades (4 or 5) and had experienced no failures.  I was asked to study this matter and come up with the literature and a review that would give grounds for acceptance or rejection of this assertion.

The discussion below IS NOT ORIGINAL WORK; it consists largely of quotes and of notes to myself on IEC 61508 and other topics related to the assertion.  I offer this review to you for whatever clarification it might offer.  My key contributors were the University of York group (UK), Nancy Leveson, and a few others listed in the bibliography.  Please remember, THIS IS WORK I DID FOR MY OWN USE AND NOT FOR PUBLICATION.  THUS I CLAIM NO AUTHORSHIP FOR THE ITEM BELOW:

                                                Issues in the application of software safety standards
A risk-driven cost/benefit approach to safety management, as embodied in many of the standards (and in the UK Health & Safety Executive's ALARP principle), conflicts with the 'duty of care' principle of tort or negligence law. This more stringent principle holds that "all due care and diligence" should be used to eliminate or control hazards, regardless of the probability of occurrence. Of the current set of standards discussed in this paper only DEF (AUST) 5679 canvasses the issues arising from the difference between statute and common or civil law interpretations. Further complications can be introduced when translating such decision making criteria across international boundaries (typified by the unreflecting adoption of the 'example' risk criteria provided in many standards).

IEC 61508 still requires high integrity levels (10^-8/hr for SIL 4) of safety-critical systems, whilst having no empirical, quantitative way to demonstrate them. In response to this inability the majority of software safety standards adopt a qualitative assessment scheme in which a range of confidence measures (usually process oriented) are used to support a claim of achieving the specified integrity level. Because these measures include process requirements, integrity levels must be levied upon the design process as requirements. The implication of this assignment of requirements is that if, perhaps due to a revised risk assessment, these integrity levels are raised later in the program, then additional effort to increase the integrity level will also be needed. A further complication, specific to protection systems, is that the probability of the hazard we are trying to control may itself be uncertain. Vesely and Rasmussen (Rasmussen 1984) point out that accident frequency estimates may only be credible to a factor of 10, and that low frequency (extreme) accident events, i.e. those occurring at 10^-9/operating year, may only be credible to a factor of 100. Unhappily this means that our integrity requirement could, in reality, be higher by at least an order of magnitude. Perhaps partly in response to this uncertainty, the system safety community has traditionally adopted an approach in which the focus of the first part of any program is to identify hazards and eliminate or control them rather than attempt to quantify risk (MIL-STD-882C). This possibly reflects an appreciation that meaningful failure or hazard rate data is rarely available at the start of most programs.
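
As a rough numerical illustration of the last point (my own sketch in Python, not from the cited sources; the only inputs are the 10^-8/hr SIL 4 figure and the factor-of-10 credibility bound quoted above):

sil4_target_rate   = 1e-8   # dangerous failure rate per hour quoted above for SIL 4
credibility_factor = 10     # hazard-frequency estimates credible only to about a factor of 10

# If the underlying hazard frequency has been underestimated by the credibility
# factor, the failure rate actually needed to hold overall risk at the intended
# level is tighter by the same factor:
actually_needed_rate = sil4_target_rate / credibility_factor

print(f"claimed target: {sil4_target_rate:.0e} per hour")       # 1e-08
print(f"really needed:  {actually_needed_rate:.0e} per hour")   # 1e-09, an order of magnitude tighter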

Allocating integrity levels
The majority of software safety standards adopt an integrity level approach to software, categorizing software in terms of its criticality to system safety. As an example, MIL-STD-882C uses software authority to establish Software Hazard Risk Indices (SHRI). Similarly, the four development assurance levels (DALs) of DO-178B embody this approach, while IEC 61508 uses an associated level of risk reduction to allocate safety integrity levels (SILs). In the United Kingdom, DEF STAN 00-56 Issue 2 allocates SILs based on hazard severity, with adjustments for probability, while in Australia DEF (AUST) 5679 uses a more complex scheme, based on levels of trust.

Different techniques can be used to allocate integrity level requirements:
1. Consequence/Autonomy (RTCA/DO-178 or MIL-STD-882C),
2. Modified HAZOP (SILs are associated to each HAZOP line item),
3. Consequence Only Method (Only severity is considered in the assignment of SILs),
4. Risk Matrices (MIL-STD-882C; a simplified allocation sketch follows this list),
5. Risk Graph (IEC 61508),
6. Layers of Protection Analysis, or
7. Fault Tree Analysis (SAE/ARP4761 App. L)
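
The following is a simplified, hypothetical sketch of technique 4, allocation via a risk matrix. The severity and likelihood categories and the SIL values in the table are invented for illustration only; real matrices (e.g. MIL-STD-882C, or a calibrated IEC 61508 risk graph) differ and must be agreed for each project.

SEVERITIES  = ["negligible", "marginal", "critical", "catastrophic"]
LIKELIHOODS = ["improbable", "remote", "occasional", "probable", "frequent"]

# rows: likelihood (low -> high); columns: severity (low -> high); 0 = no SIL required
MATRIX = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [0, 1, 2, 3],
    [1, 2, 3, 4],
    [1, 2, 4, 4],
]

def allocate_sil(severity: str, likelihood: str) -> int:
    """Look up the (illustrative) SIL for a hazard classification."""
    return MATRIX[LIKELIHOODS.index(likelihood)][SEVERITIES.index(severity)]

print(allocate_sil("catastrophic", "remote"))   # -> 2 in this toy table
print(allocate_sil("critical", "frequent"))     # -> 4 in this toy table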

Most standards identify specific design and assurance activities to be followed at different integrity levels. Unfortunately these standards also vary widely in the methods invoked and the degree of tailoring that a project can apply. For example, DO-178B defines a basic development process but focuses upon software product testing and inspection to assure safety. Other standards, such as DEF STAN 00-55, focus on the definition of safety requirements that are acquitted through evidence. Some standards, such as DEF (AUST) 5679, emphasize the use of formal methods to achieve the highest integrity levels, while others, such as IEC 61508, invoke a broad range of techniques to deliver a safety function at a required integrity level.  In addition to the variance in process, there is also variance in the normative nature of such standards; these range from the prescriptive approach of DEF (AUST) 5679 to the guidance approach of the civilian-sector DO-178B.

What is a safety argument?
A safety argument can be defined as a series of premises that support a claim as to the safety of the system. Safety arguments are inherently inductive hypotheses, as they argue that the system will be safe, in circumstances not yet observed, based upon a small set of data. As a result we can only argue that the premises support (but do not logically entail) the claim. Two general patterns for such hypotheses recur repeatedly throughout the literature and existing standards:
1. Hazard directed. Embodying the definition of safe as ‘all hazard(s) risk eliminated, controlled or reduced’. Assumes that hazards have been identified and countermeasures are appropriate.  Embodied in standards such as MIL-STD-882; and
2. Functional integrity. Embodies the definition of safe as 'function meets integrity levels and no hazardous interactions between functions'. Assumes an appropriate integrity level assignment and implementation of valid integrity requirements. Embodied in standards such as DEF-STAN 00-56 or IEC 61508 (Kelly, 1997).

--For example, non-ionising radiation safety limits have evolved and been revised over several decades in response to increasing knowledge of their effects.
--Note that the role of evidence is to support the premises and claim of the safety argument. A safety case equals the safety argument plus supporting evidence.

The first argument comprises a series of individual premises that a hazard is eliminated or controlled. While such premises are strongly testable, a weakness in the argument is whether the set of identified hazards is complete. The argument is usually strengthened by adding premises that accident mitigation is in place and that design standards have been applied to eliminate, reduce or control un-identified hazards. The second argument comprises a series of premises that, via a series of accomplished activities, we achieve an 'integrity' level. A weakness in this argument is the difficulty of demonstrating how each premise (integrity level activity) actually contributes to safety. As a result it is difficult to argue whether the set of activities is both necessary and complete. Software safety standards generally embody the integrity level pattern. For both arguments, assurance evidence serves two purposes:
1. Demonstrating process compliance, and
2. Demonstrating software safety attributes.
It is important to note that when we demonstrate compliance with process requirements for a specific project we are not proving that the standard is correct.  What we are demonstrating is that we have followed the standard's 'argument' in that particular instance. Even direct demonstration of the safety attributes still relies on assumptions as to whether the gathered evidence is a valid confidence measure. A final, and somewhat unsettling, thought is that as our inductive argument only supports the claim for safety, we will never be able to prove that the system is safe.

Given that (a) software requirements errors cause a proportionally higher number of software accidents, (b) they dominate the number of design faults, and (c) complexity, asynchronous behavior and coupling (Perrow 1984) also seem to increase their rate of occurrence, it would be reasonable to expect that software safety standards should address them comprehensively.  Unfortunately this is not the case; for example, DO-178B and DEF STAN 00-55 both spend approximately one paragraph each on the issue of requirements compliance.
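
A minimal sketch of the structure described in this section (safety case = safety argument plus supporting evidence, with the argument a claim supported by premises); the class and field names are mine, not drawn from any of the standards.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Premise:
    statement: str                                        # e.g. "hazard H1 is controlled by interlock X"
    evidence: List[str] = field(default_factory=list)     # references to test reports, analyses, ...

@dataclass
class SafetyArgument:
    claim: str                                            # e.g. "the system is acceptably safe in context C"
    premises: List[Premise] = field(default_factory=list)

    def unsupported_premises(self) -> List[Premise]:
        """Premises with no evidence attached, i.e. gaps in the safety case."""
        return [p for p in self.premises if not p.evidence]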

Failure to specify software hazards
Specifying hazards is useful because it leads us naturally to hazard countermeasures. Unfortunately, by applying a functional hazard approach that stops at the software-to-system interface, we end up with 'generic' software hazards for which it is difficult to develop meaningful controls. An example from a real software safety analysis:
“Undetected software fault, leading to loss of flight control.”
As it stands such a hazard description is meaningless as (for example) not all undetected software faults can lead to the identified hazardous state. A more tractable definition would encompass the operational context, the specific causal software design fault(s) and the resultant failure mode evident at the software boundary:
“A timing failure (priority inversion) of DFCS_N1 module causes a loss of control output failure during terrain following flight, leading to loss of flight control.”
This more detailed hazard description can provide a linkage into the behavior of the software that allows us to formulate specific hazard controls.
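
The same point can be made structurally. The sketch below simply splits the more tractable hazard description into its constituent fields, using the DFCS_N1 example from the text; the schema and field names are illustrative, not taken from any standard.

from dataclasses import dataclass

@dataclass
class SoftwareHazard:
    operational_context: str      # when/where the hazard matters
    causal_design_fault: str      # specific software-level cause
    boundary_failure_mode: str    # failure mode visible at the software boundary
    system_level_hazard: str      # the resulting system hazard

example = SoftwareHazard(
    operational_context="terrain following flight",
    causal_design_fault="timing failure (priority inversion) in DFCS_N1 module",
    boundary_failure_mode="loss of control output",
    system_level_hazard="loss of flight control",
)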

Software correctness or reliability ≠ safety
Recent accidents involving software illustrate that software safety is not simply a matter of ensuring that the software is 'correct' and performs its functions reliably. For example, the Ariane 5 rocket's software performed exactly as specified, yet this resulted in the destruction of the vehicle. Instead, such accidents demonstrate that system accidents can occur even while the components of the system perform to specification.  The reliance of many software standards on redundancy to achieve reliability is a good illustration of this mistaken belief.

Standards, normative or resource?
Two viewpoints represent the opposite ends of the spectrum on the application of standards. The traditional view is that standards are intended to constrain and reduce variability; they are well thought out and therefore rule-based (normative) in nature. In this view, safe design practice equals following the standard. A more recent viewpoint is that standards represent a resource for action, where safety results from the correct adaptation or non-adaptation of the standard to context. In this view, safe design practice equals understanding the gap between theory and practice.

Process or Product based
Only DO-178B provides detailed guidance on how to integrate components developed to a different software safety standard into a current system to meet a specific safety integrity level.  Yet despite the large number of standards, very few provide the detailed design and implementation guidance that compares with the standards of other engineering disciplines. Such guidance is arguably needed precisely because software is so flexible; in the physical world, physics provides the standardization and constraint for us.

A second argument against a software product standard is the apparent difficulty of standardizing software across a multitude of differing application domains. However, software inherently does not 'know' whether it is running a heart-lung machine or a nuclear power plant; both are still control problems. In fact one can come up with a fairly small number of application classes, ranging from process control loops to decision support systems; Leveson, for example, identifies seven such classes (Leveson, 1995).  Understanding the system context is still important in order to communicate the criticality of certain requirements, as accidents such as the Mars Polar Lander failure illustrate (Casani 2000). But such issues are a system engineering and system safety problem in the first instance, not a direct software problem per se.

It is also important not to dismiss 'product' standards as simply a generic template. In fact good product standards embody both high level principles and detailed design mechanisms that can be implemented. As a (not exhaustive) example, a hierarchical software safety standard could be developed that specified:
1. Fault hypotheses, high level safety goals and strategies for specific design domains;
2. Architectural level design patterns for safety critical applications i.e. single channel protected,
safety monitor, safety kernels;
3. Middle level safety mechanisms such as software interlocks, batons and to/from programming, and use of low level redundancy structures (a minimal interlock sketch follows this list);
4. Software coding templates and guidelines for safety critical modules/objects and communications protocols;
5. Low level ‘machine’ issues such as coding of unused memory, hardware watchdogs, use of
hardware memory management units;
6. Guidance as to the use of existing standards (Ada, CANBUS, MIL-STD-1553, POSIX inter alia), definitions of safe sub-sets, etc.; and
7. Guidelines as to the evidence required to verify compliance with each layer of the standard and
how to structure this evidence into a coherent safety case.
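
As a concrete (and deliberately trivial) illustration of one of the item 3 mechanisms, the sketch below shows a software interlock: a hazardous action is only executed while an explicit permissive condition holds. The names and the plant condition are invented for the example.

class InterlockViolation(Exception):
    """Raised when a hazardous action is requested without its permissive."""

def interlocked(permissive):
    """Decorator: run the guarded action only if the permissive check passes."""
    def wrap(action):
        def guarded(*args, **kwargs):
            if not permissive():
                raise InterlockViolation(f"{action.__name__} blocked: permissive not satisfied")
            return action(*args, **kwargs)
        return guarded
    return wrap

# Hypothetical plant state, used only for the example.
door_closed = False

@interlocked(lambda: door_closed)
def start_rotation():
    print("rotation started")

# start_rotation() raises InterlockViolation until door_closed is True.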

As an example, both the United States and the United Kingdom have moved towards goal-based safety standards (MIL-STD-882D and DEF-STAN 00-56 Issue 3, respectively) in recent years. However, when invoking such a standard under a firm fixed-price arrangement it is difficult for the acquirer to argue, post contract award, that the scope of the contractor's program is inadequate. It should be noted that in the US this tendency is offset by the preponderance of 'cost plus' style contracts for development programs, where the acquirer carries the cost of such scope changes.

Military versus commercial programs
There also exist significant differences between military and civilian program requirements in terms of the type of assurance data and how it is handled (Johnson 1998). These differences are driven by the differences between the ways in which military and commercial programs use such data to develop, certify and acquire a product. Military software acquisition is dominated by the multi-year lifecycle associated with the acquisition of major weapon systems, while commercial programs are typically smaller and dominated by a time-to-market imperative. Military software also tends to have a single customer, as opposed to the multiple customers of commercial systems.

BIBLIOGRAPHY
        1       AAP 7001.043 (2005), Technical Airworthiness Design Requirements Manual, Australian Defence Force.
        2       ADA UK (2000): The Contribution of the Ada Language to System Development A Market Survey, available from www.adauk.org.uk.
        3       Adams, E., Optimizing Preventive Service of Software Products, IBM Journal of Research and Development 28(1), pp2-14, January 1984.
        4       Amey, P., Correctness By Construction: Better Can Also Be Cheaper, CrossTalk Magazine, The Journal of Defense Software Engineering, March 2002.
        5       ARIANE 501 Flight 501 Failure, Report by the Inquiry Board, Paris 19 July 1996.
        6       ARP 4754 (1996), Certification Considerations for Highly Integrated or Complex Avionics Systems, Society of Automotive Engineers.
        7       ASME Boiler and Pressure Vessel Code, American Society of Mechanical Engineers, 1998 Edition.
        8       Atchison, B., Wabenhorst, A., A Survey of International Safety Standards, Software Verification Research Centre (SVRC), SVRC Technical Report 99-30, The University of Queensland QLD, Australia, 1999.
        9       Bowen, J. & Stavridou, V., Safety-Critical Systems, Formal Methods and Standards, In IEE/BCS Software Engineering Journal, Volume 8 No. 4, pp189-209, 1992.
        10      BSD Exhibit 62-41 (1960), System Safety Engineering: Military Specification for the Development of Air Force Ballistic Missiles, USAF Ballistics System Division (BSD).
        11      Croxford, M., Sutton, J., Breaking Through the V and V Bottleneck, Proceedings of the Second International Eurospace - Ada-Europe Symposium on Ada in Europe, pp344-354, October 1995.
        12      Dawkins, S. K., Kelly, T. P., McDermid, J. A., Murdoch, J., Pumfrey, D. J., Issues in the Conduct of PSSA, University of York, York, UK, Proceedings of the 17th International System Safety Conference (ISSC), pp77-78, 1999, System Safety Society.
        13      Dijkstra, E.W., The Humble Programmer, Communications of the ACM 15(10), pp859-866, October 1972.
        14      DEF-STAN 00-56 Issue 2 (1996) Safety Management Requirements For Defence Systems., UK Ministry of Defence, Dept of Standardisation.
        15      DEF-STAN 00-56 Issue 3 (2004) Safety Management Requirements For Defence Systems., UK Ministry of Defence, Dept of Standardisation.
        16      Casani, J., JPL D-18709, Report on the Loss of the Mars Polar Lander, Cal Tech, 2000 available from www.jpl.gov/marsreports/marsreports.html
        17      Gamma, E., Helm, R., Johnson, R., Vlissides, J., Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, Boston, MA, 1995.
        18      German, A., Software Static Code Analysis Lessons Learned, 2003, Crosstalk Magazine, November 2003.
        19      Greenwell, W.S., Holloway, M., Knight, J.C., A Taxonomy of Fallacies in System Safety Arguments, submitted to DSN-2005, the International Conference on Dependable Systems and Networks, Yokohama Japan, June 2005.
        20      Hanks, K.S., Knight, J.C., Strunk, E.A., Erroneous Requirements: A Linguistic Basis for Their Occurrence and an Approach to Their Reduction, Proceedings of the 26th Annual NASA Goddard Software Engineering Workshop, p115, November 2001.
        21      IEC 61508 (1998-2000): Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, Parts 1 to 7, International Electrotechnical Commission (IEC).
        22      Jaffe, M., and Leveson, N., Completeness, Robustness, and Safety In Real-Time Software Requirements Specification, Proc. ACM 1989.
        23      Jaffe M., What Is Software Requirements Engineering and Why Is It So Hard?, Course Notes, 2003.
        24      Johnson, L.A., DO-178B, Software Considerations in Airborne Systems and Equipment Certification, Crosstalk Magazine, October 1998.
        25      Kelly, T.P., McDermid, J.A., Safety Case Construction and Reuse using Patterns, in Proceed of the 16th Conference on Computer Safety, Reliability and Security (SAFECOMP 97), Springer Verlag, 1997.
        26      Leveson, N.G., Safeware: System Safety and Computers, Addison-Wesley, 1995.
        27      Leveson, N.G., Intent Specifications: An Approach to Building Human-Centered Specifications, IEEE Transactions on Software Engineering 26(1), pp15-35, January 2000.
        28      Lions, J.L., Ariane 5 Flight 501 Failure: Report by the Inquiry Board, Paris: European Space Agency, 1996.
        29      Littlewood, B., Strigini, L., Validation of Ultrahigh Dependability for Software-Based Systems, Communications of the ACM 36(11), pp69-80, November 1993.
        30      Lutz, R.R., Targeting Safety-Related Errors During Software Requirements Analysis, Proceedings of the 1st ACM SIGSOFT Symposium on Foundations of Software Engineering, pp99-106, Los Angeles, California, December 1993.
        31      McDermid, J. A, Pumfrey, D.J Software Safety: Why is there no Consensus? Proceedings of the International System Safety Conference (ISSC) 2001, Huntsville, System Safety Society, 2001.
        32      Mackall, D.A., Development and Flight Test Experiences with a Flight-Critical Digital Control System. NASA Technical Paper 2857, NASA, Dryden Flight Research Facility, California, USA, 1988.
        33      MIL-HDBK-244A (1990): Guide to Aircraft/Stores Compatibility, US DoD.
        34      MIL-S-38130A (1966): Safety engineering of Systems and Associated Subsystems and Equipment, general Requirements for, US DoD.
        35      MIL-STD-882B (1984): System Safety Program Requirements, US DoD.
        36      MIL-STD-882C (1993): System Safety Program Requirements, US DoD.
        37      NATO STANAG 4404 (1996): Safety Design Requirements and Guidelines for Munition Related Safety Critical Computing Systems, 1st Edition, North Atlantic Treaty Organization (NATO).
        38      NM 87117-5670: USAF System Safety Management Handbook.
        39      NSS 1740.13, (1994): NASA Software Safety Standard, Interim Release, NASA.
        40      Oxford University Press, Dictionary of Computing (3rd ed.), Oxford University Press, New York, NY, 1990.
        41      Parnas, D.L., van Schouwen, A.J., Kwan, S.P., Evaluation of Safety-Critical Software, Communications of the ACM 33(6), pp636-648, June 1990.
        42      Perrow, C., Normal Accidents: Living with High-Risk Technologies, 1984.
        43      Rasmussen, D.M., and Vesely, W.E, Uncertainties in nuclear probabilistic risk analyses, Risk Analysis, 1984.
        44      RTCA/DO-178 (1983), Software Considerations in Airborne Systems and Equipment Certification, RTCA, Inc.
        45      SAE/ARP4761 (1996): Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems, Society of Automotive Engineers (SAE).
        46      USAF System Safety Handbook for Acquisition Managers, USAF Space Division, January 1984.

-----Original Message-----
From: systemsafety-bounces at techfak.uni-bielefeld.de [mailto:systemsafety-bounces at techfak.uni-bielefeld.de] On Behalf Of Peter Bernard Ladkin
Sent: Sunday, July 07, 2013 7:40 AM
To: systemsafety at techfak.uni-bielefeld.de
Subject: [SystemSafety] Proposed rewrite of IEC 61508 "proven in use" assessment conditions for SW

Folks,

the German national committee tasked with IEC 61508 Part 3 (SW) matters has been working for some
time on developing the assessment requirements for SW elements to be considered adequately "proven
in use". (Please note that the term "proven in use" is a technical term in IEC 61508; one may query
whether it is appropriate - I think it is appropriate - but for current purposes I suggest we just
accept it.)

On 17 June I started a thread entitled "Qualifying SW as "proven in use"" and referred to a white
paper I wrote at
http://www.rvs.uni-bielefeld.de/publications/WhitePaper/LadkinPiUessay20130614.pdf
That white paper had two parts: one detailed via a hypothetical example the problems one might have
if the assessment requirements are too lax (specifically, the problems that arise with the current
assessment conditions in IEC 61508-3:2010); the second suggested an approach to assessment via
Markov processes (which could be extended, maybe, to Bayesian Belief Networks, if one has some
information about the internal architecture of the SW - grey box rather than black box).

I had originally tried to approach the issue of modelling how SW behaves by suggesting that it
behaves as an (arbitrarily complicated) finite-state machine (FSM), but that approach foundered in
two ways:
(1) there is inherent non-determinism in (a) the use of source-code languages which do not have a
demonstrably unambiguous semantics; (b) in the use of many compilers (especially those which
"optimise"); (c) maybe in the linkers; (d) maybe in the realisation of the opcode instructions in
HW; and
(2) there are no mature statistical techniques for determining to a given degree of confidence
whether exhibited behavior is that of an FSM.

For Point (2) I am *very* grateful for numerous discussions with Bev Littlewood. Bev also suggested
that the Markov-process approach might be a way to accommodate Point (1); hence the suggestion in my
white paper referenced above.

Members of the IEC Maintenance Team for the 61508 SW part who are interested in the "proven in use"
assessment conditions met in Frankfurt on 29 April. The Chair, Audrey Canning, asked the German
members at the meeting to prepare a proposal for replacement of the "proven in use" conditions by
some we consider more apt. The ultimate goal is formally a "Technical Specification", which is an
IEC publication, and possible incorporation into the next edition of IEC 61508-3, which is
provisionally scheduled for 2016 (after the formal two-year maintenance action, which is anticipated
to start in 2014). The German committee (rather, the subcommittee tasked with SW matters) finished
its proposal on 4 July and there is now a text which we would like to offer for general commentary
to experts who are not necessarily on the IEC 61508-SW Maintenance Team and who are not necessarily
involved with 61508 standardisation committees at all.

The text consists of a series of clauses in IEC-standards format, and is about three pages long. We
have made a serious attempt to include explicitly the conditions under which the future
failure behavior/frequency of SW can be inferred with some given degree of confidence from
past failure behavior, as explained in detail to us over the last four years by Bev Littlewood.
Basically, that which is necessary to ensure that the relevant statistical properties of the future
proposed use are identical to those of the recorded past use (one of which is, of course, that the
recording is veridical!).
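
To give a feel for the kind of reasoning involved (this is only the familiar zero-failure bound for an assumed constant failure rate and statistically identical future use, cf. Littlewood and Strigini 1993; it is emphatically not the content of the proposed clauses):

import math

def failure_rate_bound(failure_free_hours: float, confidence: float = 0.99) -> float:
    """Upper confidence bound on the failure rate after failure-free operation,
    assuming a constant failure rate and correctly recorded operating history."""
    alpha = 1.0 - confidence
    return -math.log(alpha) / failure_free_hours

# Example: roughly 4.6e-5 per hour after 10^5 failure-free hours at 99% confidence,
# i.e. orders of magnitude short of a 1e-8/hr claim.
print(f"{failure_rate_bound(1e5, 0.99):.1e}")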

(Note: I specifically use the term "failure behavior" of SW to indicate that it is the behavior of
running SW which is being talked about, not the static pattern which is source code or object code,
and to avoid the trope that that static pattern is not capable of failure in the normal engineering
sense, since failure is a behavior which a static pattern ipso facto cannot have.)

The text will eventually become public (we discussed how it should appear on the DKE WWW site). We
would like general commentary, but we also have to figure out how to mutate general commentary into
something which fits on the formal IEC comment form. So at this point, rather than distribute it
generally as an attachment to a message here, we would like to distribute it to those people who
explicitly express an intent to read it and comment.

I would like to invite people here to send me a short e-mail note (private, please, to avoid
"spamming" the list) expressing an intent to read the short proposed "proven in use" clauses and
comment. Comment can be of any form, including general messages to this list, but I would reserve
the right to come back to you with a request to shoehorn your points into the formal IEC format
(caveat: this can be far more annoying than it might first appear :-) ).

Again, many thanks to Bev for his substantial support. Any mistakes are ours, not his. Indeed, he
might be hard put to recognise anything he said in what we've written :-)

Next task is to revise Part 7 Annex D. I'll keep this list advised on that as well. The moral drawn
from our discussions so far is that there is both more and less to qualifying pre-existing SW for
new future use in a safety-related application than ensuring that the statistical properties in the
future use are, to some specified degree of confidence, identical to those determined in the past.

PBL

--
Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de




_______________________________________________
The System Safety Mailing List
systemsafety at TechFak.Uni-Bielefeld.DE

