[SystemSafety] Component Reliability and System Safety [was: New Paper on MISRA C]

David Crocker dcrocker at eschertech.com
Fri Sep 14 21:24:57 CEST 2018


>> 
Another way to allow such development to be more dependable than it otherwise would be is to switch
to SPARK and then use a back-end C generator if your code has to be in C.
<<

A more direct way would be to use a tool such as Escher C Verifier, which gives the same guarantees as SPARK about correctness with respect to requirements, but for components written in C or, preferably, a subset of C++. See my conference paper "Can C++ be made as safe as SPARK". Sadly, the number of C and C++ developers interested in proving their code correct is close to zero.
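
To give a flavour of what proving correctness with respect to requirements looks like at the code level, here is a minimal sketch in C. The annotations are written in the style of ACSL, the contract notation used by Frama-C, purely as an illustration; eCv has its own notation, but the idea is the same: the requirement becomes a machine-checkable contract, and the tool discharges proof obligations showing that the body satisfies it.

    #include <limits.h>

    /* Requirement (illustrative): add two non-negative counts, saturating
       at INT_MAX rather than overflowing. The contract states exactly that. */
    /*@ requires 0 <= a && 0 <= b;
      @ ensures (a <= INT_MAX - b) ==> \result == a + b;
      @ ensures (a >  INT_MAX - b) ==> \result == INT_MAX;
      @ assigns \nothing;
      @*/
    int sat_add(int a, int b)
    {
        return (a <= INT_MAX - b) ? (a + b) : INT_MAX;
    }

A coding-standard checker will tell you this function has no undefined behaviour; a prover of the eCv or SPARK kind will additionally tell you that it meets the stated requirement, which is the guarantee I mean above.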

On 14 September 2018 08:03:25 BST, Peter Bernard Ladkin <ladkin at causalis.com> wrote:
>Paul has raised the issue of component reliability, for example, the
>dependability that may accrue
>to programs to the MISRA C coding standard and checked with static
>analysis tools aimed at that
>standard, and what that might have to do with system safety,
>referencing Nancy Leveson's book
>Engineering a Safer World.
>
>There is quite a history of discussion on this list and its York
>predecessor about STAMP/STPA and
>other analysis methods. Here is something from 15 years ago (July 2003)
>comparing STAMP (as it then
>was) with WBA in the analysis of a railway derailment accident
>http://ifev.rz.tu-bs.de/Bieleschweig/pdfB2/WBA_STAMP_Vergl.pdf  (This
>was at the second Bieleschweig
>Workshop. The list of Bieleschweig Workshops and slides and documents
>from most of the contributions
>may be found at https://rvs-bi.de/Bieleschweig/ )
>
>What the authors found was that many of the supposedly "causal factors"
>identified through the STAMP
>analysis were not what they and others wished to identify as causal
>factors. For example, the German
>railways DB had and have a hierarchical command structure in which
>information and requirements flow from "above" to "below", often
>without the feedback loops from "below" to "above" which STAMP at the
>time required. The lack of such loops was, at the time, classified by
>STAMP as "causal factor(s)"
>according to Nancy's "new" conception of causality.
>
>Except that the DB could have said, with some justification, "we have
>been running a railway like
>this for 150 years and accidents of this sort are few". And it is not
>clear that modifying the
>organisational structure to implement those feedback loops would be
>advisable. It would be a change
>of fundamental culture, and not all such changes have the desired
>effect. I was involved in one such
>translation of Anglo organisational culture into German organisational
>culture, with the
>introduction of university Bachelor's/Master's degree programs to
>replace the traditional German
>Diplom programs. The result has been very mixed. I think the main
>benefit is that it makes
>transfer of course credits across international boundaries easier,
>but lots of other things of
>local (Uni BI) importance are worse than they were.
>
>The point I wish to make with this example is that STAMP (as it then
>was) embedded a conception of
>organisational culture that was and is by no means universal. And such
>an embedding is going to
>mislead analysts when the culture being investigated does not match the
>preconception. As our Brühl
>analysts found out.
>
>I prefer, and have followed, the alternative approach of developing
>tools to target certain tasks in
>safety analysis. WBA, OHA, OPRA, Why-Because Graphs, Causal Fault
>Graphs. Nancy may wish to claim,
>as Paul quotes, that
>> "Accidents are complex processes involving the entire socio-technical
>system. Traditional event-chain models cannot describe this process
>adequately.
>
>(I might counter that weird concepts of causality cannot describe
>causal factors adequately, citing
>the Brühl analysis. It would be an equally trivial intervention. So I
>won't.)
>
>She once described WBA as a "chain of events model" in order to
>conclude it couldn't describe
>necessary components of an accident. I responded that the only
>appropriate word in that description
>was "of" - WBGs are almost never a chain; almost never include only
>events, and are not models but
>descriptions (showing states and events and processes and the relation
>of causality between them).
>
>I have also not seen any STAMP example which could not have arisen in
>similar form through WBA
>performed by an analyst sensitive to sociotechnical features. What
>STAMP brings, I would suggest, is
>a certain structure to the socio/organisational/governmental/legal
>aspects of accident and hazard
>analysis which WBA & OHA do not explicitly include. There is value in
>that, of course, although
>one must beware, as above, of implicit assumptions which do not match
>the analysis task at hand.
>
>You could also use WBA, in particular the Counterfactual Test, as part
>of STAMP if you wish. Nancy
>was never receptive to that idea - WBGs were "chain of events models"
>and she "knew" that couldn't work.
>
>The original STAMP idea was prompted by Jens Rasmussen's Accimaps, I
>understand. Andrew Hopkins used
>these quite successfully in various studies of accidents in Australia.
>Hopkins's analyses neatly
>stratified the causal factors in a similar way to that in which STAMP
>does, but were typically far
>less detailed than a WBG of the same event. The stopping rule and the
>level of abstraction (and the
>abstraction processes used by the analyst) are decisive components of
>any analysis. (We deal with
>the abstraction level in OPRA analysis; the process of deriving a
>stopping rule is less well
>researched.)
>
>The thing inhibiting almost any sophisticated analysis of the STAMP or
>WBA variety is that there are
>a lot of system-failure incidents in which the
>sociological/organisational components become clear
>but people have a vested interest in ignoring them. We have had people
>stop using WBA because it
>highlighted too much that they couldn't change. STAMP would encounter a
>similar issue. For such
>applications, one can imagine an "organisation-relative" WBA. An
>"organisation-relative" STAMP would
>negate its purpose, I imagine.
>
>An example. Sometime over 15 years ago, one of the systems on our RVS
>network, then a physical
>subnetwork of the Uni Bielefeld campus network, was penetrated.
>Somebody working late at night
>noticed a SysAdmin on the system and wanted to chat. Instead, odd
>things started to happen and the
>SysAdmin went away. Telephone calls were made. It was an impostor, who
>in retreating had trashed 3GB
>of log files and other material. My guys eventually analysed much of
>the trashed material in about a
>person-month of forensics and figured out who the impostor was.
>
>The day after the evening incident, the guys had called me. The main
>question for me was how the
>impostor had obtained SysAdmin account credentials. The (true) SysAdmin
>often worked from home. He
>came in over the telephone lines to the Uni network and pursued a login
>process over the Uni
>Ethernet. Chances anyone was listening to the telephone data transfer
>were minimal; it must have
>been listened to within the Uni Ethernet by someone who knew weaknesses
>in the login process code
>and was able to read his login name and password. At the time we had, I
>thought, a rely-guarantee
>arrangement with the Uni network admin that they could give us a
>network infrastructure which
>excluded non-authorised users. Our network security policy was
>explicitly based on this
>rely-guarantee arrangement. I called up their administrator on a
>Saturday morning: "your security
>has been breached; we found out last night". The response "oh, well, we
>can't do everything. We'll
>look at stuff next week, I guess. You should take better precautions."
>
>I hope it is obvious that a breach of a rely-guarantee arrangement was
>a causal factor in the
>incident. We thought that arrangement existed, and our colleague in the
>Uni SysAdmin conveniently
>forgot it. It is hard to see that a STAMP analysis would have changed
>his mind.
>
>Of course, nowadays it would be daft to think any normal network
>administration could provide such a
>secure communication environment.
>
>On to some comments of Clayton:
>
>On 2018-09-13 20:22 , clayton at veriloud.com wrote:
>>  Component reliability, where bugs are found, fixed, and test cases
>are passed, do not account for the insidious failures that occur in the
>real world where "component interaction" can be so complex, no amount
>of testing can anticipate it. 
>
>There are a bunch of things mixed in here. Let me take one.
>
>>  Component reliability do[es] not account for the insidious failures
>that occur in the real world where "component interaction" can be so
>complex .......
>
>Civil air transport, a complex technology that has become progressively
>one of the safest, has been built on the
>principle (and the regulation) of rigorously pursued component
>reliability, and still is.
>
>Let me take another:
>
>>  [Procedures] where bugs are found, fixed, and test cases are passed,
>do not account for the
>insidious failures that occur in the real world where .... no amount of
>testing can anticipate it.
>
>That is a basic statistical observation that is, or should be, part of
>any system safety engineering
>course. The usual references are Butler & Finelli, IEEE TSE 1993 (for a
>frequentist interpretation)
>and Littlewood and Strigini, CACM 1993 (for a Bayesian interpretation).
>If you are a
>frequentist/Laplacian, it all follows from simple arithmetical
>observations concerning the
>exponential distribution.
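
To put a number on that (a back-of-envelope sketch assuming a constant failure rate, not the papers' own derivations): if failures arrive at rate lambda, the probability of seeing none in T hours of test is exp(-lambda*T), so evidence for a claim of better than 1e-9 failures per hour at, say, 95% confidence requires on the order of 3e9 failure-free test hours.

    #include <math.h>
    #include <stdio.h>

    /* How long must testing run, failure-free, before "no failures seen"
       is strong evidence (at confidence 1 - alpha) for a claimed bound on
       the failure rate? With constant rate lambda,
       P(no failure in T hours) = exp(-lambda * T), so we need
       exp(-bound * T) <= alpha, i.e. T >= -ln(alpha) / bound.            */
    int main(void)
    {
        const double bound = 1.0e-9;  /* claimed bound: 1e-9 failures/hour */
        const double alpha = 0.05;    /* i.e. 95% confidence               */
        const double hours = -log(alpha) / bound;

        printf("failure-free test hours needed: %.2e\n", hours);
        printf("roughly %.0f years of continuous testing\n",
               hours / (24.0 * 365.0));
        return 0;
    }

That is the Butler & Finelli point in one line: for ultra-high dependability targets, testing alone cannot supply the evidence.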
>
>> .. As Leveson and many have stated before, most failures arise out of
>system requirements flaws due to lack of rigor (e.g. poor hazard
>analysis and mitigation). 
>
>It indeed seems to be the case that most mission failures of *carefully
>developed* systems arise out
>of some mismatch between requirements and actual operating conditions
>(which I like to trace back to
>Robyn Lutz's study for NASA, because her 1993 paper is still on-line
>and easily accessible). It is
>also likely true (if not a truism) that most failures arise from flaws
>due to a lack of rigour
>somewhere. But I don't know of any reliable study that has shown that
>most system failures arise out
>of a *lack of rigour in system requirements*. Almost none of the
>well-known cybersecurity incidents
>such as Stuxnet, Triton, TriSis, Go To Fail, Heartbleed, NotPetya arose
>out of a lack of rigour in
>system requirements, as far as we know. I would venture to suggest that
>is true of most of the
>advisories coming out of ICS-CERT; certainly the two significant ones
>of which I was advised this
>week (reading between the lines of one of those, it seems a major
>reliable industrial communication
>system had a protocol susceptible to deadlock).
>
>> In this email thread, it's interesting nobody seems to be commenting
>on the subject’s paper, have you read it? 
>
>Um, yes.
>
>> It is about .....
>
>My reading is that it is about why people use C for programming small
>embedded systems and why
>following the MISRA C coding standard may allow such development to be
>more dependable than it
>otherwise would be.
>
>Another way to allow such development to be more dependable than it
>otherwise would be is to switch
>to SPARK and then use a back-end C generator if your code has to be in
>C.
>
>Or develop in SCADE and use its C code generator.
>
>Or .........
>
>>> [Paul Sherwood, I think] Why is MISRA C still considered relevant to
>system safety in 2018?
>
>(Banal question? Banal answer!) Because many people use C for
>programming small embedded systems and
>adhering to MISRA C coding guidelines enables the use of static
>analysis tools which go some way
>(but not all the way) to showing that the code does what you have said
>you want it to do.
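
A made-up fragment to illustrate the division of labour (the sort of construct such tools flag, not a quotation from the MISRA text):

    #include <stdint.h>

    /* Likely to be flagged: the operand is promoted to signed int,
       signedness is mixed, the result is silently narrowed, and the
       behaviour is undefined if 'shift' is negative or too large.      */
    uint16_t scale(uint16_t raw, int16_t shift)
    {
        return raw << shift;
    }

    /* Rewritten in the guidelines' spirit: fixed-width unsigned
       arithmetic throughout, an explicit cast for the narrowing, and no
       reliance on the caller passing a sensible shift value.           */
    uint16_t scale_checked(uint16_t raw, uint8_t shift)
    {
        uint32_t wide = (uint32_t)raw;
        if (shift < 16u)
        {
            wide = wide << shift;
        }
        return (uint16_t)(wide & 0xFFFFu);
    }

The analysis can show the second version is free of that whole class of traps; whether scaling by 'shift' was the right requirement in the first place is precisely the part it cannot show, which is the "not all the way" above.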
>
>PBL
>
>Prof. Peter Bernard Ladkin, Bielefeld, Germany
>MoreInCommon
>Je suis Charlie
>Tel+msg +49 (0)521 880 7319  www.rvs-bi.de

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.