[SystemSafety] The bomb again

Nancy Leveson leveson.nancy8 at gmail.com
Thu Oct 10 12:39:08 CEST 2013


I have been traveling and unable to comment (just too much to type on an
iPhone). So I am going to answer Peter's second message first because it is
easier. I'll answer the first one about STAMP later.

I don't know where Peter got the impression that I ever formally verified
TCAS. *I did not*!!!  Peter, please show me the paper where I did that.
What I did was to help the FAA and the aviation industry committees working
on certifying TCAS create a requirements specification for TCAS. (Later, I
used TCAS as an example for intent specifications to show that they were
practical.) At no time did I ever do any formal analysis on TCAS.
Determining whether it is safe or not was not my goal -- I was simply
documenting what TCAS did as the FAA (rightly) refused to certify it on the
basis of thousands of lines of pseudocode as the only documentation. As I
remember, Lincoln Labs has been doing trajectory analysis on various
versions of TCAS/ACAS for at least 30 years, starting in the early 80s. I do
vaguely remember a paper by Nancy Lynch, but it suffered from the same
problems as other work in the formal methods community that claimed to have
formally verified TCAS (and other complex systems). These people took my
formal specification (which most of the formal methods community has
dismissed as not formal enough to be a formal specification -- a criticism
I don't want to waste my time on) and tried to do formal analysis on TCAS.
The ones I have seen covered only a very small part of the system I
specified (the part involving some simple logic in which no errors have
ever been found), because their methods could not handle most of the logic,
including the most critical and complex parts, which are not discrete and
require continuous math.
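
To make concrete what I mean by continuous math: the advisory logic is
driven by quantities like tau, the projected time to closest point of
approach, which is a function of continuously varying range and range
rate. A rough sketch of the idea in Python -- my illustration only, with
a made-up threshold, not actual TCAS pseudocode:

    import math

    def tau_seconds(rel_pos, rel_vel):
        # Projected time to closest point of approach, from the
        # horizontal position and velocity (metres, metres/second) of
        # one aircraft relative to the other.
        r = math.hypot(rel_pos[0], rel_pos[1])            # range
        r_dot = (rel_pos[0] * rel_vel[0]
                 + rel_pos[1] * rel_vel[1]) / r           # range rate
        if r_dot >= 0:
            return None        # range increasing: aircraft diverging
        return -r / r_dot      # classic range-tau approximation

    THRESHOLD_S = 40.0         # made-up figure; real sensitivity
                               # levels vary with altitude

    tau = tau_seconds((9000.0, 0.0), (-250.0, 0.0))       # head-on closure
    if tau is not None and tau < THRESHOLD_S:
        print("advisory region, tau = %.1f s" % tau)

Verifying properties of logic like this means reasoning about real-valued
trajectories over time, not just enumerating discrete states -- which is
precisely what those methods could not do.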

Peter also wrote:
(*That also helps explain why STAMP wasn't really helpful in that one
example I gave; to follow its recommendations would mean having to change a
half-dozen laws because of one accident which luckily didn't hurt anyone
too badly. First, that's just not going to happen. Second, the legal
justification for changing a law because of an accident would be the
Counterfactual Test, and so a procedural (and therefore legal) change which
STAMP recommends but which does not pass the CT just couldn't be
implemented. There were quite a few of those. It's not really surprising
that people working in such an environment would prefer an analysis method
which imposes the CT.)*

I work in engineering, not law. Unfortunately, lots of law has fallen
behind our quickly moving ability to engineer very complex and
technologically advanced systems. To argue that the engineering should stop
because the law has not been updated is unrealistic. The law is going to
have to catch up because people are not going to compromise on what they
build and, in fact, have not done so. Also, the argument that an incident
that did not hurt anyone should be ignored seems unwise to me. There
are almost always precursors to major accidents where nobody was killed --
the conditions just did not lead to a large number of deaths in those
instances. Bhopal and the DC-10 accidents are the most obvious. Not taking
steps after these precursors occur only means that one is ignoring accident
scenarios that have been shown to be possible. To ignore precursors because
it might involve changing the law (and that is inconvenient for
bureaucrats) seems particularly unwise.

I'm not following Peter's argument about not changing the law because the
scenario *of a real event* does not pass the Counterfactual Test. It would
seem to me that the fact that the event occurred shows that the CT is not
adequate in today's engineering world.
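
For readers who have not met it: the CT asks, of a factor and an outcome,
whether the outcome would still have occurred had the factor not occurred.
In toy form -- my illustration, with made-up event names, not anything
taken from a real investigation or from any WBA tooling:

    def counterfactual_test(outcome, world, factor):
        # 'outcome' maps a dict of facts to True/False; 'world' is the
        # set of facts that actually held. The factor passes the CT if
        # the outcome occurred as things were but would not have
        # occurred with the factor absent.
        alternate = dict(world)
        alternate[factor] = False
        return outcome(world) and not outcome(alternate)

    # Toy model: the loss event needs both a late advisory and a missed
    # callout; the full moon is causally irrelevant.
    loss = lambda w: w["late_advisory"] and w["missed_callout"]
    world = {"late_advisory": True, "missed_callout": True,
             "full_moon": True}

    print(counterfactual_test(loss, world, "late_advisory"))  # True
    print(counterfactual_test(loss, world, "full_moon"))      # False

Whether a test like that should gate procedural and legal change after a
real event is exactly what I am questioning.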

Nancy


On Thu, Oct 10, 2013 at 4:58 AM, Peter Bernard Ladkin <
ladkin at rvs.uni-bielefeld.de> wrote:

> Matthew,
>
> I am not going to indulge in a detailed discussion of STAMP, because I
> don't use it, and there are others here whose experience with it is far
> richer than mine.
>
> On 10 Oct 2013, at 02:59, Matthew Squair <mattsquair at gmail.com> wrote:
> > Nor was I trying to make a case for STAMP as a methodology for accident
> > analysis, though clearly you can use it for that.
>
> STAMP was originally presented as a "new accident model".
>
> > To rephrase my question a little, if you accept the thesis that various
> > sociologists make that you can't divorce the safety of a 'hard system' from
> > its organisational context, then surely that implies that you should expend
> > some effort on designing the containing organisational system, or at least
> > critically reviewing it?
>
> I would have thought so.
>
> I have specific experience and expertise that do not extend to designing
> organisational structures (no matter how much I might mouth off about them
> here). Professional code of conduct says I get someone who is expert on
> those to do that. A clever organisational theorist with relevant experience.
>
> Which isn't to say that the results can't appear in engineering analyses.
> For example, I find Nancy's STAMP analysis of the 1993 Iraq friendly-fire
> shootdown insightful as well as way shorter and more precise than that of
> the organisational theorist Snook. Snook makes some obvious mistakes, for
> example he claims as a basis for his analysis that applying the
> Counterfactual Test is insufficient. (First, his attempt at applying the CT
> is full of mistakes - see a paper of ours from 2003. Second, all of his
> analysis can be represented in a WBG, which of course *only* uses the CT as
> its criterion.) But most of what he observes in his book is insightful and
> pertinent. Amongst other organisational-behavioral (OB) phenomena, he makes
> good use of Rasmussen's 1997 notion of Migration to the Boundary, and the
> notion of crowd indecision, the "Kitty Genovese" phenomenon, as well as
> information-flow analyses which one could routinely consider to be
> engineering phenomena. I wonder whether either the STAMP analysis or the
> WBA would be there without his having appeared first as a point of
> reference? That is, it has to occur to an analyst that MttB and the
> Genovese phenomenon are present and explanatory; it doesn't just drop out
> of some magical methodological back pocket. (Whereas I think the info-flow
> observations do.) Engineers not specially trained in OB can't be expected
> to know about or observe instances of MttB or Genovese. Both the military
> investigation and the Commission missed them, which was part of the
> motivation for Snook, as I understand.
>
> > And if you do so, then what are the system design and analysis tools and
> > methodologies that can support you?
>
> I thought I had already answered that question. Let me answer it in a
> little more detail.
>
> As you suggest above, I do get to analyse the resulting proposals for
> sociotechnical system design, being a systems safety expert. And for that I
> use OHA plus the techniques I already mentioned; CCFD, WBA-on-CCFD, PARDIA
> and RCM analyses (I guess I hadn't mentioned the CCFD-involved techniques).
> Those won't derive Perrin's insight either, but, as I said, I would
> explicitly rely on the organisational theorists for such observations.
>
> None of the engineering analyses performed upon TCAS identified the
> decision-theoretic weaknesses of the sociotechnical algorithm which RCM
> identifies. There is also at least one flight configuration I identified
> that no one has shown TCAS II/ACAS II can resolve (it's a math problem in
> trajectories, and, having discussed it with some outstanding mathematicians
> expert in the field, I made the public offer in
> 2002 of a PhD for a resolution of that issue. Eleven years later, it's
> still open).
>
> Yet the algorithm was "formally verified" by two teams led by two
> outstanding computer scientists, Nancy Leveson and Nancy Lynch. Automated
> proof checkers and the lot.
>
> Those analyses turned out to be quite influential. I have continued to
> meet engineers who ignore the results of the RCM analysis, dismiss the
> unresolved configuration issue, and explain how TCAS is quite satisfactory
> as it is by repeating the usual trope (the "philosophy") and adding
> that it was
> "formally verified". (The people at Eurocontrol involved in the design of
> the new ACAS X are well aware of the matters above, so they might well get
> addressed in the new system definition.)
>
> Now, how can all that happen? Part of the answer lies, unsurprisingly, in
> the assumptions imposed upon the Leveson and Lynch projects from outside
> (for example, to analyse just two-aircraft configurations). Those
> assumptions were not explicitly discussed anywhere of which I know during
> the TCAS project analyses, although I think people were quite aware of them
> (I am open to being corrected on this, if people know of such analyses).
> Even after the incidents that preceded Überlingen, the (what I take to be)
> necessary analyses were not performed.
>
> (And then there was the requirements failure which manifested itself in
> the Überlingen accident, which I mentioned in my comment a few weeks later.
> It turns out Eurocontrol, and indeed RTCA, knew about this, and Eurocontrol
> had already had a Change Proposal on file for two years before the accident.)
>
> Organisational theorists can explain all these social phenomena in some
> detail. They haven't yet, but I am sure some tenacious PhD student of
> someone like Pinch, Downer or Slayton is going to do so sometime.
>
> I am sceptical whether analyses performed by engineers, using STAMP or OHA
> or whatever, are going to have much impact on these more general phenomena.
> What the engineers do is very controlled and compartmentalised, and the
> results are then summarised in two words and broadcast without the caveats.
> Such as "formally verified" - it drives me nuts when people utter these
> words without context.
>
> The first time somebody in Germany has a real accident involving an
> electric car being charged, somebody is going to turn around and say
> "the charging infrastructure was analysed and verified using OHA", followed
> by "somebody at the pointy end screwed up". (That second will come, as it
> always does, from the state prosecutor's office.) Is there any engineering
> method which predicts that behavior? No. Not STAMP, not OHA, not RCM, not
> any. It comes from experience and observation of the organisations and
> social structures involved, and I hope some organisational empiricist has
> noted it down and is working on it. Because I think it needs to be explicit
> in system development that things work that way.
>
> Did I say the first time? I meant the second. The first happened already -
> some home for mentally-challenged people in Southern Germany burnt down
> last year because of a fault that arose while a utility vehicle was charging
> overnight. Luckily - or miraculously - no one was hurt. Damage was (only) a
> half-million euros. I don't know yet what the investigation showed; I think
> it is sub judice and might remain so. I'd like to know, to see if we have
> it covered in the HazAn. But the social mechanisms are not in place to have
> such results shared with the relevant committee of the engineering
> organisation responsible for standards. There: yet another phenomenon for
> the organisational theorists to get their teeth into. You don't solve it by
> having an engineer impose a social model which not everybody buys, and then
> say "look, see, there's a feedback loop missing!" It's true in this case,
> there is a feedback loop missing and it needs to be put in. But if it is
> going to happen, it is going to happen by having some organisational
> theorist make observations, analyse, and relate the phenomenon convincingly
> to other similar social phenomena where things went wrong, and query
> whether we want things to go wrong in the same way here too.
>
> You know, I could go on and on. I bet people are worried that I will! So
> I'll shut up.
>
> BTW, I heard that Rebecca Slayton's book on anti-missile missile systems
> (Star Wars and Patriot and so on) is out. I've seen a couple of chapters
> and was gripped. Let me recommend it!
>
>
> > Interesting you should mention the hierarchical nature of DB; Alfred
> > Chandler makes the point in The Visible Hand that the technology of the
> > railways drove a specific centralised, hierarchical organisational form,
> > which I imagine DB adheres to.
>
> Yes and no. It is now a bunch of nominally-independent organisations
> contracting with each other. Planning-and-booking, catering, stations,
> network infrastructure, maintenance, regional rail, long-distance rail,
> freight. All notionally separate companies. All owned by the government
> still, though, and all overseen by the same board of governors.
>
> (Some changes are obvious. Local trains used to wait for long-distance
> connections. Now they don't, because it ruins their timeliness statistics.
> Which means loads of people now miss connections they didn't miss
> before, while the timeliness statistics of the individual
> companies continue to improve. DB claims its long-distance trains have a
> 97% "on-time" record. But my record of completing a long-distance journey
> with the planned connections hovers around two-thirds, even though I almost
> never go with a connection-time of under 10 minutes -- longer than the
> delay a train can have while still counting as "on-time" in DB's
> statistics. If I include bus/tram connections it's
> about 50%. We need an organisational theorist to collect the data and point
> out in public that the wrong thing is being measured.)
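>
> Back-of-envelope, to see how those two numbers can coexist (my
> illustrative arithmetic only, assuming the legs of a journey are
> independent, which they certainly are not):
>
>     # If each of N legs is independently "on-time" with probability p,
>     # the chance that every leg is on time is p**N:
>     p, legs = 0.97, 3
>     print(p ** legs)   # ~0.913 -- already well below 97%
>     # Real delays correlate, "on-time" tolerates several minutes of
>     # delay, and one missed connection cascades into others, so the
>     # completed-journey rate falls further -- toward the two-thirds I
>     # actually experience.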
>
> And then there is the railway law. The railway law imposes certain
> procedures which often have a hierarchical form because, well, the law is
> about who is responsible for what and that imposes hierarchy. (That also
> helps explain why STAMP wasn't really helpful in that one example I gave;
> to follow its recommendations would mean having to change a half-dozen laws
> because of one accident which luckily didn't hurt anyone too badly. First,
> that's just not going to happen. Second, the legal justification for
> changing a law because of an accident would be the Counterfactual Test, and
> so a procedural (and therefore legal) change which STAMP recommends but
> which does not pass the CT just couldn't be implemented. There were quite a
> few of those. It's not really surprising that people working in such an
> environment would prefer an analysis method which imposes the CT.)
>
> I thought I said above that I'd stop? Now, I really will.
>
> PBL
>
> Prof. Peter Bernard Ladkin, University of Bielefeld and Causalis Limited



-- 
Prof. Nancy Leveson
Aeronautics and Astronautics and Engineering Systems
MIT, Room 33-334
77 Massachusetts Ave.
Cambridge, MA 02142

Telephone: 617-258-0505
Email: leveson at mit.edu
URL: http://sunnyday.mit.edu