[SystemSafety] Stupid Software Errors [was: Overflow......]

Mon May 4 15:42:12 CEST 2015

Peter,

> I don't buy Derek Jones's or Tom Ferrell's versions of the curate's egg. I don't see why anyone
> else should, either. Are they still going to be saying "well, it depends, it's complicated" in

Blustering in the face of reality is unbecoming.

Very thorough static analysis requires:

    o lots of memory.  I have written about how optimizing compilers took
off in the 1990s because developers finally had computers containing
enough memory for high-level optimization techniques to be used:
http://shape-of-code.coding-guidelines.com/2011/10/08/memory-capacity-and-commercial-compiler-development/
Static analysis requires even more memory.  Cloud providers now offer
machines containing a quarter of a terabyte of main memory; that
should be enough.

    o lots of CPU power.  Again, the cloud now provides this, but some
work is needed to figure out how to parallelize what, to date, are
single-threaded solutions.

    o willingness to put up with lots of false positives.  These
customers are easy to spot because the unicorns they ride to work on
are corralled in the car park.

    o the commercial incentive to make it happen.  My experience is
that most developers are more interested in being able to change the
colours of the user interface than in what static analysis tools do.
This is why, after initial development, most commercial static
analysis R&D effort goes into the IDE.

So we are now in a position where the computational resource
problem is solved.  I don't think the last two problems will be
solved until we start sending developers and their managers to
prison for delivering code containing faults that could have been
detected.

> another twenty years when stupid coding errors still make it through into supposedly-dependable
> software products?
>
> Look at go-to fail. That's critical code! How come critical code such as that is not routinely
> subject to static analysis?
>
> Look at the 787 generator code. A systematic loss of all generators is surely a hazardous event.
> That should make it 10^(-7). Oh, but I forgot. Even though correct operation of SW contributes to
> the 10^(-7), the reliability of the SW itself is not assessed. But surely it gets to be at least
> DAL B, since the result is a hazardous event? Oh, but I forgot something else. A systematic
> failure like that would be common cause, and the certification requirements concern single
> failures, not common cause failures. So that's all right then. Tom's suggestion that it might have
> been a design compromise is vitiated by the fact that the phenomenon is subject to an
> AIRWORTHINESS Directive by the FAA. (Is that sufficient emphasis?)
>
> If people had told me thirty years ago that we'd still be making the same stupid mistakes in the
> same ways, but this time in code more fundamental to the safe or secure operation of everyday
> engineered objects, I wouldn't have believed it.
>
> Maybe it's a social thing. Mostly, people actually writing the code and inspecting it are in their
> twenties and their bosses maybe at most in their early thirties. The young people have never made
> *this* mistake before - the previous lot had of course, but they're all in management now. I'm
> reminded of Philip Larkin's ode to rediscovery, Annus Mirabilis:
>
> Sexual intercourse began
> In nineteen sixty-three
> (Which was rather late for me)-
> Between the end of the Chatterley ban
> And the Beatles' first LP.
>
> The Ensuing Discussion.
>
> There was obviously discussion on the list of why we are making the same old mistakes forty years
> after it was known how to avoid them. Some discussants suggested it might help to professionally
> certify software engineers as PEs. Others referred to the Knight-Leveson study a decade ago for the
> ACM, in which inserting SE into the current PE scheme was not seen as advantageous. UK discussants
> pointed out that such certification exists in the UK, as a CEng through the BCS or IET, and that
> there had been some UK consideration of extra qualification for critical-software engineering.
>
> Such qualification for system safety hasn't (yet) generally caught on anywhere. SaRS offers it in
> the UK, for example. It didn't catch on in the US. Over a decade ago, the System Safety Society
> introduced an option for system safety engineering into the PE exam. They had to pay the NSPE or
> NCEES (I forget which) lots of money per year to maintain the option - and two people took it in
> some number of years. So they dropped it. (I was at the board meeting in Ottawa in 2004 when this
> was decided.)
>
> The UK qualification regime hasn't stopped IT disasters in government procurement. And it hasn't
> stopped the kind of poor engineering which allows bank ATMs which use supposedly
> pseudo-one-time-pad nonce generation to be subject to replay attacks (see a recent paper reporting
> local experiments performed by Ross Anderson's group). I do note, however, that the three examples
> I mentioned above are all US examples. It's not ruled out that having some degree of formal
> professional training, as in the UK, encourages software engineers to avoid repeating simple
> mistakes whose prophylaxis has been well known for decades.
>
> Time was when UK and US cars were not known for their reliability. Kind of like SW,
> relatively-inexpensive cars used to go wrong a lot. However, some very expensive cars, such as those made
> by Rolls-Royce/Bentley and Wolseley, were reliable. So there was proof of concept. Japanese
> companies decided it was possible to produce reliable relatively-inexpensive cars and make money,
> and did it.
>
> There is proof of concept in SE, too. Unlike Rolls-Royce cars, it is not prohibitively expensive.
> Three out of my four examples involve run-time error. It is feasible to produce SW
> cost-effectively which is free from run-time error. Just like the Japanese approach to cars, you
> just have to decide to do it.
>
> How about the following? We design a document called A Programmer's Pledge. It has thirty or so
> numbered clauses:
>
> * I promise never to deliver SW which is subject to a data-range roll-over phenomenon (especially
> dates and times)
>
> * I promise never to deliver software which is subject to a numerical overflow or underflow exception
>
> * I promise never to deliver software which reads data on which it raises an "out of range" exception
>
> * ..... and so on
>
> A professional programmer signs it and files it with his/her professional organisation. Quality
> control issues in programs (such as the above phenomena) are routinely subject to RCA of sorts.
> When a programmer is responsible for a piece of code with such an error in it, the company reports
> it to the professional organisation and the programmer gets "points" attached to the corresponding
> clause in his/her Pledge. As with driving (Germans say "points in Flensburg", which is where the
> office is. What is it in the UK? "Points in Cardiff"?). I bet lots of organisations, from
> companies hiring programmers to professional-insurance companies will find uses for it.
>
> PBL
>
> Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
> Je suis Charlie
> Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de
> _______________________________________________
> The System Safety Mailing List
> systemsafety at TechFak.Uni-Bielefeld.DE

-- 
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com