[SystemSafety] Stupid Software Errors [was: Overflow......]

Steve Tockey Steve.Tockey at construx.com
Mon May 4 23:02:57 CEST 2015


PBL wrote:
"How about the following? We design a document called A Programmer's
Pledge. It has thirty or so
numbered clauses:

* I promise never to deliver SW which is subject to a data-range roll-over
phenomenon (especially
dates and times)

* I promise never to deliver software which is subject to a numerical
overflow or underflow exception

* I promise never to deliver software which reads data on which it raises
an "out of range" exception

* ..... and so on"


I propose that only one clause is necessary:

* I promise to be personally liable for all damage caused by any software
defect I produce


IMHO, that one clause would be enough to take care of everything else
combined.


-- steve





-----Original Message-----
From: Peter Bernard Ladkin <ladkin at rvs.uni-bielefeld.de>
Date: Sunday, May 3, 2015 11:41 PM
To: The System Safety List <systemsafety at techfak.uni-bielefeld.de>
Subject: [SystemSafety]  Stupid Software Errors [was: Overflow......]


I wrote a version of the following a few days ago to a closed list.

AA has had EFBs crashing on a number of flights. Apparently two copies of the
approach chart for Reagan Washington National airport were included in the
latest update of the EFB, and the app wasn't able to handle having two files
with almost-identical metadata denoted as "favorites".
A colleague who flies for a major airline (not AA) which uses EFBs spoke
of some colleagues having
their EFBs crash early on Jan 1 one year - they fixed it by rolling the
date back a day.

On the Boeing 787: think of the 32-bit Unix clock, and there are lots of
similar examples. There's even a Wikipedia page:
http://en.wikipedia.org/wiki/Time_formatting_and_storage_bugs .
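
To make the class of error concrete, here is a minimal, self-contained
illustration of the best-known instance, the 32-bit Unix time_t limit. A
sketch only, of course - the 787 and EFB code is not public:

    /* Prints the last moment a signed 32-bit seconds-since-1970 clock
       can represent; one second later such a clock wraps into 1901. */
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    int main(void) {
        time_t last = (time_t)INT32_MAX;   /* 2^31 - 1 seconds after the epoch */
        char buf[32];
        strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", gmtime(&last));
        printf("32-bit time_t runs out at %s UTC\n", buf);   /* 2038-01-19 03:14:07 */
        return 0;
    }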

Remember Apple's goto fail (CVE-2014-1266) from 2014: a duplicated "goto
fail;" line that silently skipped a signature-verification check.
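
For anyone who hasn't looked at the code, the shape of the defect is easy to
reproduce. The following is a self-contained toy in the same shape - the
function names are made up, it is not Apple's source - showing how one
duplicated line silently disables the check that follows it:

    /* Toy reproduction of the CVE-2014-1266 pattern: the duplicated
       "goto fail;" is unconditional, so check_c() is never called and
       verify() returns 0 ("success") even though that check should fail. */
    #include <stdio.h>

    static int check_a(void) { return 0; }    /* passes */
    static int check_b(void) { return 0; }    /* passes */
    static int check_c(void) { return -1; }   /* should fail the verification */

    static int verify(void) {
        int err;
        if ((err = check_a()) != 0)
            goto fail;
        if ((err = check_b()) != 0)
            goto fail;
            goto fail;                 /* the stray duplicate: always taken, err == 0 */
        if ((err = check_c()) != 0)    /* unreachable */
            goto fail;
    fail:
        return err;
    }

    int main(void) {
        printf("verify() = %d (0 means \"OK\")\n", verify());
        return 0;
    }

Any half-decent static analyser flags the unreachable code immediately.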

These are simple, known types of error. Forty years ago, it was known how
to avoid all these kinds
of problems. Twenty years ago, there were industrial-quality engineering
tools available (proper
languages and coding standards checkers) which enabled companies to avoid
such problems without
undue development costs.

I don't buy Derek Jones's or Tom Ferrell's versions of the curate's egg. I
don't see why anyone
else should, either. Are they still going to be saying "well, it depends,
it's complicated" in
another twenty years when stupid coding errors still make it through into
supposedly-dependable
software products?

Look at goto fail. That's critical code! How come critical code such as
that is not routinely
subject to static analysis?

Look at the 787 generator code. A systematic loss of all generators is
surely a hazardous event.
That should make it 10^(-7). Oh, but I forgot. Even though correct
operation of SW contributes to
the 10^(-7), the reliability of the SW itself is not assessed. But surely
it gets to be at least
DAL B, since the result is a hazardous event? Oh, but I forgot something
else. A systematic
failure like that would be common cause, and the certification
requirements concern single
failures, not common cause failures. So that's all right then. Tom's
suggestion that it might have
been a design compromise is vitiated by the fact that the phenomenon is
subject to an
AIRWORTHINESS Directive by the FAA. (Is that sufficient emphasis?)
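
For what it's worth, the widely reported explanation of the 787 GCU problem
is a signed 32-bit counter of 10-millisecond ticks. I haven't seen Boeing
confirm that detail, but if it is right, the 248-day figure in the AD falls
straight out of the arithmetic:

    /* Assumes (unconfirmed) a signed 32-bit counter incremented every 10 ms. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        double seconds = INT32_MAX / 100.0;    /* 2^31 - 1 ticks of 10 ms */
        printf("counter saturates after %.1f days\n",
               seconds / 86400.0);             /* ~248.6 days */
        return 0;
    }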

If people had told me thirty years ago that we'd still be making the same
stupid mistakes in the
same ways, but this time in code more fundamental to the safe or secure
operation of everyday
engineered objects, I wouldn't have believed it.

Maybe it's a social thing. Mostly, the people actually writing the code and
inspecting it are in their twenties, and their bosses are maybe at most in
their early thirties. The young people have never made *this* mistake before
- the previous lot had, of course, but they're all in management now. I'm
reminded of Philip Larkin's ode to rediscovery, Annus Mirabilis:

Sexual intercourse began
In nineteen sixty-three
(Which was rather late for me)-
Between the end of the Chatterley ban
And the Beatles' first LP.

The Ensuing Discussion.

There was obviously discussion on the list of why we are making the same
old mistakes forty years
after it was known how to avoid them. Some discussants suggested it might
help to professionally
certify software engineers, a PE. Others referred to the Knight-Leveson
study a decade ago for the
ACM, in which inserting SE into the current PE scheme was not seen as
advantageous. UK discussants
pointed out that such certification exists in the UK, as a CEng through
the BCS or IET, and that
there had been some UK consideration of extra qualification for
critical-software engineering.

Such qualification for system safety hasn't (yet) generally caught on
anywhere. The Safety and Reliability Society (SaRS) offers it in the UK, for
example. It didn't catch on in the US. Over a decade ago, the
System Safety Society
introduced an option for system safety engineering into the PE exam. They
had to pay the NSPE or
NCEES (I forget which) lots of money per year to maintain the option - and
two people took it in
some number of years. So they dropped it. (I was at the board meeting in
Ottawa in 2004 when this
was decided.)

The UK qualification regime hasn't stopped IT disasters in government
procurement. And it hasn't stopped the kind of poor engineering which allows
bank ATMs using supposedly pseudo-one-time-pad nonce generation to be subject
to replay attacks (see a recent paper describing local experiments performed
by Ross Anderson's group). I do note, however,
that the three examples
I mentioned above are all US examples. It's not ruled out that having some
degree of formal
professional training, as in the UK, encourages software engineers to
avoid repeating simple
mistakes whose prophylaxis has been well known for decades.
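
On the ATM point, the weakness is easy to show in miniature. This is a toy
illustration only - it is neither the EMV protocol nor the code Anderson's
group examined, and weak_nonce is entirely made up - but it shows why a
"nonce" that is a pure function of a counter or clock invites replay:

    /* A "nonce" computed deterministically from a guessable counter is
       predictable: an attacker who records one challenge/response pair can
       predict when the same challenge recurs and replay the old response. */
    #include <stdio.h>
    #include <inttypes.h>

    static uint32_t weak_nonce(uint32_t counter) {
        return counter * UINT32_C(2654435761);   /* looks random-ish, isn't */
    }

    int main(void) {
        for (uint32_t c = 0; c < 5; c++)
            printf("transaction %" PRIu32 " -> nonce %08" PRIX32 " (fully predictable)\n",
                   c, weak_nonce(c));
        return 0;
    }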

Time was when UK and US cars were not known for their reliability. Kind of
like SW, relatively-inexpensive cars used to go wrong a lot. However, some
very expensive cars, such as those made by Rolls-Royce/Bentley and Wolseley,
were reliable. So there was proof of
concept. Japanese
companies decided it was possible to produce reliable
relatively-inexpensive cars and make money,
and did it.

There is proof of concept in SE, too. Unlike Rolls-Royce cars, it is not
prohibitively expensive. Three out of my four examples involve run-time
errors. It is feasible to produce, cost-effectively, SW which is free from
run-time error. Just as with the Japanese approach to cars, you simply have
to decide to do it.

How about the following? We design a document called A Programmer's
Pledge. It has thirty or so
numbered clauses:

* I promise never to deliver SW which is subject to a data-range roll-over
phenomenon (especially
dates and times)

* I promise never to deliver software which is subject to a numerical
overflow or underflow exception

* I promise never to deliver software which reads data on which it raises
an "out of range" exception

* ..... and so on
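
To illustrate what the overflow clause amounts to in practice, here is a
minimal sketch. It uses the GCC/Clang checked-arithmetic builtins - that
choice is mine, not part of the Pledge, and other languages have their own
mechanisms - and checked_add is just a name invented for the example:

    /* Refuse the computation rather than wrap silently.
       __builtin_add_overflow is a GCC/Clang extension. */
    #include <stdio.h>
    #include <stdint.h>

    static int checked_add(int32_t a, int32_t b, int32_t *sum) {
        return __builtin_add_overflow(a, b, sum) ? -1 : 0;   /* -1: would overflow */
    }

    int main(void) {
        int32_t sum;
        if (checked_add(INT32_MAX, 1, &sum) != 0)
            fprintf(stderr, "refusing: result would overflow\n");
        else
            printf("sum = %d\n", sum);
        return 0;
    }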

A professional programmer signs it and files it with his or her professional
organisation. Quality-control issues in programs (such as the above
phenomena) are routinely subject to root-cause analysis (RCA) of sorts. When
a programmer is responsible for a piece of code with such an error in it, the
company reports it to the professional organisation and the programmer gets
"points" attached to the corresponding clause in his or her Pledge. As with
driving (Germans say "points in Flensburg", which is where the office is.
What is it in the UK? "Points in Cardiff"?). I bet lots of organisations,
from companies hiring programmers to professional-insurance companies, will
find uses for it.

PBL

Prof. Peter Bernard Ladkin, Faculty of Technology, University of
Bielefeld, 33594 Bielefeld, Germany
Je suis Charlie
Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de



