[SystemSafety] A small taste of what we're up against

Steve Tockey Steve.Tockey at construx.com
Mon Oct 29 19:41:57 CET 2018


If you limit your search to only actual incidents, then you miss the ones where “somebody got lucky”. For example:

https://www.theregister.co.uk/2015/05/01/787_software_bug_can_shut_down_planes_generators/

They found the problem before it caused any damage, but was it a lucky find? If they didn’t find it, then what?

We should be including these “we’re lucky we found it before anyone died” incidents, too. And that would make the number larger.


— steve



From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de> on behalf of Dewi Daniels <dewi.daniels at software-safety.com>
Date: Monday, October 29, 2018 at 9:53 AM
To: "ladkin at causalis.com" <ladkin at causalis.com>
Cc: The System Safety List <systemsafety at lists.techfak.uni-bielefeld.de>
Subject: Re: [SystemSafety] A small taste of what we're up against

Agreed, there have been a number of incidents in which software has been implicated. I listed 15 of these in a paper I wrote in 2011 (https://ieeexplore.ieee.org/document/6136931), mostly based on your excellent compendium at http://www.rvs.uni-bielefeld.de/publications/compendium/incidents_and_accidents/index.html. My paper also listed the loss of the Airbus A330 flight test aircraft in 1994. Like you, I suspect there are others that are not in the public domain.

I think that avionics software is unusual in that most of the incidents have been due to requirements issues. The software implemented the requirements as written, but the requirements specified unsafe behaviour in some unforeseen circumstance. An example is the A320 accident in Warsaw in September 1993, where a software interlock that had been intended to prevent inadvertent deployment of reverse thrust and the spoilers in-flight delayed their deployment for 9 seconds after the aircraft touched down in a strong cross-wind.
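To make the Warsaw example concrete, here is a minimal sketch of how such an interlock requirement can behave as written. This is illustrative only, not the actual A320 logic; the structure and threshold value are assumptions for the purpose of the example. The point is that each condition was intended to prove "aircraft firmly on the ground", yet a crosswind landing on one main gear with aquaplaning wheels can satisfy neither, inhibiting deployment even though the aircraft has touched down:

```c
#include <stdbool.h>

/* Illustrative sketch (NOT the actual A320 implementation) of a
 * ground-spoiler / thrust-reverser interlock. The requirement as
 * written: permit deployment only when both main gear struts are
 * compressed, or the main wheels have spun up past a threshold.
 * The threshold value here is invented for illustration. */

typedef struct {
    bool left_gear_compressed;
    bool right_gear_compressed;
    double wheel_speed_kt;      /* main wheel spin-up speed */
} landing_state;

static bool ground_deploy_permitted(const landing_state *s)
{
    bool both_struts = s->left_gear_compressed && s->right_gear_compressed;
    bool wheels_spun_up = s->wheel_speed_kt > 72.0;  /* invented threshold */
    return both_struts || wheels_spun_up;
}
```

In a strong crosswind the aircraft is landed on one gear first, and aquaplaning keeps wheel spin-up below the threshold, so the software correctly implements a requirement that keeps the interlock closed: a requirements fault, not a coding fault.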

In contrast, a much higher proportion of incidents with non-safety-related software have been due to coding errors. Two examples of such coding errors are:
1. The various buffer overflow vulnerabilities exploited in Internet Explorer
2. The Apple SSL bug, where an extraneous goto meant that the software did not check whether a certificate was valid (https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/)
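For readers who haven't seen it, the shape of the Apple bug is worth a look. Below is a minimal sketch of the "goto fail" pattern; names and structure are simplified stand-ins (the real code was in SSLVerifySignedServerKeyExchange() in Apple's Security framework), but the control-flow defect is the same: a duplicated unconditional goto that skips the signature check while the error code is still zero:

```c
/* Minimal sketch of the "goto fail" control-flow bug.
 * Function and variable names are invented for illustration. */

static int hash_update_ok;   /* stand-in for the real hash-update calls */
static int signature_ok;     /* stand-in for the real signature verify */

static int verify_server_key_exchange(void)
{
    int err = 0;

    if ((err = hash_update_ok ? 0 : -1) != 0)
        goto fail;
        goto fail;   /* the extraneous goto: always executed, jumping
                        past the signature check with err still 0 */

    if ((err = signature_ok ? 0 : -1) != 0)   /* never reached */
        goto fail;

fail:
    return err;      /* err == 0 means "accepted" */
}
```

Note that the indentation makes the second goto look conditional, but C has no such scoping; the compiler sees two consecutive statements, and the second one always runs.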

John Rushby wrote an excellent paper on why he thinks that DO-178B has worked so well (http://www.csl.sri.com/users/rushby/papers/emsoft11.pdf).

The in-flight upsets of a Boeing 777 and an Airbus A330 are very interesting.

The in-flight upset of a Boeing 777 occurred on 1 August 2005 north-west of Perth, Australia. The report can be downloaded from https://www.atsb.gov.au/media/24550/aair200503722_001.pdf. The problem turned out to be in the Air Data Inertial Reference Unit (ADIRU) software. An accelerometer had failed in 2001. The fault was not recorded in Non-Volatile RAM, so the accelerometer was not replaced. In 2005, another accelerometer failed, so the ADIRU used the previously failed accelerometer instead, resulting in erroneous output which caused the aircraft to pitch up. I don't know what programming language was used for the ADIRU software. The report states that the ADIRU software was developed to DO-178A, which predates DO-178B.
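The NVRAM aspect of that failure chain generalizes nicely. The sketch below uses invented names and is not the actual ADIRU code; it only shows why a fault recorded in volatile state alone is forgotten at power-up, so that a later failure of a second unit can cause the system to fall back on the previously failed one:

```c
#include <stdbool.h>

/* Illustrative sketch (invented names, NOT the actual ADIRU code) of
 * fault latching. Faults recorded only in volatile memory vanish on a
 * power cycle; faults latched in NVRAM persist. */

#define NUM_ACCEL 6

static bool volatile_failed[NUM_ACCEL];  /* cleared on every power cycle */
static bool nvram_failed[NUM_ACCEL];     /* survives power cycles */

static void power_cycle(void)
{
    for (int i = 0; i < NUM_ACCEL; i++)
        volatile_failed[i] = false;      /* nvram_failed[] is retained */
}

/* Select the lowest-numbered accelerometer not known to be failed;
 * returns -1 if none remain. */
static int select_accel(bool use_nvram)
{
    for (int i = 0; i < NUM_ACCEL; i++) {
        bool failed = volatile_failed[i] || (use_nvram && nvram_failed[i]);
        if (!failed)
            return i;
    }
    return -1;
}
```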

The in-flight upset of an Airbus A330 occurred on 7 October 2008, west of Learmonth, Australia. The report can be downloaded from https://www.atsb.gov.au/media/3532398/ao2008070.pdf. There were multiple spikes in the Angle Of Attack (AOA) output of an ADIRU (from a different manufacturer). The investigators were unable to determine the reason for these spikes (or even whether they were due to a hardware or a software fault). These spikes have only been observed three times in 128 million hours of operation. A flaw in the design of an algorithm in the Flight Control Primary Computer (FCPC) meant it was unable to cope with these spikes, so it commanded the aircraft to pitch down. Airbus redesigned the AOA algorithm to prevent the same type of accident from occurring again. Again, I don't know what programming language was used for the ADIRU or FCPC software. Again, the report states that the ADIRU and FCPC software was developed to DO-178A. Also, the report states that the FCPC requirements were written in a formal specification language called SAO.
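As an aside on spike masking: the simplest way to tolerate a transient spike on one of three redundant channels is a median select. This is not the actual FCPC algorithm (which was more elaborate, and whose design flaw is analysed in the report); it is just a generic illustration of why voting logic, correctly designed, can mask a single spiking source:

```c
/* Generic median-of-three voter (NOT the actual FCPC algorithm):
 * a transient spike on any one channel cannot propagate to the
 * output, because the median always lies between the two
 * agreeing channels. */

static double median3(double a, double b, double c)
{
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}
```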

Yours,

Dewi Daniels | Director | Software Safety Limited

Telephone +44 7968 837742 | Email dewi.daniels at software-safety.com

Software Safety Limited is a company registered in England and Wales. Company number: 9390590. Registered office: Fairfield, 30F Bratton Road, West Ashton, Trowbridge, United Kingdom BA14 6AZ


On Fri, 26 Oct 2018 at 09:51, Peter Bernard Ladkin <ladkin at causalis.com> wrote:


On 2018-10-26 10:14 , Dewi Daniels wrote:
> On Wed, 24 Oct 2018 at 10:26, Martyn Thomas <martyn at thomas-associates.co.uk> wrote:
>
>     I'd like to see an ALARP argument for software written in C. Does anyone
>     have one to share?
>
> There are over 25,000 certified jet airliners in service world-wide, many containing software
> written in C. There has not been a single hull-loss accident in passenger service ascribed to a
> software fault.
True as far as public information goes, but there well could have been. QF72 in October 2008, and
the other incident in December 2008. Also the upset to Boeing 777 9M-MRG in August 2005.

Concerning SW involvement: a test A330 was lost in June 1994 in part because of a loss of critical
flight information at high angles of attack in the then-design. There is arguably software
involvement in other fatal accidents.

It also depends on what you consider to be a "software fault". When software behaves according to a
requirements specification that leaves open a behaviour which leads to an accident, some people
would call it a software fault (because the software behaved in an unwanted manner, causing the
accident) and others would say there was no software fault (because the SW behaved according to
the requirements specification).

PBL

Prof. Peter Bernard Ladkin, Bielefeld, Germany
MoreInCommon
Je suis Charlie
Tel+msg +49 (0)521 880 7319  www.rvs-bi.de





_______________________________________________
The System Safety Mailing List
systemsafety at TechFak.Uni-Bielefeld.DE

