[SystemSafety] RE : Qualifying SW as "proven in use"

Fri Jun 28 11:00:20 CEST 2013

Nancy,

Remember that the whole story was initiated primarily by the process industries, for IEC 61508 and IEC 61511. This is the cultural background, and it cannot be disregarded. It was NOT raised by software experts and it was NOT, until very recently, handled by software and safety experts.

So the persons we have to convince:

*         Are not sensitive to a potential difference between proof and certainty

*         Are not sensitive to the difference between "never failed in the past" and "safe"

I guess that, having participated in interesting audit panels in these industries in the past, you have a clear view of what the stakes are: declaring that whole plants are OK.

It is clear that if we push the standards toward the orientation and academic level of this forum, we will gain in scientific rigour, but they will fly miles over the readers' heads and will never be applied.

The aim is not to find something totally consensual and as perfect as possible from an academic point of view, but to drop into two standards (namely 61508, and 61511/S84 for the USA) half a dozen requirements that:

*         are intelligible to the readers

*         are practically applicable and economically acceptable

*         are as efficient as possible

Not an easy thing, I acknowledge...

Bertrand RICQUE
Program Manager, Optronics and Defense Division

T +33 (0)1 58 11 96 82
M +33 (0)6 87 47 84 64
23 avenue Carnot
91300 MASSY - FRANCE
http://www.sagem-ds.com

From: systemsafety-bounces at techfak.uni-bielefeld.de On Behalf Of Nancy Leveson
Sent: Thursday, June 27, 2013 4:24 PM
To: systemsafety at techfak.uni-bielefeld.de
Subject: Re: [SystemSafety] RE : Qualifying SW as "proven in use"

1. Software is not, by itself, unsafe. It is an abstraction without any physical reality. It cannot itself cause physical damage. The safety of software is dependent on
   -- the software logic itself,
   -- the behavior of the hardware on which the software executes,
   -- the state of the system that is being controlled or somehow affected by the outputs of the software (the encompassing "system"), and
   -- the state of the environment.
All of these things determine safety, so a change in one can impact the so-called "software safety." For example, the change in the design of the Ariane 5, which led to a steeper trajectory than that of the Ariane 4, resulted in the software contributing to the explosion. The environment does matter. All the prior use of that software on the Ariane 4 meant nothing with respect to its use in the Ariane 5.

Any change in the environment, in the controlled system, in the underlying hardware, or in the software invalidates all previous experience unless one can prove that the change will not lead to an accident (and that proof cannot be based on a statistical argument). Does anyone know of any non-trivial software, for example, that has not been changed in any way over decades of use, or even years of use? And what about changes in the behavior of human operators, of the system itself, and of the environment?
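The point that a changed environment invalidates previous experience can be illustrated with a toy model. The sketch below is hypothetical throughout: the conversion function, the two input profiles and their numeric ranges are invented for illustration, loosely inspired by the unprotected conversion of a floating-point value to a 16-bit signed integer reported in the Ariane 501 inquiry. The same, unchanged code runs failure-free under one input profile and fails often under another.

```python
import random

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16(value: float) -> int:
    """Hypothetical unprotected conversion: raises instead of saturating
    when the value does not fit in a signed 16-bit integer."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in int16")
    return result


def old_profile(rng: random.Random) -> float:
    # Hypothetical input range seen during the earlier service history.
    return rng.uniform(0.0, 20_000.0)


def new_profile(rng: random.Random) -> float:
    # Hypothetical wider input range after the controlled system changed.
    return rng.uniform(0.0, 60_000.0)


def count_failures(profile, runs: int, seed: int = 0) -> int:
    """Feed `runs` inputs drawn from `profile` to the unchanged conversion
    and count how many of them raise."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(runs):
        try:
            to_int16(profile(rng))
        except OverflowError:
            failures += 1
    return failures


if __name__ == "__main__":
    print("old profile:", count_failures(old_profile, 100_000))  # 0
    print("new profile:", count_failures(new_profile, 100_000))  # roughly 45% of runs
```

The failure count here is a property of the input profile, not of any wear in the code: the 100,000 failure-free runs under the old profile say nothing about behavior under the new one.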

Someone wrote:
> I've been thinking about Peter's example a good deal, the developer seems to me to have made an implicit assumption that one can use a statistical argument based on successful hours run to justify the safety of the software.
And Peter responded:
>>It is not an assumption. It is a well-rehearsed statistical argument with a few decades of universal acceptance, as well as various successful applications in the assessment of emergency systems in certain English nuclear power plants.

"Well-rehearsed statistical arguments with a few decades of universal acceptance" are not proof. They are only well-rehearsed arguments. Saying something multiple times is not a proof. The fact that nuclear power plants in Britain have not experienced any major accidents (they have had minor incidents by the way) rises only to the level of anecdote, and not proof. And that experience (and well-rehearsed arguments) cannot be carried over to other systems.

I agree with the original commenter about the implicit assumption, which the Ariane 5 case disproves (as well as dozens of others).
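For readers outside the reliability-modeling community, one common form of the "successful hours run" argument can be sketched as follows. This is a minimal illustration, not necessarily the exact argument Peter has in mind; it assumes failures arrive as a homogeneous Poisson process (a constant failure rate, with fault-triggering inputs selected at random), which is precisely the assumption disputed here. The function names are hypothetical.

```python
import math


def failure_free_hours_needed(target_rate_per_hour: float, confidence: float) -> float:
    """Failure-free operating hours needed to claim, at the given confidence,
    that the failure rate is below target_rate_per_hour.

    ASSUMPTION: failures follow a homogeneous Poisson process (constant rate,
    randomly selected fault-triggering inputs). If that does not hold, the
    number returned has no meaning.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if target_rate_per_hour <= 0.0:
        raise ValueError("target rate must be positive")
    # P(zero failures in t hours) = exp(-rate * t); require it to be <= 1 - confidence.
    return -math.log(1.0 - confidence) / target_rate_per_hour


def rate_upper_bound(failure_free_hours: float, confidence: float) -> float:
    """Upper confidence bound on the failure rate after failure_free_hours
    without any failure, under the same Poisson assumption."""
    if failure_free_hours <= 0.0:
        raise ValueError("observed hours must be positive")
    return -math.log(1.0 - confidence) / failure_free_hours


if __name__ == "__main__":
    hours = failure_free_hours_needed(1e-4, 0.99)
    print(f"{hours:,.0f} failure-free hours")  # 46,052
```

Under these assumptions, claiming a rate below 10^-4 per hour with 99% confidence requires about 46,000 failure-free hours. The arithmetic itself is not in dispute; what it cannot tell you is whether its assumptions hold, and any change of the kind listed in point 1 resets the count.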

2. It is not even clear what "failure" of software means when software is merely "design abstracted from its physical realization." How can a "design" fail? It may not satisfy its requirements (when executed on some hardware), but design (equivalent to a blueprint for hardware) does not fail and certainly does not fail "randomly."

Perhaps the reason why software reliability modeling still has pretty poor performance after at least 40 years of very bright people trying to get it to work is that the assumptions underlying it are not true. These assumptions have not been proven (only stated with great certainty) and, in fact, there is evidence showing they are not necessarily true. I tried raising this point a long time ago, but I was met with such a ferocious response (as I am sure I will be here) that I simply ignored the whole field and worked on things that seemed to have more promise. The most common assumption is that the environment is stochastic and that the selection of inputs (from the entire space of inputs) that will trigger a software fault (design error) is random. There is data from NASA (using real aircraft) that provides evidence of "error bursts" in the software (ref. Dave Eckhardt). It appeared that these resulted when the aircraft flew a trajectory that was near a "boundary point" in the software and thus set off all the common problems in software related to boundary points. The selection of inputs triggering the problems was not random.

As another example, Ravi Iyer looked at software failures of a widely used operating system in an interesting experiment where he found that a cluster of software errors appeared to precede a computer hardware failure. It made no sense that the software could be "causing" the hardware failure. Closer examination showed the problem. Hardware often degrades in its behavior before it actually stops. The strange hardware behavior, if I remember correctly, was exercising the software error-handling routines until it got beyond the capability of the software to mitigate the problems. Again, in this case, the software was not "failing" due to randomly selected inputs from the external input space.

When someone wrote:
> I don't think that's true,
Peter Ladkin wrote:
>>You might like to take that up with, for example, the editorial board of IEEE TSE.

[As a past Editor-in-Chief of IEEE TSE, I can assure you that the entire editorial board does not read and vet the papers; in fact, I was lucky if one editor actually read the paper. Are you suggesting that anything that is published should automatically be accepted as truth? That nothing incorrect is ever published?]

Nancy

--
Prof. Nancy Leveson
Aeronautics and Astronautics and Engineering Systems
MIT, Room 33-334
77 Massachusetts Ave.
Cambridge, MA 02142

Telephone: 617-258-0505
Email: leveson at mit.edu
URL: http://sunnyday.mit.edu
" This e-mail and any attached documents may contain confidential or proprietary information. If you are not the intended recipient, you are notified that any dissemination, copying of this e-mail and any attachments thereto or use of their contents by any means whatsoever is strictly prohibited. If you have received this e-mail in error, please advise the sender immediately and delete this e-mail and all attached documents from your computer system."