[SystemSafety] Statistical Assessment of SW ......

Mon Jan 26 13:44:36 CET 2015

On 2015-01-26 12:19 , Les Chambers wrote:
> My understanding is that mathematical proofs for the reliability of an
> integrated hardware/software system .... [is] 30 years away. And in 30
> years you'll get the same answer.

There will never be any mathematical proofs of a probabilistic/statistical claim such as "this
element is reliable to <some measure of reliability>". There will, however, be assessments that such
a claim is true (or not) with a certain level of confidence which is not 100%.

And this under the assumptions that:
* the future distribution of inputs is identical to the distribution in the statistics being
offered as evidence;
* there is perfect failure detection: when the element fails, this is guaranteed to be known and
recorded; and
* no failures occurred (by the second assumption, none were recorded) during the runs adduced as
statistical evidence.
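
To make "a certain level of confidence which is not 100%" concrete, here is a minimal sketch of the
arithmetic for a demand-based element, taking the three assumptions above as given and assuming in
addition that demands are independent. The function names and the example figures (a bound of 1e-3
on probability of failure per demand, 95% confidence) are mine, chosen for illustration only.

```python
import math

def demands_needed(pfd_bound, confidence):
    """Failure-free demands needed to claim pfd <= pfd_bound at the given confidence."""
    # If the true pfd were as bad as pfd_bound, n independent demands would all
    # succeed with probability (1 - pfd_bound)**n. We ask for the smallest n at
    # which that probability drops to 1 - confidence or below.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - pfd_bound))

def confidence_reached(n_demands, pfd_bound):
    """Confidence in pfd <= pfd_bound after n_demands failure-free demands."""
    return 1.0 - (1.0 - pfd_bound) ** n_demands

n = demands_needed(1e-3, 0.95)
print(n, confidence_reached(n, 1e-3))
```

With these figures the answer comes out at just under 3000 failure-free demands. Note that the
confidence approaches 1 as the number of demands grows but never reaches it, and that the number
means nothing if any one of the assumptions fails.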

What level of confidence an assessor or user will require is something for negotiation. Similarly,
simple things can vitiate one or more of the assumptions. For example, suppose there is a clock
which is sampled (which happens often in control systems). Events have a unique timestamp. Prima
facie no statistics can be gathered which make a future distribution identical to the past, for at
least the time will be different. So it would have to be argued (proven) that there is a
relative-time transformation which allows evidence from the past to be used to draw conclusions for
the future. As indeed there is in many cases, or should be.

In other words, lots of things get in the way, and will continue to get in the way. What will
develop is a plethora of techniques for handling common cases.

> I'm not against healthy debate and basic research in this area but, the
> thought that this subject is being discussed in standards committee meetings
> gives me the horrors. Tell me this is not true!

You're outing yourself, Mr. Rip van Winkle. It's been in IEC 61508-7 for eighteen years, since 1997,
as Annex D.

One of the main problems with the current writeup is that the assumptions are quite well disguised.
They must be brought out and displayed in all their finery.

Another is that the current writeup says it is only valid for "stateless" SW, which is not quite
true as written. It is valid for renewal processes (the things of which the Poisson distribution is
a description), and for example emergency shutdown-system SW generates a renewal process.
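
For the continuously-operating case, a sketch of the corresponding calculation, assuming failures
arrive as a homogeneous Poisson process with constant rate and that no failure was observed. This
is my illustration, not a quotation from Annex D; the function name and the example figures (rate
bound 1e-4 per hour, 99% confidence) are hypothetical.

```python
import math

def failure_free_hours_needed(rate_bound_per_hour, confidence):
    """Failure-free operating hours needed to claim rate <= rate_bound at the given confidence."""
    # For a Poisson process with rate r, P(no failure in T hours) = exp(-r * T).
    # We ask for the T at which that probability, evaluated at the rate bound,
    # equals 1 - confidence.
    return -math.log(1.0 - confidence) / rate_bound_per_hour

print(failure_free_hours_needed(1e-4, 0.99))
```

With these figures that is about 46,000 failure-free hours, again conditional on the operating
profile during those hours matching the profile in future use.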

There are some people who argue that the word "stateless" can remain, for it has the political
advantage of rendering Annex D practically inapplicable.

I say: either we take the Annex completely out if it is meant to be inapplicable, or we say it like
it is. No sleight of hand to try to hinder incompetent engineers from misapplying it.

There is a small lobby for taking it out. It won't happen. For one thing, the new TS on "proven in
use" SW assessment refers to it. Ergo we say it like it is.

> One of the most useful and highly productive applications of these standards
> is attaching them to contracts. It is then incumbent on the supplier to
> comply and on the purchaser to validate compliance. Just the act of
> compliance has markedly improved the system development maturity of many
> organisations I have worked with. 

Yep.

> So, until someone has written The Dummies Guide To Proof Of Reliability Of
> Software And Electronic Systems - something that can be implemented by a
> third year engineering undergraduate, please restrain your youthful
> enthusiasm and think about the flow-on effects of what you are doing.

Suggestions I should restrain my youthful enthusiasm have proven themselves over the years to be
completely impractical.

The statistics are sophomore level. Applying them well takes judgement; just consider validating the
assumptions. No undergraduate engineer has had the opportunity to acquire such judgement.

> My understanding of 61508's take on reliability software that implements a
> safety function is: if you follow processes x, y and z we will allow you to
> deploy your software in a hardware environment that is rated at probability
> of failure on demand A. We will not allow you to boast that your software is
> that reliable, we will just allow you to deploy. 

Almost, yes. And it is incoherent. The "HW environment" of which you speak is an element that
includes both SW and HW. You can't show a reliability condition on the element in most cases unless
the SW satisfies some reliability condition.

> Believe me, just getting
> that message across to engineers, whose meaning of life does not emerge from
> probability and statistics, is a major ask.
> Question/ is this still the intent of the standard?

I can't speak for my colleagues on the MT. But I am not in favor of maintaining an incoherent position.

> So, I sincerely hope someone is working on Plan B for validation of the
> safety functions implemented with such systems.

I'm sure some US right-coast university can offer you Plans B through Z, which all work demonstrably
better than IEC 61508, which you may recall becomes "dangerous" west of 52°W.

PBL

Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Je suis Charlie
Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de