[SystemSafety] Statistical Assessment of SW ......

DREW Rae d.rae at griffith.edu.au
Mon Jan 26 12:41:25 CET 2015


A simple thought experiment. Let's say someone claims to have a suitable
method of predicting combined hardware/software reliability.
On what basis could they ever support that claim? I would argue that such a
claim about a method intended for real-world use is empirical in nature,
and can only be validated empirically. Unfortunately, this requires an
independent mechanism for counting the failures, and enough failures to
perform a statistical comparison of the prediction with reality. To avoid
sampling bias, the independent mechanism would need to be fitted to every
system for which a prediction is made, or else we would need sufficient
understanding of the causes of failures to construct a representative
sample.
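
To put a rough number on "enough failures": here is a minimal Python
sketch (every figure in it is hypothetical) of the comparison itself, an
exact binomial test of a claimed probability of failure on demand
against an independently counted failure record.

    from math import comb

    def binom_cdf(k: int, n: int, p: float) -> float:
        """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k + 1))

    claimed_pfd = 1e-4  # claimed probability of failure on demand
    demands = 10_000    # demands seen by the independent failure counter
    failures = 3        # failures that counter actually recorded

    # One-sided p-value: the chance of observing at least this many
    # failures if the claim were true.
    p_value = 1.0 - binom_cdf(failures - 1, demands, claimed_pfd)
    print(f"P(>= {failures} failures | claim true) = {p_value:.3f}")

With these numbers the test prints roughly 0.08: three failures against
an expected one is barely evidence against the claim. Discriminating
between predictions needs far more failures than a safety-critical
system is supposed to produce.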

I would further argue that if we had a general-purpose hardware/software
error detecting machine, we wouldn't be using it to passively count errors
in deployed systems.

Conclusion: No method for predicting hardware/software reliability can
actually be shown to accurately predict hardware/software reliability. All
claims about hardware/software reliability are constructed using methods
that themselves haven't been adequately validated. At the very least, any
predictions have error bars so wide that they shouldn't be used to
distinguish between safe and unsafe.
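
Just how wide is easy to see even in the best case, where testing
observes no failures at all. A short sketch (demand counts hypothetical):
the exact 95% upper confidence bound on the per-demand failure
probability p solves (1 - p)^n = 0.05, the familiar "rule of three",
p ~ 3/n.

    # 95% upper bound on failure probability after n failure-free demands
    for demands in (1_000, 100_000, 10_000_000):
        p_upper = 1.0 - 0.05 ** (1.0 / demands)
        print(f"{demands:>10} failure-free demands -> p < {p_upper:.1e}")

Even a hundred thousand consecutive failure-free demands only supports a
bound of about 3e-5, and arguably any change to the software resets the
count.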

Caveat: As Les made clear, it's a very different kettle of fish to say
"I'll accept your system as adequate if you follow these methods" than to
say "I'll accept your claim about a particular reliability based on
following these methods". So long as the methods don't include "statistical
analysis to show that a particular reliability target is met" they are
totally different risk acceptance criteria.


My safety podcast: disastercast.co.uk
My mobile (from October 6th): 0450 161 361

On 26 January 2015 at 11:19, Les Chambers <les at chambers.com.au> wrote:

> Guys
> Re:  >> This argument came up again yesterday in a standards-committee
> meeting.
>
> My understanding is that mathematical proofs for the reliability of an
> integrated hardware/software system have a lot in common with nuclear
> fusion. Ask any physicist and they'll tell you it's 30 years away. And in
> 30 years you'll get the same answer.
> I'm not against healthy debate and basic research in this area, but the
> thought that this subject is being discussed in standards-committee
> meetings gives me the horrors. Tell me this is not true!
>
> One of the most useful and highly productive applications of these
> standards
> is attaching them to contracts. It is then incumbent on the supplier to
> comply and on the purchaser to validate compliance. Just the act of
> compliance has markedly improved the system development maturity of many
> organisations I have worked with. The downside of all this is the tendency
> of some standards bodies to throw in a normative reference to a process or
> method someone read about in a textbook, one that is either totally
> impractical or so expensive to implement that supplier and purchaser can
> spend years negotiating it out of the contract. And these arguments can
> turn ugly. I once watched a contractor wait until close to the delivery
> date, admit to total non-compliance, and bully the prime contractor into
> ditching the compliance requirement. This was a no-brainer for the prime:
> arguing the point would have held up the commissioning of a
> 13-billion-dollar project.
>
> So, until someone has written The Dummies' Guide To Proof Of Reliability
> Of Software And Electronic Systems - something that can be implemented by
> a third-year engineering undergraduate - please restrain your youthful
> enthusiasm and think about the flow-on effects of what you are doing.
>
> Just checking:
> My understanding of 61508's take on the reliability of software that
> implements a safety function is: if you follow processes x, y and z, we
> will allow you to deploy your software in a hardware environment that is
> rated at probability of failure on demand A. We will not allow you to
> boast that your software is that reliable; we will just allow you to
> deploy. Believe me, just getting that message across to engineers, whose
> meaning of life does not emerge from probability and statistics, is a
> major ask.
> Question: is this still the intent of the standard?
>
> One could argue that this approach is also spurious. As others have pointed
> out, the hardware environments into which we deploy our software these days
> are often so complex you are hard pressed to calculate hardware
> reliability.
> Then if you consider systems thinking, the emergent properties of an
> integrated hardware and software system are likely to throw up failure
> modes
> that were not considered in either the hardware or the software designs.
>
> So, I sincerely hope someone is working on Plan B for validation of the
> safety functions implemented with such systems.
>
> Les
>
>
> -----Original Message-----
> From: systemsafety-bounces at lists.techfak.uni-bielefeld.de
> [mailto:systemsafety-bounces at lists.techfak.uni-bielefeld.de] On Behalf Of
> Peter Bernard Ladkin
> Sent: Friday, January 23, 2015 4:43 PM
> To: systemsafety at lists.techfak.uni-bielefeld.de
> Subject: Re: [SystemSafety] Statistical Assessment of SW ......
>
> On 2015-01-21 14:15, jean-louis Boulanger wrote:
> > For software it is not possible to have statistical evidence.
> > The failure probability is 1 (yes, the software has faults, and failures appear).
>
> This argument came up again yesterday in a standards-committee meeting. It
> is usually attributed to
> third party "engineers with whom I work", because nobody quite seems to
> claim they hold the view
> themselves when I'm in the room :-) ....
>
> So it might be worthwhile to adduce the proof - again. It's real short.
>
> Suppose you have a piece of SW S which is deterministic. And S is also not
> perfect, so it outputs
> right answers on some inputs and wrong answers on others. And S reverts to
> an initial state with no
> memory of its previous behavior each time it produces its output.
>
> Suppose the distribution of inputs to S has a stochastic character. That
> is,
> the input I is a random
> variable. Then the output outS(I), which is a function of the input I, also
> has stochastic
> character. A deterministic transformation of a random variable is itself a
> random variable.
>
> Let us transform outS(I) further, deterministically. Define
>   CorrS(I) = 1 if outS(I) is correct
>   CorrS(I) = 0 if outS(I) is incorrect
>
> Then CorrS(I) also has a stochastic nature and is a random variable.
>
> Thus, if the input to a piece of SW has stochastic nature, then so does the
> correctness behavior of
> the SW.
>
> QED.
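>
> The argument can be watched running. A minimal Python sketch (the
> program and its fault are invented for illustration): a fixed,
> deterministic, imperfect program, fed inputs drawn from a distribution,
> yields a correctness indicator CorrS(I) that behaves as an ordinary
> Bernoulli random variable.
>
>     import random
>
>     def out_s(i: int) -> int:
>         """Deterministic, imperfect SW: wrong on multiples of 7."""
>         return i if i % 7 == 0 else i + 1   # spec says: return i + 1
>
>     def corr_s(i: int) -> int:
>         """CorrS(I): 1 if out_s is correct on input i, else 0."""
>         return 1 if out_s(i) == i + 1 else 0
>
>     random.seed(0)
>     trials = [corr_s(random.randrange(1_000_000)) for _ in range(100_000)]
>     print("empirical P(CorrS = 1) =", sum(trials) / len(trials))
>
> Nothing in the program is random, yet it exhibits a stable success
> *rate* (about 6/7 here) as soon as the inputs are stochastic.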
>
> The only reasonable objection to this argument which I have heard is to
> dispute whether inputs have
> a stochastic nature.
>
> So, say you build a railway locomotive control system. The piece of track
> the locomotive runs on has
> a fixed architecture, so the argument would run that the behavior of the
> locomotive is more or less
> determined within certain parameters (whether signal X is red or green) and
> does not have a
> stochastic nature. But various parameters such as the condition of the
> track, the nature of the load
> on the locomotive, and other environmental conditions such as wind speed
> and
> weather (icy track, or
> dry track, and when icy where the ice is) make it all but impossible to
> predict the inputs to the control system. Besides, the design does not
> target the specific route the locomotive will run on; the designer is
> ignorant of the application. So the inputs to the control system, as known
> at design time, have a stochastic nature if you are a Bayesian.
>
> I would like to remark here, again, on a couple of incoherences in IEC
> 61508
> and "derivative"
> standards.
>
> Something which executes a safety function must consist of both HW and SW,
> because SW alone cannot
> take action. A HW-SW element which executes a safety function is assigned a
> reliability goal, which
> is mostly encapsulated in the SIL. These reliability goals are the safety
> requirements. A
> reliability goal is expressed in terms of probability of function failure
> per demand, or per unit
> time. Suppose that the correct functioning of the HW-SW element E is
> functionally dependent on the
> correct functioning of its SW S (which for most actuators it is). The
> standard requires one to demonstrate that the reliability is attained
> (that the safety requirement is fulfilled).
>
> How this is actually done must be something like the following.
>
> We assume as above that the element E deterministically transforms its
> inputs. We define the
> function CorrE as above. Given a distribution of inputs Distr(I), the
> probability that E functions correctly is given by
>
>   (Integral over Distr(I) of CorrE(I)) / (Integral over Distr(I) of 1).
>
> Notice that the probability of correct functioning, the safety requirement
> as laid down by IEC
> 61508, is dependent on Distr(I). Change Distr(I) and one can usually expect
> the probability to
> change. (For example, let Distr(I) be the Dirac Delta function on one
> incorrect input. Then the
> probability that E functions correctly is 0.)
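>
> The dependence is easy to exhibit numerically. A minimal sketch (the
> element and the distributions are invented for illustration), estimating
> the ratio of integrals above by Monte Carlo under three choices of
> Distr(I):
>
>     import random
>
>     def corr_e(i: int) -> int:
>         """CorrE(I): the element is wrong on multiples of 7."""
>         return 0 if i % 7 == 0 else 1
>
>     def p_correct(sampler, n=200_000):
>         """Estimate Integral(CorrE dDistr) / Integral(1 dDistr)."""
>         return sum(corr_e(sampler()) for _ in range(n)) / n
>
>     def uniform():            # uniform over a large input space
>         return random.randrange(1_000_000)
>
>     def skewed():             # half the probability mass on bad inputs
>         if random.random() < 0.5:
>             return 7 * random.randrange(1_000)
>         return random.randrange(1_000_000)
>
>     def dirac():              # Dirac delta on one incorrect input
>         return 7
>
>     random.seed(0)
>     for name, s in (("uniform", uniform), ("skewed", skewed),
>                     ("dirac", dirac)):
>         print(name, p_correct(s))
>
> The same element E scores about 0.86 under the uniform distribution,
> about 0.43 under the skewed one, and 0.0 under the Dirac delta - three
> different "reliabilities" for identical HW and SW.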
>
> Yet in IEC 61508, and everywhere else, Distr(I) is not mentioned. Not once.
>
> This is incoherent.
>
> One could fix it, maybe, by just assuming the uniform distribution on all
> inputs, by default. Or the
> normal distribution. There may be reasons for this, but it is worth
> pointing
> out that Distr(I) in
> real applications is almost never uniform or normal. If there is a
> distribution D for which it can
> be argued that the real-world input distribution "almost always
> approximates
> D" then one could
> choose D as the default instead.
>
> The second incoherence is as follows. If the SW does not attain the safety
> requirement, then E does
> not attain the safety requirement, under a certain plausible assumption,
> namely that if CorrS(I) =
> 0, then CorrE(I) is almost always 0. (That is, the HW may sometimes
> fortuitously compensate for
> incorrect SW behavior, but mostly not.) Then in order for E to fulfil the
> safety requirement, it
> must be the case that
>
>   (Integral over Distr(I) of CorrS(I)) / (Integral over Distr(I) of 1)
>     >= (Integral over Distr(I) of CorrE(I)) / (Integral over Distr(I) of 1)
>        - epsilon
>
> (epsilon is there to instantiate the "almost" part of the assumption).
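>
> A minimal simulation sketch (the fault, the masking probability and the
> input distribution are all invented) shows the inequality in action; the
> fortuitous HW compensation is modelled, crudely, as a small random
> masking probability:
>
>     import random
>
>     EPSILON = 0.01  # assumed chance the HW masks a SW error
>
>     def corr_s(i: int) -> int:
>         return 0 if i % 7 == 0 else 1   # SW wrong on multiples of 7
>
>     def corr_e(i: int) -> int:
>         if corr_s(i):
>             return 1                    # correct SW -> correct element
>         return 1 if random.random() < EPSILON else 0
>
>     random.seed(0)
>     inputs = [random.randrange(1_000_000) for _ in range(500_000)]
>     p_s = sum(corr_s(i) for i in inputs) / len(inputs)
>     p_e = sum(corr_e(i) for i in inputs) / len(inputs)
>     print(f"P(CorrS) = {p_s:.4f}  P(CorrE) = {p_e:.4f}")
>     assert p_s >= p_e - EPSILON        # the inequality above
>
> P(CorrE) exceeds P(CorrS) only by the masking term, so a probabilistic
> requirement on E unavoidably induces one on S.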
>
> So, since the safety requirement on E has a probabilistic component, the
> inherited safety requirement on S must have one too.
>
> Yet there is no requirement in IEC 61508 to substantiate that inherited
> safety requirement on S. The only conditions placed on software safety
> requirements are the techniques recommended for use during the
> development of S.
>
> In particular, if, like Jean-Louis, you don't think that the execution of
> SW can have a stochastic nature, you are thereby committed to the view
> that IEC 61508 and its derivatives are inherently incoherent. It must be a
> difficult world to live in ......
>
> PBL
>
>
> Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld,
> 33594 Bielefeld, Germany
> Je suis Charlie
> Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de
>
>
>
>
> _______________________________________________
> The System Safety Mailing List
> systemsafety at TechFak.Uni-Bielefeld.DE
>