[SystemSafety] Fault, Failure and Reliability Again (short)

Tue Mar 3 22:50:12 CET 2015

Hi Peter

I have had some further thoughts on the reliability argument you present
in the blog, and have presented previously.  Your proposition, I believe,
is as follows:

 "Software S exhibits reliability R when subject to input distribution D."

'Software' in this statement can be replaced with 'system' or 'structure',
and I believe it would still hold, because a hardware defect is also subject
to an input distribution and may or may not cause a failure.  However, the
statement omits a crucial factor, one that does not apply to software:
time.

For example, a wing has a huge input range (in the continuous domain), and
a defect will not necessarily cause a failure.  It can, of course, fail if
subjected to stress outside its design range, and manufacturers perform this
test to ensure that the design tolerates internal design weaknesses and
material defects within acceptable limits.  Meanwhile, back to operations.
Over *time*, the same distribution may exacerbate the defect to the point
where a failure occurs.  The same input range therefore did not *always*
cause a failure; it did so only after a sufficient build-up of stress (or
whatever) over *time* had allowed the defect to become a fault.  We can
measure this and attribute a mean *time* between failures.  Another way of
thinking about this is that we know the system (electronics as well as
structures) will, in the given environment, fail at some point.

For software, the time-based element is irrelevant: if the circumstances
required to hit the bug occur, the bug manifests itself as unexpected
behaviour at the system level, and it will do so *every* time.  There is no
wear-out mechanism for software (as Michael noted earlier).  Whenever D
produces those circumstances, it will cause a system failure at the specific
point of the defect in the software; the software does not fail, the system
does.  It therefore makes no sense to talk about the reliability of software,
because the time-based aspect is irrelevant.  Another way of thinking about
this is that if the specific circumstances that would trigger the defect do
not occur, then the software will *always* work as expected.

The riposte to this might be to argue that the reliability of software can
therefore be calculated as the probability of the set of circumstances in
distribution D that would cause the software defect to have a system
effect.  However, as you don't know what the defect is (you would have
removed it if you knew of it), nor its effect, nor the set of circumstances,
the outcome of this exercise is something of a guess and hence of dubious
measurable value.
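A minimal sketch of that riposte, assuming a uniform input distribution D
over 0..9999 and a hypothetical defect triggered by any input of 8192 or
more.  The calculation only works because the sketch plants the defect and
therefore knows its failure region, which is precisely the knowledge the
paragraph above says we lack.

```python
import random


def triggers_defect(raw: int) -> bool:
    """Hypothetical failure region: every input >= 8192 hits the defect.

    In practice this set is unknown; the sketch assumes it for illustration.
    """
    return raw >= 8192


def exact_pfd(low: int = 0, high: int = 9_999) -> float:
    """Fraction of a uniform D over [low, high] that falls in the failure region."""
    inputs = range(low, high + 1)
    return sum(triggers_defect(raw) for raw in inputs) / len(inputs)


def sampled_pfd(n: int = 100_000, seed: int = 7,
                low: int = 0, high: int = 9_999) -> float:
    """Monte Carlo estimate of the same probability from n demands drawn from D."""
    rng = random.Random(seed)
    hits = sum(triggers_defect(rng.randint(low, high)) for _ in range(n))
    return hits / n


if __name__ == "__main__":
    pfd = exact_pfd()
    print(f"probability of failure per demand: {pfd:.4f}")      # 0.1808
    print(f"reliability per demand:            {1 - pfd:.4f}")  # 0.8192
    print(f"sampled estimate:                  {sampled_pfd():.4f}")
```

The sampled figure approaches the exact one as n grows; both rest entirely
on the assumed D and the assumed failure region.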

It may also be possible to talk about the reliability of defect detection
techniques and hence make some claim about the subsequent defect freedom of
software.  For instance, there are bug-hunting tools that claim to find
certain classes of bugs.  These tools rarely claim to find all bugs of a
given class, all of the time.  So one might be able to claim that there are
fewer bugs.  But all that has done is move from one unknown level of
bugginess to another (probably lower, but ...) level of bugginess.  So,
again, this is of dubious measurable value.

What can be said, I think, is that every defect removal technique has a set
of assumptions that have to be validated (by humans), and that there is
therefore a level of uncertainty.  Each technique, whether review, analysis
(static or otherwise) or test, has its limits.  The boundaries of
acceptability are set in standards such as DO-178, in which an agreed level
of activity is undertaken to meet a set of Objectives that support a System
Design Assurance Level (DAL).  In this way, the reliability question is
readily, acceptably and evidently addressed.

Regards

Nick Tudor
Tudor Associates Ltd
Mobile: +44(0)7412 074654
www.tudorassoc.com

*77 Barnards Green Road*
*Malvern*
*Worcestershire*
*WR14 3LR*
*Company No. 07642673*
*VAT No:116495996*

*www.aeronautique-associates.com <http://www.aeronautique-associates.com>*

On 3 March 2015 at 07:11, Peter Bernard Ladkin <ladkin at rvs.uni-bielefeld.de>
wrote:

> I had some private discussion with someone here who claims software cannot
> fail. I first heard this
> trope a quarter century ago, and I am informed indirectly by another
> colleague that it is still rife
> in certain critical-engineering areas. I address it this morning in a blog
> post at
>
> http://www.abnormaldistribution.org/2015/03/03/fault-failure-reliability-again/
>
> PBL
>
> Prof. Peter Bernard Ladkin, Faculty of Technology, University of
> Bielefeld, 33594 Bielefeld, Germany
> Je suis Charlie
> Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de
>
> _______________________________________________
> The System Safety Mailing List
> systemsafety at TechFak.Uni-Bielefeld.DE
>