[SystemSafety] Software reliability (or whatever you would prefer to call it)

Tue Mar 10 09:12:00 CET 2015

On 2015-03-09 18:48 , Steve Tockey wrote:
> *) Software itself is inherently deterministic.

This is a point repeatedly emphasised by Nick Tudor. There is a simpler way to say it: SW behaves in
this case as a mathematical function.

> *) The underlying hardware should also be—for the most part—deterministic.

Dear me. Just yesterday we heard Kevin Driscoll's talk "Murphy was an Optimist". I first heard it in
2010 at SAFECOMP in Vienna. Kevin now has a nine-hour tutorial on it. It covers the many ways in
which systems can go wrong that their designers did not anticipate.

Byzantine failures are just the start. In the 1980s, when they first appeared in the computer
science literature, they were judged to be theoretical constructs of no practical interest. In the
later 1990s, people were beginning to have second thoughts, and at SAFECOMP 2003 Kevin described
how a major commercial aircraft type almost lost its airworthiness certificate because of a burst of
Byzantine failures.

Kevin makes the point that (if I may phrase it so) "digital is just an approximation of analogue
processes".

So I think whether one accepts your statement depends upon what one takes to be covered by the
phrase "for the most part".

> So the issue really isn't "reliability of the software" per se, 

One issue is (a) whether the notion makes sense. Another is (b) whether, if it makes sense,
some of its characteristics are isomorphic to those of Bernoulli's Urn Model. The answer to both
of those questions is yes.

Indeed, as I noted yesterday, there is a well-defined notion of function reliability (but I didn't
call it that), in which you compare a candidate function with a norm function. There is no mention
of SW in the definition of function reliability. But if you take SW to be rigorously deterministic,
the definition of function reliability applies to that software.
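
To make the comparison of a candidate function with a norm function concrete, here is a minimal
sketch. It rests on assumptions that are not in the discussion itself: the functions norm and
candidate are hypothetical, the input domain is a small finite set, and function reliability is
read as the probability mass of the inputs on which candidate agrees with norm. Drawing inputs
independently from the distribution then gives a sequence of success/failure trials, which is
where the correspondence with the urn model comes from.

```python
import random

def norm(x: int) -> int:
    """Hypothetical norm function: the result one hoped for."""
    return abs(x)

def candidate(x: int) -> int:
    """Hypothetical candidate: agrees with norm everywhere except x == -3."""
    if x == -3:
        return 0          # the single deliberate defect
    return x if x >= 0 else -x

def exact_reliability(inputs, weights):
    """Probability mass of the inputs on which candidate agrees with norm."""
    total = sum(weights)
    agreeing = sum(w for x, w in zip(inputs, weights) if candidate(x) == norm(x))
    return agreeing / total

def sampled_reliability(inputs, weights, trials, seed=0):
    """Estimate by sampling: each draw is one Bernoulli trial (agree / disagree)."""
    rng = random.Random(seed)
    draws = rng.choices(inputs, weights=weights, k=trials)
    return sum(candidate(x) == norm(x) for x in draws) / trials

inputs = list(range(-5, 6))       # assumed input domain: 11 integers
weights = [1] * len(inputs)       # assumed input distribution: uniform

exact = exact_reliability(inputs, weights)
estimate = sampled_reliability(inputs, weights, trials=20000)
print(exact, estimate)
```

Note that nothing in the sketch refers to software as such: it only compares two functions under
a given input distribution.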

Suppose your SW is indeed deterministic. Bertrand pointed out yesterday that the phenomenon
captured in function reliability is only one of three ways in which the output you get from your SW
is or is not what you had hoped for. Depending on your depth of analysis, it may also include
specification reliability (how often the software gives wrong results when its result conforms with
its specification, or when the specification does not determine what the result should be) and mixed
reliability (in which a HW anomaly may also be causally involved).

And, of course, your SW may in fact not be deterministic. The non-determinism might well contribute
to the success or failure of your SW on particular inputs.

> the issue is "reliability of the software under input space S". 

I think you are missing a crucial distinction. There are two parameters, as I noted previously. The
domain of input is important, but so is - crucially - the distribution of those inputs in your sample.

> If one had the ability to characterize input space S for some software, and if someone were able to
> comprehensively cover all possible inputs (in all possible sequences…) then one could truly measure
> the "reliability" of the code with respect to S—"What percent of the members of input space S lead
> to failures?", or, "On average, what time period passes between members of input space S that lead
> to failures?"

Not as I interpret your one parameter "input space". How often the software succeeds or fails is
dependent crucially on the input sample. As I have pointed out, if there is at least one input on
which the software succeeds and one input on which the software fails, you can make the function
reliability anything between 0% and 100% by varying the input distribution.
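
The argument can be checked with a toy case. The sketch below assumes a hypothetical deterministic
program with exactly two inputs of interest, "good" (on which it succeeds) and "bad" (on which it
fails). The names succeeds and reliability are illustrative only. The input domain stays fixed
throughout; only the input distribution changes.

```python
def succeeds(x: str) -> bool:
    """Hypothetical deterministic SW: fails on "bad", succeeds on anything else."""
    return x != "bad"

def reliability(distribution: dict) -> float:
    """Probability mass of the inputs on which the SW succeeds."""
    return sum(p for x, p in distribution.items() if succeeds(x))

# Same input domain {"good", "bad"} every time; only the weights differ.
for p in (0.0, 0.25, 0.5, 0.99, 1.0):
    dist = {"good": p, "bad": 1.0 - p}
    print(f"P(good) = {p:.2f}  ->  reliability = {reliability(dist):.2f}")
```

The reliability simply equals the weight placed on the succeeding input, so any value from 0% to
100% can be obtained without touching either the program or its input domain.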

> The problem is that the input space S for any non-trivial program is massive, so ...

I think for this entire paragraph to make much sense as an argument, you first need to distinguish
between input domain and the distribution of inputs to the SW.

> a) some are arguing that software is deterministic so reliability is moot. True to an extent, but it
> depends on the chosen input space S

No. Not true to any extent. The notion of function reliability is well defined.

> b) some are arguing that if input space S' is a fair and accurate representation of input space S
> then conclusions about reliability of software under S can be inferred from reliability observed
> under S'

To judge this, I think I would need to know what you mean by "input space".

> c) some are arguing that there's no way (today) to ever know if input space S' is a fair and
> accurate representation of input space S, so any conclusions about reliability of software under S'
> aren't necessarily transferable to the same software under input space S

I think that it would be wrong to suggest that there aren't ways of knowing.

PBL

Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Je suis Charlie
Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de