[SystemSafety] What do we know about software reliability?

Martyn Thomas martyn at 72f.org
Thu Sep 24 11:59:10 CEST 2020


On 24/09/2020 05:10, Brendan Mahony wrote:

> Why is it not the case that reliability estimates should be
> synthesised from the component reliabilities through mathematical
> analysis of the proposed design?

Back in the mid-late 1980s, I was chairman of Praxis, the software
engineering company that specialised in strong quality management and
formal methods. I was looking for markets that would need our particular
skills, so I attended one of the first conferences on safety-critical
systems. (It was in Guernsey.) The audience was mainly electronic
hardware engineers; whenever the question of the reliability of software
came up, the audience laughed. "Why can't the software people give us
simple component reliabilities for our probabilistic reliability
analyses?" was the common complaint. Someone said "surely it must be
possible to give a reliability for basic things, assignment statements
and so on, just as we can for resistors and capacitors".

We can do that, of course. The reliability of an assignment statement in
isolation is 1. But that's unhelpful because it matters what the
assignment statement does in context. The hardware designers failed to
see that the same is true for resistors and capacitors - if the circuit
design is wrong, perfect components don't make perfect circuits.
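
As a rough illustration of that point (the function names and the numbers
are hypothetical, chosen only for this sketch): the usual series-system
calculation multiplies component reliabilities and silently assumes the
design is correct. Adding an assumed probability that the design is right
shows how it caps the system figure, however good the components are.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Reliability of a series system, assuming independent component
    failures AND a correct design (the assumption that usually goes unstated)."""
    return prod(component_reliabilities)


def system_reliability(component_reliabilities, p_design_correct):
    """Crude, pessimistic sketch: treat an incorrect design as certain failure.

    p_design_correct is an illustrative assumption, not a quantity that any
    standard tells you how to obtain.
    """
    return p_design_correct * series_reliability(component_reliabilities)


perfect_components = [1.0, 1.0, 1.0]
print(series_reliability(perfect_components))       # 1.0
print(system_reliability(perfect_components, 0.9))  # 0.9
```

Perfect components give a component-level figure of 1, yet the system
figure is no better than the confidence in the design itself.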

This misunderstanding persists. IEC 61508 ignores the probability that a
hardware design is incorrect (we used to talk about 'systematic errors'
to distinguish them from 'random failures', but I haven't seen that
expression used recently).

Statistical reliability analysis of systems is a necessary but imperfect
solution to the problem that we cannot analyse all the possible system
behaviours mathematically.

In my opinion, an engineer should design systems in a way that maximises
the extent to which system properties are amenable to reason and proof.
They should identify, document and justify the aspects that have to be
assured in other ways. Then they should use the best available methods
to provide as good assurance as they can.

Statistical testing is a powerful tool, especially when combined with
proof. See for example Littlewood, B. and Rushby, J. (2011). *Reasoning
about the Reliability of Diverse Two-Channel Systems in which One
Channel is "Possibly Perfect".*
<https://openaccess.city.ac.uk/id/eprint/1069/> /IEEE Transactions on
Software Engineering/, /38/(5), pp. 1178–1194. doi:10.1109/TSE.2011.80
<https://dx.doi.org/10.1109/TSE.2011.80>; and Bishop, P.G., Bloomfield,
R.E. and Cyra, L. (2013). *Combining Testing and Proof to Gain High
Assurance in Software: a Case Study.* /(ISSRE 2013) IEEE International
Symposium on Software Reliability Engineering/ 4-7 November, Pasadena,
CA, USA.
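
A minimal sketch of the arithmetic behind these two ideas, under stated
assumptions: test demands are independent and drawn from the operational
profile, no failures are observed, and the two-channel bound is the
headline result of the Littlewood and Rushby paper as I understand it
(system pfd conservatively bounded by pfd_A times pnp_B, the probability
that the "possibly perfect" channel B is not perfect). The function names
are hypothetical.

```python
import math


def failure_free_demands_needed(pfd_bound, confidence):
    """Smallest n such that n failure-free demands support the claim
    pfd <= pfd_bound at the given confidence level.

    If the true pfd were exactly pfd_bound, the chance of seeing no failure
    in n independent demands is (1 - pfd_bound)**n; we require that chance
    to be at most 1 - confidence.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - pfd_bound))


def two_channel_pfd_bound(pfd_a, pnp_b):
    """Conservative bound on system pfd for a one-out-of-two system:
    pfd of channel A times the probability that channel B is imperfect."""
    return pfd_a * pnp_b


# Demands needed to claim pfd <= 1e-3 with 99% confidence from testing alone.
print(failure_free_demands_needed(1e-3, 0.99))

# Bound when testing supports pfd_A = 1e-3 and proof supports pnp_B = 1e-2.
print(two_channel_pfd_bound(1e-3, 1e-2))
```

Testing alone needs roughly 4,600 failure-free demands for the 1e-3 claim,
and the count grows in inverse proportion to the target pfd, which is why
combining modest statistical evidence for one channel with proof-based
evidence for the other is attractive.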

Martyn
