[SystemSafety] What do we know about software reliability?

Peter Bernard Ladkin ladkin at causalis.com
Wed Sep 16 09:36:51 CEST 2020



On 2020-09-16 02:00 , hugues.bonnin at free.fr wrote:
> 
> I hope that this reasoning helps...

Unfortunately, not really. It is essentially a piece of legal-looking opinion, but you don't indicate
that it is grounded in any legal reality (you don't actually cite any European law, for example). I have
colleagues and acquaintances who do this kind of thing professionally: they establish the legal
context in which the companies they work for, producers and sellers of safety-related kit, do
business.

The point of my question is that Bob Schaefer's flippant response is inadequate as an engineering
response. The company I am talking about is an international supplier of highly reliable simple
subsystems (smart sensors), mostly to the process industries, and has a high reputation. I presented
their engineering question, for those who had not yet encountered it, in my papers of four years ago;
that was itself three years after I first heard the question posed, in 2013.

The answer to the question is to try to use the operational history to derive an assessment of the
chances that the device will not fail in so-and-so-many hours of further use in a sufficiently
similar environment, as well as a confidence in that assessment. It should not surprise anyone that
there are methods to do that. Neither should it surprise anyone that the question is not yet
answered in a completely satisfactory manner.
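
To make that concrete, here is a minimal sketch of the simplest such method, under assumptions that
are mine and not necessarily the company's: a constant failure rate (exponential model), a
failure-free operational history, and a sufficiently similar future environment. It turns an observed
failure-free operating time into an upper confidence bound on the failure rate, and from that a lower
bound on the chance of surviving so-and-so-many further hours.

import math

def upper_bound_failure_rate(hours_observed: float, confidence: float) -> float:
    """One-sided upper confidence bound on the failure rate lambda, given
    zero failures in hours_observed hours, under an exponential model."""
    return -math.log(1.0 - confidence) / hours_observed

def survival_lower_bound(hours_observed: float, hours_future: float,
                         confidence: float) -> float:
    """Lower bound, at the stated confidence, on the probability of no
    failure in hours_future further hours in a similar environment."""
    lam = upper_bound_failure_rate(hours_observed, confidence)
    return math.exp(-lam * hours_future)

# Illustrative numbers only: 10 million device-hours observed without
# dangerous failure, asking about the next 10,000 hours at 95% confidence.
T, t, C = 1e7, 1e4, 0.95
print(f"lambda upper bound: {upper_bound_failure_rate(T, C):.2e} per hour")
print(f"P(no failure in next {t:.0f} h) >= {survival_lower_bound(T, t, C):.4f}")

With those illustrative numbers you get a bound of about 3 x 10^(-7) failures per hour and a survival
probability of about 0.997 for the next 10,000 hours, at 95% confidence, and no more than that. More
sophisticated (e.g. Bayesian) treatments exist; this is just the simplest case.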

The company's kit commands a price established inter alia through its reputation. They have clients,
lots and lots of them, who use the kit because it works continuously without breaking. Note that
word "because". The clients have reasons why they use the kit. Are those reasons good, or poor? What
are the criteria for assessing such reasons as good, or poor?

The question cannot be answered at all by those who think statistical assessment of software cannot
be performed. One can further surmise that such naysayers have never had to try to answer such a
question professionally. Me, I've heard it once every couple of months for 7 years at least. I was
first asked it in October 2009, which is what got me into software reliability assessment.

I notice that the people claiming that "software reliability" is some kind of illegitimate concept
are all people working in or around airside aerospace. So let's leave NPPs and the process industries
aside and bring up some aerospace themes.

Recall that, if one is building DAL A SW for a piece of civil avionics kit for high-performance jet
transports, the requirement is that the kit shall not fail with potentially catastrophic outcome more
often than once in one billion hours of operation. Someone has to produce an argument that this
requirement is met, and the regulator must have high confidence in the conclusion. So how do you
distinguish legitimate arguments from misleading ones?

The way it is done is by faking it. You develop the SW to DO-178 (using the FM and OO supplements if
you like) and the regulator is satisfied that you did so. That is then taken as an acceptable means of
compliance with the regulation, even though there is no established intellectual connection between
the qualitative measures and the numerical requirement. The jump is intellectually vacuous. (But it
may still be right; as I recall, Peter B thinks the numbers on some airframes do tend to show in
retrospect that the really sensitive kit has satisfied at least a 10^(-8) level of reliability.)
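
For what it is worth, the same simple exponential model (my assumption, purely for illustration)
shows why the jump cannot be closed by statistical testing alone: the failure-free operating time
needed to support a 10^(-9)-per-hour claim runs into billions of hours.

import math

def hours_needed(target_rate: float, confidence: float) -> float:
    """Failure-free operating hours needed to bound the failure rate below
    target_rate at the given one-sided confidence (zero observed failures)."""
    return -math.log(1.0 - confidence) / target_rate

for conf in (0.63, 0.90, 0.99):
    print(f"{conf:.0%} confidence in <= 1e-9/h: "
          f"{hours_needed(1e-9, conf):.2e} failure-free hours")
# Roughly 1.0e9, 2.3e9 and 4.6e9 hours respectively, far beyond any
# feasible pre-certification test programme.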

That is also more or less the way in which it is done in 61508. Random failures are given specific
numerical targets. Systematic failures are handled not through any objective measures such as
numerical targets, but through conforming with certain procedures and assessments which are said to
confer a "systematic capability" on the device (whether HW or SW) which is in turn taken to confer a
certain freedom from systematic dangerous failure.

There is no proof that it does, of course - the jump is as intellectually vacuous as it is in
aerospace. But the standard is thought to represent a collection of experience of the sort that, if
you do this-and-that, then you reduce the incidence of thus-and-thus problems, and this is codified
into something called a "systematic capability".

But how do you establish a criterion that, if you do this-and-that, then you reduce the incidence of
thus-and-thus problems? Theoretically, it could be done by analysing a collection of data
representing operational histories of something; but generally it is not. It is done mainly through
someone on the authoring committee being intuitively convinced of it and persistent enough to get it
written into the document. Politics and companies' business models, in other words.

BTW, the suggestion that you should be assessing SW+HW, because SW cannot fail unless it is running,
and SW can't be running unless it is running on HW, is, first, a helpful rule of thumb for industry
(because that is how the units are seen by the regulators); second, a non sequitur (the conclusion
does not follow from the premises); third, logically pointless.

Apropos logically pointless, here is how to identify SW failures given a history of total-kit
failures. You have a piece of kit with a failure history H. Let's say you believe in the
classification "random HW failure", "systematic HW failure", "HW-SW interaction failure" and "other
failure". Given H, take out the random and systematic HW failures, and (if you don't want to count
them) the HW-SW interaction failures. You are left with the "other failures". Those are the SW
failures. They form a subclass of all the failures you have seen. If you have all sorts of different
HW on which the SW runs, you can pool the histories of "SW failure" on each of the different devices
and ask what those numbers tell you.
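
Here is a minimal sketch of that bookkeeping, with made-up field names and data; the classification
scheme is the one just described, everything else is illustrative.

from collections import Counter
from dataclasses import dataclass

@dataclass
class FailureRecord:
    platform: str        # which HW the SW was running on
    classification: str  # "random HW", "systematic HW", "HW-SW interaction", "other"

def software_failures(history: list[FailureRecord],
                      count_interaction: bool = False) -> list[FailureRecord]:
    """Remove the HW failures (and, unless requested, the HW-SW interaction
    failures); what remains is taken as the SW failures."""
    excluded = {"random HW", "systematic HW"}
    if not count_interaction:
        excluded.add("HW-SW interaction")
    return [r for r in history if r.classification not in excluded]

# Illustrative history H, pooled over three different HW platforms.
H = [
    FailureRecord("platform-A", "random HW"),
    FailureRecord("platform-A", "other"),
    FailureRecord("platform-B", "HW-SW interaction"),
    FailureRecord("platform-B", "other"),
    FailureRecord("platform-C", "systematic HW"),
    FailureRecord("platform-C", "other"),
]

sw = software_failures(H)
print("pooled SW-failure count:", len(sw))
print("per platform:", Counter(r.platform for r in sw))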

I might also point people again to the discussions about the UK Post Office Horizon system. Much of
the legal argument in court was an attempt to establish or refute a claim that the Horizon system
was "robust". It is not ultimately clear what "robust" was supposed to mean (although certainly not
the ISO/IEC/IEEE notion of "robust") but sometimes it seems to have been conflated with "reliable"
(in the sense of the IEC definitions I mentioned earlier). So here are people being sent to jail on
the basis of assessments (mostly, I would argue, poor) of a software-based system's reliability. It
seems fatuous as well as supercilious for some here to argue that the concept is not legit.

PBL

Prof. Peter Bernard Ladkin, Bielefeld, Germany
Tel+msg +49 (0)521 880 7319  www.rvs-bi.de




