[SystemSafety] What do we know about software reliability?

Coq, Thierry Thierry.Coq at dnvgl.com
Tue Sep 15 11:21:31 CEST 2020


Hi all,
The statistical evidence at the system level is valid for random hardware failures. There are also systematic hardware failures and systematic software failures that cannot be caught by the random approach. Ariane 5 flight 501 disproved any probabilistic approach to software that does not start from a failure frequency of 1. It also disproved the rather quaint notion of “reuse of proven equipment” without taking the operational profile into account.
Software is not the only part of the system with systematic failures; hardware has them too (see the valve failure on the British nuclear submarines). However, software is the only part of a system that does not have random failures: those all come from the hardware.
Best regards,
Thierry Coq
The opinions expressed here are my own.
From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de> On Behalf Of Nick Tudor
Sent: Tuesday, 15 September 2020 11:01
To: Peter Bishop <pgb at adelard.com>
Cc: The System Safety List <systemsafety at lists.techfak.uni-bielefeld.de>
Subject: Re: [SystemSafety] What do we know about software reliability?

Peter wrote:

"And statistical testing is used in the UK nuclear industry for safety critical systems,"

This is true, but note that the statement is about safety critical 'systems', not 'software'. While the UK standards do talk about 'on demand' and some numbers, they are actually focussed at the system level, not the software. There is (at last) a distinct but still tacit acceptance that it is a nonsense to claim that software has a 'reliability' and hence to treat it in the same way as hardware that fails. It is much better to give a reason why the software should be trusted rather than rely (!) on some outdated research for which the foundations are highly questionable. This is now gaining acceptance rather than a statistical approach.

As I recall having said before on this list, software has no wear-out mechanism, so software reliability is somewhat meaningless. I was widely abused (some even said bullied) for suggesting that software reliability was not the right way of thinking about software assurance. It is therefore with some trepidation that I dive into this thread.



Nick Tudor
Tudor Associates Ltd
Mobile: +44(0)7412 074654
www.tudorassoc.com

77 Barnards Green Road
Malvern
Worcestershire
WR14 3LR
Company No. 07642673
VAT No:116495996

www.aeronautique-associates.com


On Tue, 15 Sep 2020 at 09:46, Peter Bishop <pgb at adelard.com> wrote:
On 14/09/2020 15:04, Martyn Thomas wrote:

Why are you completely dismissing software reliability?

Is it not the case that if you can tolerate a failure rate of once in 1000 hours, 99% confidence through testing would take about 200 days to demonstrate (so long as the test environment is "sufficiently" like the future operating environment and you are able to detect every failure correctly)?
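
The 200-day figure can be checked with a short calculation. The sketch below assumes a constant failure rate (an exponential model) and a test in which no failures are observed; the function name failure_free_hours is made up for illustration.

```python
import math


def failure_free_hours(max_rate_per_hour: float, confidence: float) -> float:
    """Failure-free test time needed to claim a failure-rate bound.

    Assumes a constant failure rate and zero failures during the test.
    If the true rate were at least max_rate_per_hour, the chance of
    seeing no failure in t hours would be at most exp(-rate * t).
    Requiring exp(-rate * t) <= 1 - confidence gives
    t >= -ln(1 - confidence) / rate.
    """
    if max_rate_per_hour <= 0.0:
        raise ValueError("rate must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / max_rate_per_hour


# One failure per 1000 hours, demonstrated at 99% confidence.
hours = failure_free_hours(1e-3, 0.99)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days of continuous testing")
```

Under these assumptions the answer is roughly 4,600 hours, a little over 190 days, which is consistent with "about 200 days". The required time scales inversely with the target rate, so each extra factor of ten in the claim costs a factor of ten in test time.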

And statistical testing is used in the UK nuclear industry for safety critical systems, so it is not just abstract theory.

Re your characterisation of confidence based statistical testing on P153 (with no reference), I do not think it is fair to dismiss this because "p can vary by orders of magnitude". Testing presumes a fixed operational profile and a constant probability of failure.

There has also been some work on the impact of profile change on the bound that can be claimed.

https://www.researchgate.net/publication/307555914_Deriving_a_frequentist_conservative_confidence_bound_for_probability_of_failure_per_demand_for_systems_with_different_operational_and_test_profiles

BTW, re your summary of my paper on the same page, I think you missed the main point. This is a predictive theory to derive a worst-case bound for some time in the future, i.e.

Given N faults, what is the worst possible reliability at some future time T?
- it assumes fault fixing will occur during that time.
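
One way to see where a bound of this kind can come from, offered as a sketch under my own simplifying assumptions rather than a restatement of the paper: suppose each fault shows itself at a constant rate lam and is fixed the first time it does. The fault is still present at time T with probability exp(-lam*T), so its expected contribution to the failure rate at T is lam*exp(-lam*T). That expression is largest when lam = 1/T, which gives 1/(e*T) per fault and N/(e*T) for N faults. The function names below are hypothetical.

```python
import math


def expected_rate_contribution(lam: float, t: float) -> float:
    """Expected failure-rate contribution at time t of one fault with rate lam.

    Assumption: the fault is removed the first time it causes a failure,
    so it survives to time t with probability exp(-lam * t).
    """
    return lam * math.exp(-lam * t)


def worst_case_rate(n_faults: int, t: float) -> float:
    """Worst-case expected failure rate at time t for n_faults faults.

    Each fault contributes at most 1 / (e * t), reached when lam = 1 / t.
    """
    return n_faults / (math.e * t)


T = 1000.0  # hours of future operation (illustrative value)

# Scan candidate rates from 0.001/T to 1000/T on a log scale and find
# the one that maximises the expected contribution at time T.
candidates = [10 ** (k / 100.0) / T for k in range(-300, 301)]
worst_lam = max(candidates, key=lambda lam: expected_rate_contribution(lam, T))

print(f"worst-case lam * T = {worst_lam * T:.3f}")
print(f"bound for N = 1: {worst_case_rate(1, T):.3e} per hour")
print(f"bound for N = 5: {worst_case_rate(5, T):.3e} per hour")
```

The point of the scan is that neither very frequent faults (which are found and fixed early) nor very rare ones (which barely contribute) dominate at time T; the worst case sits in between, and it does not depend on T having already elapsed without failure.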

You also only presented the theory for N=1, and you seem to assume that T has already happened with zero failures (not a requirement for this model).

Might have been better to reference the original worst case bound version (which makes it clear that it is a long term forward prediction).

https://www.researchgate.net/publication/3152200_A_conservative_theory_for_long-term_reliability-growth_prediction

Of course, the testing would have to be repeated following a change to the software, unless you have enough formality to show that the change cannot affect reliability.

In specific circumstances, you can do better than this. Bev Littlewood's published papers provide strong evidence and a rich bibliography. Bev's paper on "How reliable is a program that has never failed?" offers a useful rule-of-thumb: that after n hours of fault-free operation, there is about a 50% chance of a failure in the following n hours (subject to some obvious constraints).
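
A figure like this can be reproduced under one particular set of assumptions, which are chosen here for illustration and are not necessarily those of the paper: a constant but unknown failure rate, with a flat prior over that rate. After n failure-free hours the posterior density of the rate is n*exp(-rate*n), and the chance of a failure in the next m hours works out to m/(n+m). The sketch below checks that closed form against a direct numerical integration; both function names are hypothetical.

```python
import math


def prob_failure_in_next(n_hours: float, m_hours: float) -> float:
    """Closed form: chance of a failure in the next m hours after n clean hours.

    Assumes a constant unknown failure rate with a flat prior.
    """
    return m_hours / (n_hours + m_hours)


def prob_failure_numeric(n_hours: float, m_hours: float, steps: int = 200_000) -> float:
    """Same quantity by midpoint integration over the unknown rate.

    Integrates posterior(lam) * P(failure within m | lam), where
    posterior(lam) = n * exp(-lam * n) and
    P(failure within m | lam) = 1 - exp(-lam * m).
    """
    upper = 50.0 / n_hours  # posterior mass beyond this point is about exp(-50)
    width = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * width
        posterior = n_hours * math.exp(-lam * n_hours)
        total += posterior * (1.0 - math.exp(-lam * m_hours)) * width
    return total


n = 1000.0
print(prob_failure_in_next(n, n))   # same duration again
print(prob_failure_numeric(n, n))   # numerical check of the closed form
```

With m equal to n the closed form gives one half regardless of n, which is why the rule-of-thumb does not get more comforting as the failure-free period grows. A different prior would give a different number.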

The difficulties rapidly escalate when you need 10^-4 or better at >90% confidence.

Martyn
On 14/09/2020 14:14, SPRIGGS, John J wrote:
In my experience, if Software Reliability is mentioned at a conference, at least one member of the audience will laugh, and if it is mentioned in a work discussion, at least one member of the group will get angry.
Interestingly, some of the same people who say it is impossible to quantify software failure rates will set numerical requirements for Software Availability – if you get one of those, ask the Customer how (s)he wants you to demonstrate satisfaction of the requirement.

John
From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de> On Behalf Of Derek M Jones
Sent: 14 September 2020 12:54
To: systemsafety at lists.techfak.uni-bielefeld.de
Subject: [SystemSafety] What do we know about software reliability?

All,

What do we know about software reliability?

The answer appears to be, not a lot:
http://shape-of-code.coding-guidelines.com/2020/09/13/learning-useful-stuff-from-the-reliability-chapter-of-my-book/

--
Derek M. Jones Evidence-based software engineering
tel: +44 (0)1252 520667 blog: shape-of-code.coding-guidelines.com
_______________________________________________
The System Safety Mailing List
systemsafety at TechFak.Uni-Bielefeld.DE
Manage your subscription: https://lists.techfak.uni-bielefeld.de/mailman/listinfo/systemsafety




--
Peter Bishop
Chief Scientist
Adelard LLP
24 Waterside, 44-48 Wharf Road, London N1 7UX

Email: pgb at adelard.com
Tel:  +44-(0)20-7832 5850

Registered office: 5th Floor, Ashford Commercial Quarter, 1 Dover Place, Ashford, Kent TN23 1FB
Registered in England & Wales no. OC 304551. VAT no. 454 489808



