[SystemSafety] What do we know about software reliability?

Tue Sep 15 15:30:20 CEST 2020

Derek Jones wrote:

> Some term has to be used.  An existing word comes with lots of baggage, but
> I don't see any overall benefit in trying to define a new word.

It is not possible for me to disagree with this sentiment more than I do.
There is *always* a benefit in replacing a bad word choice with a better
one. Trying to mitigate a bad choice by providing a precise definition has
been shown over and over again in many different disciplines to rarely
work.  Most people when they see a word whose definition they believe they
know will use the definition they know no matter how clearly or how often a
different definition is given*. This phenomenon seems to be more often true
than not even among people who are aware of it, and try really hard to
overcome it.

In my opinion, "software reliability" is the second worst (that is,
misleading, unhelpful, confusing) phrase ever coined within the software
community**.  To far too many people (myself included) "reliability"
necessarily includes notions of either randomness (for example, given an
identical environment, history, design, and manufacturer, component A fails
but B does not) or degradation over time.  Because neither notion applies
to conventional software, the phrase "software reliability" is (and always
will be) to me at best meaningless and at worst misleading.

With that said, I nevertheless have quite a bit of respect for some of the
folks who conduct research in the area, particularly those in the UK who
have been doing it a long time.  I just wish they'd get rid of the name.
Even a nonsense word, perhaps selected out of Jabberwocky, would be a vast
improvement.  How about "software slithiness"?

* Without naming names, I'd suggest that the once frequent intense
disagreements between two long-time members of the list can be explained to
a large measure by this phenomenon.

** DO-178's "derived requirements" is the worst.  What's a "derived
requirement"?  A requirement that is *not* derived. I'll resist the
temptation to broaden "software community" to encompass safety/assurance
case work, where really bad terminology abounds.

*--cMh*

I used to think I was really good at imagining worst case scenarios.

On Tue, Sep 15, 2020 at 8:57 AM Derek M Jones <derek at knosof.co.uk> wrote:

> Nick,
>
> > As I recall, I have said before on this list, software has no wear out
> > mechanism so software reliability is somewhat meaningless.  I was widely
>
> There is no physical wear out mechanism, but the environment in which
> software is run can change (the same is true for hardware, but people don't
> tend to talk about this).
>
> > abused (some even said bullied) for suggesting that software reliability
> > was not the right way of thinking about software assurance.  It is
> > therefore with some trepidation that I dive into this thread.
>
> Some term has to be used.  An existing word comes with lots of baggage, but
> I don't see any overall benefit in trying to define a new word.
>
> The ANSI definition encompasses what needs to be said:
> "Software Reliability is defined as: the probability of failure-free
> software operation for a specified period of time
> in a specified environment."
>
> The "specified environment" is what tends to get ignored in most analysis.
> The Ariane A501 was a different environment and outside the bounds of
> prior analysis.
>
> It is very difficult to obtain data on the environment in which software
> runs.
>
> For instance, the number of reported faults would be expected to increase
> with the number of users.  Where is the data?  I have managed to find a few
> very noisy datasets, and yes, reported faults increase with users.
>
> The bi-exponential function keeps cropping up in fuzzing data.  It's very
> suggestive:
>
> https://shape-of-code.coding-guidelines.com/2017/12/12/the-shadow-of-the-input-distribution/
>
> >
> > Nick Tudor
> > Tudor Associates Ltd
> > Mobile: +44(0)7412 074654
> > www.tudorassoc.com
> >
> > *77 Barnards Green Road*
> > *Malvern*
> > *Worcestershire*
> > *WR14 3LR*
> > *Company No. 07642673*
> > *VAT No:116495996*
> >
> > *www.aeronautique-associates.com <http://www.aeronautique-associates.com>*
> >
> >
> > On Tue, 15 Sep 2020 at 09:46, Peter Bishop <pgb at adelard.com> wrote:
> >
> >> On 14/09/2020 15:04, Martyn Thomas wrote:
> >>
> >> Why are you completely dismissing software reliability?
> >>
> >> Is it not the case that if you can tolerate a failure rate of once in 1000
> >> hours, 99% confidence through testing would take about 200 days to
> >> demonstrate (so long as the test environment is "sufficiently" like the
> >> future operating environment and you are able to detect every failure
> >> correctly)?
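
The figure of about 200 days quoted above can be reproduced with a short calculation. This is a minimal sketch, assuming a constant failure rate (exponentially distributed failure times) and a test campaign in which no failure is observed; the function name `failure_free_hours` is illustrative and does not come from the thread.

```python
import math

def failure_free_hours(target_rate_per_hour: float, confidence: float) -> float:
    """Failure-free test hours needed to claim rate < target at the given confidence.

    Assumption: constant failure rate, so P(no failure in t hours) = exp(-rate * t).
    If the true rate were at least the target, the chance of seeing zero failures
    in t hours would be at most exp(-target * t); we require that to be no more
    than 1 - confidence and solve for t.
    """
    return -math.log(1.0 - confidence) / target_rate_per_hour

hours = failure_free_hours(1 / 1000, 0.99)  # one failure per 1000 hours, 99% confidence
days = hours / 24                           # continuous testing, 24 hours a day
print(f"{hours:.0f} hours, about {days:.0f} days")  # 4605 hours, about 192 days
```

The result depends entirely on the stated assumptions: any observed failure, or a test profile that differs from the operating profile, changes the bound.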
> >>
> >> And statistical testing is used in the UK nuclear industry for safety
> >> critical systems, so it is not just abstract theory.
> >>
> >> Re your characterisation of confidence based statistical testing on P153
> >> (with no reference), I do not think it is fair to dismiss this because "p
> >> can vary by orders of magnitude". Testing presumes a fixed operational
> >> profile and a constant probability of failure.
> >>
> >> There has also been some work on the impact of profile change on the bound
> >> that can be claimed.
> >>
> >> https://www.researchgate.net/publication/307555914_Deriving_a_frequentist_conservative_confidence_bound_for_probability_of_failure_per_demand_for_systems_with_different_operational_and_test_profiles
> >>
> >> BTW, re your summary of my paper on the same page, I think you missed the
> >> main point. This is a *predictive* theory to derive a worst case bound
> >> for some time in the future, i.e.
> >>
> >> Given N faults, what is the worst possible reliability at some future time
> >> T?
> >> - it assumes fault fixing will occur during that time.
> >>
> >> You also only presented the theory of N=1, and you seem to assume the T
> >> has already happened with zero failures (not a requirement for this model).
> >>
> >> Might have been better to reference the original worst case bound version
> >> (which makes it clear that it is a long term forward prediction):
> >>
> >> https://www.researchgate.net/publication/3152200_A_conservative_theory_for_long-term_reliability-growth_prediction
> >>
> >> Of course, the testing would have to be repeated following a change to the
> >> software, unless you have enough formality to show that the change cannot
> >> affect reliability.
> >>
> >> In specific circumstances, you can do better than this. Bev Littlewood's
> >> published papers provide strong evidence and a rich bibliography. Bev's
> >> paper on "How reliable is a program that has never failed?" offers a useful
> >> rule-of-thumb: that after n hours of fault free operation, there is about
> >> 50% chance of a failure in the following n hours (subject to some obvious
> >> constraints).
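
One way to arrive at a 50% figure of this kind, and not necessarily the derivation used in the cited paper, is a Bayesian predictive argument. The sketch below assumes a constant but unknown failure rate with a flat (improper uniform) prior, under which the probability of surviving a further t hours after T failure-free hours is T / (T + t). Both function names are illustrative, and the numerical version exists only to check the closed form.

```python
import math

def predictive_survival(t_observed: float, t_future: float) -> float:
    """Closed form: P(no failure in next t_future | none in t_observed).

    Assumption: exponential failure times with unknown rate and a flat prior
    on the rate, giving T / (T + t).
    """
    return t_observed / (t_observed + t_future)

def predictive_survival_numeric(t_observed: float, t_future: float,
                                steps: int = 200_000) -> float:
    """Same quantity by midpoint-rule integration over the unknown rate."""
    upper = 50.0 / t_observed  # exp(-rate * T) is negligible beyond this rate
    width = upper / steps
    numerator = denominator = 0.0
    for i in range(steps):
        rate = (i + 0.5) * width
        weight = math.exp(-rate * t_observed)  # likelihood of the failure-free history
        denominator += weight
        numerator += weight * math.exp(-rate * t_future)  # times future survival
    return numerator / denominator

n = 1000.0
p_fail_next_n = 1.0 - predictive_survival(n, n)
print(p_fail_next_n)                       # 0.5
print(predictive_survival_numeric(n, n))   # close to 0.5
```

A different prior gives a different number, which is one reading of the "obvious constraints" caveat.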
> >>
> >> The difficulties rapidly escalate when you need 10^-4 or better at >90%
> >> confidence.
> >>
> >> Martyn
> >> On 14/09/2020 14:14, SPRIGGS, John J wrote:
> >>
> >> In my experience, if Software Reliability is mentioned at a conference, at
> >> least one member of the audience will laugh, and if it is mentioned in a
> >> work discussion, at least one member of the group will get angry.
> >>
> >> Interestingly, some of the same people who say it is impossible to
> >> quantify software failure rates will set numerical requirements for
> >> Software Availability – if you get one of those, ask the Customer how (s)he
> >> wants you to demonstrate satisfaction of the requirement.
> >>
> >>
> >>
> >> John
> >>
> >> *From:* systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de>
> >> *On Behalf Of* Derek M Jones
> >> *Sent:* 14 September 2020 12:54
> >> *To:* systemsafety at lists.techfak.uni-bielefeld.de
> >> *Subject:* [SystemSafety] What do we know about software reliability?
> >>
> >>
> >>
> >> All,
> >>
> >> What do we know about software reliability?
> >>
> >> The answer appears to be, not a lot:
> >>
> >>
> >> http://shape-of-code.coding-guidelines.com/2020/09/13/learning-useful-stuff-from-the-reliability-chapter-of-my-book/
> >>
> >> --
> >> Derek M. Jones Evidence-based software engineering
> >> tel: +44 (0)1252 520667 blog:shape-of-code.coding-guidelines.com
> >> _______________________________________________
> >> The System Safety Mailing List
> >> systemsafety at TechFak.Uni-Bielefeld.DE
> >> Manage your subscription:
> >> https://lists.techfak.uni-bielefeld.de/mailman/listinfo/systemsafety
> >>
> >>
> >>
> >> --
> >>
> >> Peter Bishop
> >> Chief Scientist
> >> Adelard LLP
> >> 24 Waterside, 44-48 Wharf Road, London N1 7UX
> >>
> >> Email: pgb at adelard.com
> >> Tel:  +44-(0)20-7832 5850
> >>
> >> Registered office: 5th Floor, Ashford Commercial Quarter, 1 Dover Place,
> >> Ashford, Kent TN23 1FB
> >> Registered in England & Wales no. OC 304551. VAT no. 454 489808
> >>
> >
>
> --
> Derek M. Jones           Evidence-based software engineering
> tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com