[SystemSafety] Software reliability (or whatever you would prefer to call it)

Martin Lloyd martin.farside at btinternet.com
Sat Mar 14 15:45:43 CET 2015


Dear Colleagues

The issue of configuration data also goes beyond data held in memory. 
For example, the encoding of position on certain mechanical/robot 
position controllers can be implemented by laser-engraving a strip of 
code onto a linear or circular stainless-steel part. Thus, in dealing 
with failure mechanisms we are concerned with:

  * The software used to control the engraving process and the data
    which is engraved on the metal part
  * Mechanical and optical failures that could compromise the reading of
    the code
  * Wear or damage to the code or its obscuration by vapours, chemicals,
    etc.
  * Embedded software and electronics failures that could lead to
    misinterpretation of the code
  * Run-time safety checking, such as alternative position-calculation
    algorithms (a minimal sketch follows this list)
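
As a flavour of the last point, a run-time cross-check between two
position algorithms might look like the minimal Python sketch below.
Everything here is hypothetical - the decode functions, scale factor and
tolerance merely stand in for whatever a real controller would use:

    TOLERANCE_MM = 0.05  # assumed acceptable channel disagreement

    def primary_position(raw_code):
        """Hypothetical primary decode of the engraved scale."""
        return raw_code * 0.01       # e.g. 10 micrometre code pitch

    def alternative_position(raw_edge_count):
        """Hypothetical independent estimate, e.g. from edge counting."""
        return raw_edge_count * 0.01

    def checked_position(raw_code, raw_edge_count):
        p1 = primary_position(raw_code)
        p2 = alternative_position(raw_edge_count)
        if abs(p1 - p2) > TOLERANCE_MM:
            # Disagreement may indicate wear, obscuration or a decode
            # error: demand a safe state rather than trust either channel.
            raise RuntimeError("position channels disagree")
        return (p1 + p2) / 2.0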

Perhaps we should say that design reliability concerns "everything that 
is the case" in the system under consideration.

Kind regards

Martin


On 14/03/2015 14:16, Littlewood, Bev wrote:
> Hi Alvery
>
> You are quite right. The issue of design reliability is much wider 
> than simply software. I think we tend to obsess about the design 
> problem with software because there’s a sense in which software is 
> “only” design. But there are certainly hardware systems where this 
> issue of design reliability clearly needs to be addressed, because of 
> their complexity. It’s seductive, but wrong, to claim that you have 
> addressed reliability in such cases by consideration only of what 
> Michael has called “...failures (using the term loosely) resulting 
> from physical degradation over time…”
>
> Of course, at the system level, we need to take account of both of 
> these - “systematic” and “non-systematic”. That’s why I pointed out 
> earlier, in my long posting, how important it was that the /measures/, 
> etc, we use for the “systematic” side can be the same (or similar) to 
> the ones we use for the “non-systematic” side - even though the 
> detailed mechanisms of failure differ. That way we get a holistic 
> quantitative treatment for the reliability (and safety) of systems 
> comprising all the elements you mention.
>
> I think you are bang on with your point 2. below, too. I’ve seen an 
> example of a critical system where there was huge controversy 
> concerning “the software”, but everyone seemed rather sanguine about 
> the massive amounts of configuration data.
>
> Cheers
>
> Bev
>
>
>> On 10 Mar 2015, at 15:32, GRAZEBROOK, Alvery N
>> <Alvery.Grazebrook at airbus.com> wrote:
>>
>> Hi Bev.
>>
>> Thanks for addressing the issue of language / terminology.
>>
>> In the world of embedded control systems, I have seen various 
>> attempts to dodge standards for design, by playing with the semantics 
>> around the word “Software”. There are two specific classes of dodging
>> I can think of:
>>
>> 1. Using programmable electronics or high-state digital circuitry
>> and claiming that software design practices don’t apply. In the civil
>> aero world, DO-254 was introduced in addition to DO-178 to cover this.
>>
>> 2. Using data tables to describe behaviour, and claiming that only
>> the table interpreter, not its contents, is software (a toy sketch
>> follows).
>>
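>> To see why the second dodge is hard to sustain, here is a toy Python
>> sketch (all names invented): the interpreter below never changes, yet
>> the behaviour - and hence the design risk - lives entirely in the
>> supposedly "non-software" table.
>>
>>     # Toy table-driven controller
>>     ACTION_TABLE = {
>>         ("door_open", "moving"): "emergency_stop",
>>         ("door_open", "stopped"): "hold",
>>         ("door_closed", "stopped"): "allow_start",
>>     }
>>
>>     def interpret(door_state, motion_state):
>>         # One wrong table entry is as hazardous as one wrong branch
>>         # in compiled code.
>>         return ACTION_TABLE.get((door_state, motion_state), "emergency_stop")
>>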
>> I’m sure list members will think of other examples. If the language 
>> of the standards talked of “system behaviour” or “design behaviour” 
>> including Software, I think this would remove such issues.
>>
>> My feeling is that it would be helpful to talk of “complex design” 
>> including the software, attached electronics, and if applicable 
>> complexities in the controlled equipment and “plant”, and consider 
>> the (systematic) design reliability of all of this. Separating the 
>> part that is labelled as “software” from its electronic and physical 
>> world context isn’t helpful.
>>
>> This sits alongside the “traditional” component reliability 
>> approaches that deal with the (non-systematic) failure of equipment 
>> due to limited life, damage, random failure etc.
>>
>> **Note: these are my personal opinions, not necessarily those of my 
>> employer**
>>
>> Cheers,
>>
>>             Alvery.
>>
>> From: systemsafety-bounces at lists.techfak.uni-bielefeld.de
>> [mailto:systemsafety-bounces at lists.techfak.uni-bielefeld.de]
>> On Behalf Of Littlewood, Bev
>> Sent: 10 March 2015 11:45 AM
>> To: C. Michael Holloway
>> Cc: systemsafety at lists.techfak.uni-bielefeld.de
>> Subject: Re: [SystemSafety] Software reliability (or whatever you
>> would prefer to call it)
>>
>> Hi Michael
>>
>> Seems you /are/ speaking for Nick! (see his most recent posting). Of
>> course the distinction you make here is an important one - I think we
>> can all agree on that. Not least because our actions in response to
>> seeing failures from the two kinds of fault will differ (in the case
>> of design faults - inc. software faults - we might wish to remove the
>> offending fault).
>>
>> But excluding design faults as a source of (un)reliability results in 
>> a very restrictive terminology. I realise that appealing to “common 
>> sense” in a technical discussion is often the last refuge of the 
>> scoundrel… But I don’t think that the man in the street, 
>> contemplating his broken-down car (in the rain - let’s pile on the 
>> pathos!), would be comforted to be told it was not unreliable, it 
>> just had /design/ faults.
>>
>> And, of course, your interpretation seems to rule out the 
>> contribution of human fallibility (e.g. pilots) to the reliability 
>> and/or safety of systems. This seems socially unacceptable, at least 
>> to me.
>>
>> Cheers
>>
>> Bev
>>
>>     On 10 Mar 2015, at 10:34, C. Michael Holloway
>>     <c.m.holloway at nasa.gov> wrote:
>>
>>     I can't speak for Nick, but I object to the use of the term
>>     "reliability" being applied to anything other than failures
>>     (using the term loosely) resulting from physical degradation over
>>     time.  I believe it is important to maintain a clear distinction
>>     between undesired behavior designed into a system, and undesired
>>     behavior that arises because something ceases to function
>>     according to its design.  (Here "designed / design" is used
>>     broadly.  It includes all intellectual activities from
>>     requirements to implementation.)
>>
>>     --
>>     cMh
>>
>>     C. Michael Holloway
>>     The words in this message are mine alone; neither blame nor
>>     credit NASA for them.
>>
>>     On 3/10/15 5:50 AM, Peter Bishop wrote:
>>
>>         Now I think I understand your point.
>>         You just object to the term *software* reliability.
>>
>>         If the term was *system* reliability in a specified
>>         operational environment, and the system contained software
>>         and the failure was always caused by software
>>         - I take it that would be OK?
>>
>>         An alternative term like *software integrity* or some such
>>         would be needed to describe the property of being correct or
>>         wrong on a given input.
>>         (In a lot of mathematical models this is represented as a
>>         "score function" that is either true or false for each
>>         possible input.)
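>>
>>         As a throwaway Python sketch, with f the implementation and g
>>         the specification (both assumed given), such a score function
>>         is just:
>>
>>             def score(f, g, i) -> bool:
>>                 # True where the implementation matches the specification
>>                 return f(i) == g(i)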
>>
>>         Peter Bishop
>>
>>         Nick Tudor wrote:
>>
>>         Now back in the office...for a short while.
>>
>>         Good point David - well put.
>>         I would have responded: there exists a person N who knows a
>>         bit about mathematics. Person N applies some mathematics and
>>         asserts Truth. Unfortunately, because of the incorrect
>>         application of the mathematics, the claims N now makes cannot
>>         be relied upon. The maths might well be correct, but the
>>         application is wrong because - and I have to say it yet again
>>         - it fails to acknowledge that it is the environment that is
>>         random, rather than the software. Software essentially boils
>>         down to a string of ones and noughts. Given the same inputs
>>         (and these always come from the chaotic environment), the
>>         output will always be the same. It therefore makes no sense
>>         to talk about 'software reliability'.
>>
>>         Nick Tudor
>>         Tudor Associates Ltd
>>         Mobile: +44 (0)7412 074654
>>         www.tudorassoc.com
>>
>>         77 Barnards Green Road
>>         Malvern
>>         Worcestershire
>>         WR14 3LR
>>         Company No. 07642673
>>         VAT No: 116495996
>>
>>         www.aeronautique-associates.com
>>
>>         On 9 March 2015 at 12:26, David Haworth
>>         <david.haworth at elektrobit.com> wrote:
>>
>>             Peter,
>>
>>             there's nothing wrong with the mathematics, but I've got
>>             one little nit-pick about its application in the real
>>             world.
>>
>>             The mathematics you describe gives two functions f and g,
>>             one of which is the model, the other the implementation.
>>
>>             In practice, your implementation runs on a computer, and
>>             so the domain and range are not "the continuum". If your
>>             model is mathematical (or even runs on a different
>>             computer), the output of one will necessarily be
>>             different from the output of the other. That may not be a
>>             problem in the discrete sense - you simply specify a
>>             tolerance t > 0 in the form of:
>>
>>             Corr-f-g(i) = 0 if and only if |f(i) - g(i)| < t
>>
>>             etc.
>>
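>>             Concretely in Python (f the model, g the implementation,
>>             both assumed given), the tolerance-based check is just:
>>
>>                 def corr_f_g(f, g, i, t=1e-6):
>>                     # 0 = agree within tolerance t, 1 = disagree
>>                     return 0 if abs(f(i) - g(i)) < t else 1
>>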
>>             The problem becomes much larger in the real world of
>>             control systems, where the output influences the next
>>             input of the sequence. The implementation and the model
>>             will tend to drift apart. In the worst case, what is nice
>>             and stable in the model might exhibit unstable behaviour
>>             in the implementation.
>>
>>             You're then into the subject of mathematical chaos, where
>>             a perfectly deterministic system exhibits unstable and
>>             unpredictable behaviour. However, this email is too small
>>             to describe it. :-)
>>
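>>             As a toy illustration of the drift (constants purely
>>             illustrative): iterate a fed-back map once with the
>>             "model" arithmetic and once with a minutely perturbed
>>             "implementation", and watch the trajectories separate:
>>
>>                 def model_step(x):
>>                     return 3.9 * x * (1.0 - x)           # logistic map, chaotic
>>
>>                 def impl_step(x):
>>                     return 3.9 * x * (1.0 - x) + 1e-12   # e.g. different rounding
>>
>>                 xm = xi = 0.5
>>                 for _ in range(60):
>>                     xm, xi = model_step(xm), impl_step(xi)
>>                 # difference is no longer small, despite the
>>                 # 1e-12 per-step perturbation
>>                 print(abs(xm - xi))
>>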
>>             Cheers,
>>             Dave
>>
>>             On 2015-03-09 11:48:57 +0100, Peter Bernard Ladkin wrote:
>>              > Nick,
>>              >
>>              > Consider a mathematical function f, with domain D and
>>              > range R. Given input i \in D, the output is f(i).
>>              >
>>              > Consider another function, g, let us say for simplicity
>>              > with the same input domain D and range R.
>>              >
>>              > Define a Boolean function on D, Corr-f-g(i):
>>              >
>>              > Corr-f-g(i) = 0 if and only if f(i) = g(i);
>>              > Corr-f-g(i) = 1 if and only if f(i) NOT-EQUAL g(i).
>>              >
>>              > If X is a random variable taking values in D, then f(X)
>>              > and g(X) are random variables taking values in R, and
>>              > Corr-f-g(X) is a random variable taking values in {0,1}.
>>              >
>>              > If S is a sequence of values of X, then let Corr-f-g(S)
>>              > be the sequence of values of Corr-f-g corresponding to
>>              > the sequence S of X-values.
>>              >
>>              > Define Min-1(S) to be the least place in Corr-f-g(S)
>>              > containing a 1, and to be 0 if there is no such place.
>>              >
>>              > Suppose I construct a collection of sequences S.i, each
>>              > of length 1,000,000,000, by repeated sampling from
>>              > Distr(X). Suppose there are 100,000,000 sequences I
>>              > construct.
>>              >
>>              > I can now construct the average of Min-1(S) over all
>>              > the 100,000,000 sequences S.i.
>>              >
>>              > All these things are mathematically well-defined.
>>              >
>>              > Now, suppose I have deterministic software, S. Let f(i)
>>              > be the output of S on input i. Let g(i) be what the
>>              > specification of S says should be output by S on input
>>              > i. Corr-f-g is the correctness function of S, and
>>              > Mean(Min-1(S)) will likely be very close to the mean
>>              > time/number-of-demands to failure of S if you believe
>>              > the Laws of Large Numbers.
>>              >
>>              > I have no idea why you want to suggest that all this is
>>              > nonsensical and/or wrong. It is obviously quite
>>              > legitimate, well-defined mathematics.
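>>              >
>>              > By way of illustration only, a small Python simulation
>>              > in this spirit (toy f, g and input distribution; the
>>              > sequence counts scaled well down so it runs quickly -
>>              > all names here are illustrative):
>>              >
>>              >     import random
>>              >
>>              >     def f(i):       # toy "software": output of S on input i
>>              >         return i % 97
>>              >
>>              >     def g(i):       # toy "specification": differs on rare inputs
>>              >         return i % 97 if i % 1000 != 0 else -1
>>              >
>>              >     def min_1(seq): # least place where f, g disagree; 0 if none
>>              >         for place, i in enumerate(seq, start=1):
>>              >             if f(i) != g(i):
>>              >                 return place
>>              >         return 0
>>              >
>>              >     random.seed(1)
>>              >     N_SEQ, SEQ_LEN = 1000, 100000  # scaled down from 10^8, 10^9
>>              >     total = sum(min_1(random.randrange(10**6) for _ in range(SEQ_LEN))
>>              >                 for _ in range(N_SEQ))
>>              >     print(total / N_SEQ)  # ~1000: mean demands to failure of toy S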
>>              >
>>              > PBL
>>              >
>>              > Prof. Peter Bernard Ladkin, Faculty of Technology,
>>              > University of Bielefeld, 33594 Bielefeld, Germany
>>              > Je suis Charlie
>>              > Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de
>>
>>             --
>>             David Haworth B.Sc.(Hons.), OS Kernel Developer
>>             david.haworth at elektrobit.com
>>             Tel: +49 9131 7701-6154   Fax: -6333   Keys: keyserver.pgp.com
>>             Elektrobit Automotive GmbH, Am Wolfsmantel 46,
>>             91058 Erlangen, Germany
>>             Geschäftsführer: Alexander Kocher, Gregor Zink
>>             Amtsgericht Fürth HRB 4886
>>
>>             Disclaimer: my opinion, not necessarily that of my employer.
>>
>
> _______________________________________________
>
> Bev Littlewood
> Professor of Software Engineering
> Centre for Software Reliability
> City University London EC1V 0HB
>
> Phone: +44 (0)20 7040 8420  Fax: +44 (0)20 7040 8585
>
> Email: b.littlewood at csr.city.ac.uk
>
> http://www.csr.city.ac.uk/
> _______________________________________________
>
>
>
> _______________________________________________
> The System Safety Mailing List
> systemsafety at TechFak.Uni-Bielefeld.DE

-- 
Kind regards

Martin Lloyd


===========================
Dr M H Lloyd CEng FIET
martin.farside at btinternet.com

Tel: 	+44(0)118 941 2728
Mobile: +44(0)786 697 6840

============================
