[SystemSafety] RE : Qualifying SW as "proven in use" - unclassified

King, Martin (NNPPI) martin.king2 at rolls-royce.com
Fri Jun 28 10:13:32 CEST 2013


One thing that seems to be overlooked(?) is that safety functions
fall into two camps - continuous and on demand.  Each may have a
different lifetime exposure to the input range statistics.
 
By intention, on demand safety functions are meant to be rarely, if
ever, invoked.  Invocation of an on demand safety function is usually
the result of some other set of failures which are also intended to be
rare.  This means that any evidence for correct operation of the on
demand safety function is, by design, rare, and evidence for incorrect
(ie fails on demand) operation is, hopefully, even rarer.  
 
Most continuous safety systems will spend most of their time with input
conditions away from the region where safety issues may arise, i.e.
operating in the sweet spot of 'normal' operation.  So once again the
input space statistics will tend to show low frequencies in the region
of (safety) interest.
 
All of this might suggest that in many cases safety claims based on
'proven in use' may reflect operational practices better than
performance of the safety functions.  There may be more evidence about
'fails safe' than 'fails dangerous'.
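
As a rough illustration of how slowly that evidence accumulates, here is
a minimal sketch (my own numbers, purely for illustration: it assumes
independent demands, perfect failure detection, zero observed failures
and the standard zero-failure confidence bound; the demand rate and
fleet size are made up):

    # How fast does 'proven in use' evidence accumulate for a
    # rarely-demanded safety function?
    import math

    confidence = 0.90     # confidence wanted in the claim
    target_pfd = 1e-2     # claimed probability of failure on demand
    demand_rate = 1.0     # assumed genuine demands per installation-year
    fleet_size = 10       # assumed number of installations in service

    # Demands needed so that zero failures supports pfd <= target_pfd
    # at the chosen confidence: (1 - pfd)^n = 1 - confidence.
    demands_needed = math.log(1 - confidence) / math.log(1 - target_pfd)
    years_needed = demands_needed / (demand_rate * fleet_size)

    print(f"demands needed:     {demands_needed:.0f}")   # ~229
    print(f"fleet-years needed: {years_needed:.1f}")     # ~23

Even with generous assumptions, bounding the failure-on-demand
probability at 10^-2 takes decades of fleet experience, which is the
sense in which the field evidence says more about operating practice
than about the safety function itself.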
 
 
My opinion - not necessarily that of my employer!

Martin King 

 

________________________________

From: systemsafety-bounces at techfak.uni-bielefeld.de
[mailto:systemsafety-bounces at techfak.uni-bielefeld.de] On Behalf Of
Matthew Squair
Sent: 28 June 2013 08:16
To: Peter Bernard Ladkin; Bielefield Safety List
Subject: Re: [SystemSafety] RE : Qualifying SW as "proven in use"


Peter, 

Yep, it was my poor choice of phrase. My point was that, in terms of
evidence, one hour or a thousand hours of data from the old environment
should carry the same evidential weight if the new environment is
different from the old and we have no idea what it is.


Yes, I would agree that stochastic inputs can generate stochastic
behaviour. So if it's inputs we're talking about, isn't the use of
'hours run' as a unit of exposure essentially a side issue, given that
what you're actually doing is exposing the software to a set of inputs
that are stochastic in nature? As a consequence, the amount of time you
have to set aside to collect a statistically valid sample is driven by
the confidence you wish to obtain, the inherent variability of the
inputs, and how frequently they arrive over some period of time.
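
As a rough sketch of that dependence (my own illustrative numbers,
assuming the inputs of interest arrive as a Poisson process; the rate,
sample size and confidence are made up):

    # How long must we observe before the sample very probably contains
    # at least n_wanted of the inputs we actually care about?
    import math

    def poisson_cdf(k, mean):
        """P(N <= k) for N ~ Poisson(mean)."""
        return sum(math.exp(-mean) * mean**i / math.factorial(i)
                   for i in range(k + 1))

    arrival_rate = 0.01   # assumed 'interesting' inputs per hour
    n_wanted = 30         # assumed minimum statistically useful sample
    confidence = 0.95     # P(window contains >= n_wanted arrivals)

    hours = 0
    while poisson_cdf(n_wanted - 1, arrival_rate * hours) > 1 - confidence:
        hours += 10

    print(f"observation window: about {hours} hours")   # roughly 4000

Halve the assumed arrival rate and the required window roughly doubles;
the 'hours run' figure does no work beyond carrying the arrival rate of
the inputs of interest.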

Going back to my example, if I 'know' that the variability of the
inputs is extremely low, one hour of input data may do; if not, then
much more may be needed. If the input data is very complex, no number
of hours may be enough. All of which is about establishing how well we
understand the environment rather than the 'reliability' of the
software, as I see it.

Picking up your example again, why was the difference between the two
environments not detected by the designer? Was an assumption made that
the new environment was the same as the old? Presuming that to be the
case, isn't the decision to deploy software in that new context really
about a different kind of uncertainty? 

For example, the original environment had an arrival rate of inputs
that could be characterised by some frequency; the new environment also
has a frequency, but we are uncertain as to its value. We could have
estimated some bounds on this possible range of frequencies and run
some tests to see what effect differing arrival rates might have, or we
could have gone out and gathered field data, but instead (I presume) we
elected to assume that the parameters were the same.


So deploying into a new environment carries epistemic uncertainty, and
we can reduce this; but if we assume that the environment is the same,
we are translating that epistemic uncertainty into an ontological one.
I infer from your example that we didn't have to wait long after
deployment to find this problem, so I presume that we wouldn't have had
to run a trial for very long before we saw the problem input.

As to whether you would or should weight operation in multiple
different environments as better or worse, I was thinking about the
open source example, where having many different people look at the
code independently seems to yield very low defect rates; Linux is the
example a lot of people use, I believe. So couldn't one argue that
operation across a range of different environments would be more likely
to expose different systematic errors than operation in one environment
for a long time?

On Thu, Jun 27, 2013 at 9:35 PM, Peter Bernard Ladkin
<ladkin at rvs.uni-bielefeld.de> wrote:


	Matthew,
	
	Scenarios such as those Bertrand describes are not that
far-fetched. Unfortunately, there are in some places senior management
who are in the same state of (lack of) expertise as Bertrand describes.
That is a problem of professional qualification which I would prefer to
treat as a separate issue.
	

	On 27 Jun 2013, at 09:18, Matthew Squair <mattsquair at gmail.com>
wrote:
	> I've been thinking about Peter's example a good deal; the
developer seems to me to have made an implicit assumption that one can
use a statistical argument based on successful hours run to justify the
safety of the software.
	
	
	It is not an assumption. It is a well-rehearsed statistical
argument with a few decades of universal acceptance, as well as various
successful applications in the assessment of emergency systems in
certain English nuclear power plants.
	

	> I don't think that's true,
	
	
	You might like to take that up with, for example, the editorial
board of IEEE TSE.
	

	> in fact I'd go further and say that whether you operate for a
thousand hours or a million hours has no bearing on demonstrating
software safety, because what we're interested in are systematic
failures rather than random ones.
	
	
	I presume you would want to argue that the occurrence of a
failure caused by a systematic fault is functionally dependent on the
inputs, and that is what distinguishes it from what you call "random".
However, if your inputs have a stochastic nature, then anything
functionally dependent on them will also exhibit stochastic behavior.
Failures caused by systematic faults thus exhibit stochastic behavior.
	

	> Example, I have a piece of software and (despite my best
efforts) there's a latent fatal fault within it, however testing hasn't
discovered it and I'm also in luck in that the operating environment is
sufficiently close to the test environment that the fault is not
triggered in the operating environment. Now I could run the system for
one, one hundred or a thousand years in that operating environment and I
wouldn't see a problem. So according to the statistical treatment the
software is safe, even with a fatal flaw isn't it?
	
	
	No. According to the statistical treatment, if you have seen 3 x
10^X operational hours without failure, *and* you are guaranteed to have
had perfect failure detection, *and* the future operating environment
has the exact same statistical properties as the previous (not "similar"
but exact, statistically), then you may be 90% confident that you will
see failures with a likelihood of not more than 10^(-X) per operating
hour. How that might relate to a claim that "the software is safe" is up
to you. Also, you didn't express what level of confidence you might need
in such a claim.
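
        As a rough sketch of the arithmetic behind this (a minimal
illustration of my own, assuming a constant failure rate and the
standard zero-failure bound P(no failure in T hours) = exp(-rate*T);
the exponent X, hours and rates are just examples):

            import math

            X = 4                   # example exponent
            T = 3 * 10**X           # failure-free operating hours observed
            confidence = 0.90

            # Largest rate still consistent with zero failures in T hours
            # at the chosen confidence.
            rate_bound = -math.log(1 - confidence) / T
            print(f"rate bound: {rate_bound:.2e} per hour")  # ~7.7e-05 < 1e-04

            # Conversely: how much confidence do 100 failure-free hours
            # buy in a claim of 1e-4 per hour?
            print(f"confidence: {1 - math.exp(-1e-4 * 100):.4f}")  # ~0.01

        The same formula also puts a number on the 100-hour case
discussed below: roughly a 1% level of confidence in a 10^(-4) per hour
claim.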
	

	> So logically if the number of hours you run in service in a
particular environment has nothing to do with proving the safety of
software, why couldn't I say that after one hundred hours the software
was 'proven in use', for that specific environment. Why not one hour?
	
	
	It is correct that the number of hours.... has nothing to do
with proving the safety of software, if by that you mean establishing
it beyond a shadow of doubt. Neither does any practical statistical
reasoning. Usual levels of confidence in statistical reasoning are 95%,
well away from certainty.
	
	You can of course say that, after 100 hours of failure-free
operation, the SW is "proven in use", whatever that might mean to you.
What you cannot do is attribute to that assertion any other than a very,
very low level of confidence. Even one hour. With an appropriately lower
level of confidence (= epsilon indistinguishable from zero, I would
hope).
	

	> In Peter's example the number of hours run on the original
software version could have been one, or ten million and there still
would have been the same end result, e.g. a failure when put into a new
operational context. In other words one hour of operations has as much
weight as one thousand (in the same environment).
	
	
	I am not sure what you mean here. To me, "new operational
context" and "same environment" are contradictory, so maybe I don't
understand the way you are using these terms.
	

	> Another question, say I have developed a piece of software,
it's now running in three quite different operating environments, in
terms of evidence of 'safety' would I weight 300 hours of operation in a
single environment the same as 100 hours from each of these different
environments? If so why?
	
	
	What you have is 100 hours of experience from each of three
different distributions. You could superimpose the distributions if you
want, but the only reason to do that is if you are thinking of deploying
the SW in an environment identical to that superimposition and want to
get a clue as to its viability.
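
        A small sketch of what that superimposition amounts to (my own
illustration, with made-up distributions): pooling equal amounts of
input data from three environments is sampling from their equal-weight
mixture, so it characterises that mixture and nothing else.

            import random

            random.seed(1)
            env_means = [0.0, 5.0, 20.0]   # assumed 'input' distributions
            pooled = [random.gauss(mu, 1.0)
                      for mu in env_means for _ in range(100)]

            # ~8.3: the mixture mean, matching none of the three
            # environments individually.
            print(sum(pooled) / len(pooled))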
	
	PBL
	
	Prof. Peter Bernard Ladkin, University of Bielefeld and Causalis
Limited
	
	




-- 
Matthew Squair 


Mob: +61 488770655
Email: MattSquair at gmail.com



More information about the systemsafety mailing list