[SystemSafety] Does "reliable" mean "safe" and/or "secure" or neither?

Roberto Bagnara bagnara at cs.unipr.it
Mon Apr 25 09:46:23 CEST 2016


On 24/04/2016 20:11, Roberto Bagnara wrote:
> On 24/04/2016 18:36, Michael J. Pont wrote:
>> Roberto asks:
>>
>> "Can we talk about the reliability of the components in the context
>>   of the overall system, without any knowledge about how they implement
>>   their functionality (e.g., hardware only, hardware + little bit of
>>   software, hardware + lots of software, hardware + software + humans)?"
>>
>> If our definition of reliability is something like this (from my previous
>> email):
>>
>> "the extent to which an experiment, test, or measuring procedure yields the
>> same results on repeated trials"
> 
> OK.

Here is the complete argument I was trying to make (slightly revised).
I would be grateful to anyone who can point out the flaws in it.

Trying to match Michael's definition of reliability, in my example a "trial"
would consist of exercising the overall system under in-spec conditions.
By "in-spec conditions" I mean all conditions for which the specification
of the overall system has something to say: these include normal
conditions and some abnormal conditions (in which the overall system
is still expected to behave gracefully, so to speak).  Whenever I use
the terms "in-spec" and "out-of-spec" below the same considerations
apply.

Suppose the overall system is composed of a number of interacting
components.  Suppose also that these components are black boxes: we
cannot look inside them (for the time being).  However, we know
everything about the interactions between the components because we
can monitor them with precision.  Suppose we also have specifications
of each component that are detailed enough so that, in case the system
exhibits an out-of-spec behavior, we are able to point the finger at
small sets of components and tell which component(s) originated the
first out-of-spec behavior, which component(s) meant to mitigate
this misbehavior failed to do so, and so on.
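
For concreteness, here is a minimal sketch in C of how such an
attribution might be computed from a monitored trace.  Everything in
it (the event format, the trace data) is hypothetical, and a real
monitor would also consult the component specifications to identify
the mitigators that failed:

    #include <stdio.h>

    /* One monitored event: which component produced it, when, and
       whether it was in-spec with respect to that component's
       specification. */
    struct event {
        unsigned time;       /* monotonically increasing timestamp */
        int      component;  /* index of the observed component    */
        int      in_spec;    /* 1 = in-spec, 0 = out-of-spec       */
    };

    /* Scan the trace in time order and return the component that
       originated the first out-of-spec behavior, or -1 if the
       whole trial stayed in-spec. */
    static int first_out_of_spec(const struct event *trace, int n) {
        for (int i = 0; i < n; ++i)
            if (!trace[i].in_spec)
                return trace[i].component;
        return -1;
    }

    int main(void) {
        struct event trace[] = {
            { 0, 0, 1 }, { 1, 1, 1 }, { 2, 2, 0 }, { 3, 1, 0 }
        };
        printf("first out-of-spec component: %d\n",
               first_out_of_spec(trace, 4));
        return 0;
    }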

The "experiment" would consist in recording the in-spec and
out-of-spec behaviors of the various system components.  We would say
that two outcomes are "the same result" if they are either both
in-spec or both out-of-spec.  We perform many "repeated trials" and we
thus determine the "reliability" of each component in the context of
the overall system.
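
Under this reading, a component's reliability in the context of the
overall system is simply the fraction of trials in which it stayed
in-spec.  A minimal sketch in C, with all figures hypothetical:

    #include <stdio.h>

    #define NUM_COMPONENTS 3

    int main(void) {
        /* Hypothetical trial records: for each component, the number
           of trials observed and how many of those were in-spec. */
        unsigned trials[NUM_COMPONENTS]  = { 1000, 1000, 1000 };
        unsigned in_spec[NUM_COMPONENTS] = {  998,  950, 1000 };

        for (int i = 0; i < NUM_COMPONENTS; ++i)
            printf("component %d: reliability = %.3f\n",
                   i, (double)in_spec[i] / trials[i]);
        return 0;
    }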

At this point, I would say we are entitled to talk about the reliability
of each component in the context of the overall system.

Now, suppose we find that a particular component C, currently implemented
by black box A, is not reliable enough, i.e., during the repeated trials
it has exhibited an unacceptably high number of out-of-spec behaviors.
We thus replace black box A with black box B, a different implementation
of the same component C.  And now the results are much better.

At this point, I would say we are entitled to say that B is more reliable
than A in the context in which they operate, even if we have no idea what
is in the boxes.

So we open black boxes A and B: inside we find both hardware and software.
I take it for granted that this discovery does not invalidate our previous
conclusion that B is more reliable than A in the context in which they
operate.

We examine the hardware in boxes A and B and find that A and B are based
on exactly the same hardware: there is no difference whatsoever.

At this point, I would say we are entitled to say that B's software
is more reliable than A's software in the context in which both operate
(which includes the hardware on which they both run).

In order to conclude, I think it is enough to show that what I have
described can happen in practice.  The first thing that comes to
mind concerns hardware bit flips due to radiation: they do happen
all the time and, given that A and B have the same hardware, the hardware
of A and B is affected in the same way.  However, the software of B,
unlike the software of A, makes extensive use of variable
mirroring: it keeps two or more copies of the same variable using
different data representations for the different copies (e.g., one
copy holds the bitwise negation of another copy).  In this way B's
software can detect and sometimes correct the effect of bit flips.
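
For illustration, here is a minimal sketch in C of the two-copy
variant of this technique (names and layout are hypothetical, not
taken from any particular implementation).  With only two copies a
flip can be detected but not corrected, since we cannot tell which
copy was hit; correction would need a third copy and, e.g., majority
voting:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Invariant: mirror always holds the bitwise negation of value. */
    typedef struct {
        uint32_t value;
        uint32_t mirror;
    } mirrored_u32;

    static void mirrored_write(mirrored_u32 *v, uint32_t x) {
        v->value  = x;
        v->mirror = ~x;
    }

    /* Returns true and yields the value if the invariant holds;
       returns false if a bit flip has corrupted one of the copies. */
    static bool mirrored_read(const mirrored_u32 *v, uint32_t *out) {
        if (v->value == (uint32_t)~v->mirror) {
            *out = v->value;
            return true;
        }
        return false;  /* detected corruption; caller must react */
    }

    int main(void) {
        mirrored_u32 v;
        uint32_t x;
        mirrored_write(&v, 42u);
        v.value ^= 1u;  /* simulate a single bit flip in one copy */
        printf("%s\n",
               mirrored_read(&v, &x) ? "ok" : "detected corruption");
        return 0;
    }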

Kind regards,

    Roberto

-- 
     Prof. Roberto Bagnara

Applied Formal Methods Laboratory - University of Parma, Italy
mailto:bagnara at cs.unipr.it
                              BUGSENG srl - http://bugseng.com
                              mailto:roberto.bagnara at bugseng.com
