[SystemSafety] Degraded software performance [diverged from Fault, Failure and Reliability Again]

Wed Mar 4 15:22:19 CET 2015

Drew,

Thanks for taking the time to compile this list.

I’d view Examples 1 to 3 as hardware-related faults.

(If we don’t do so, then we would – I think – end up having to class a broken wire that connects a switch to our processor as a “software failure”).

In the case of Example 4, I agree that it could be argued that the software behaviour may change over time.  However, I’d view this as the consequence of a design or coding error, along the lines of Matthew’s example from earlier.

At the end of the day, it may simply come down to definitions (and you are, of course, free to define these things as you see fit).

Michael.

From: DREW Rae [mailto:d.rae at griffith.edu.au] 
Sent: 04 March 2015 13:25
To: M.Pont at safetty.net
Cc: <systemsafety at lists.techfak.uni-bielefeld.de>
Subject: Degraded software performance [diverged from Fault, Failure and Reliability Again]

Michael,

I need to give more than one example, because the point is general, rather than specific to the individual causes. In each case the cumulative probability of software failure increases over time. 

1) Damage to the instruction set

e.g. the physical record of the instructions on a storage medium changes

very specific e.g. bit flip on a magnetic storage device holding the executable files

2) Increased unreliability of the physical execution environment

e.g. an increased rate of processor errors

very specific e.g. dust accumulates on part of the processor card, making it run hot and produce calculation errors

3) Increased unreliability of input hardware

e.g. software is required to detect and respond correctly to an increased rate and variety of sensor failure combinations

Note: This is the one that challenges "but we're running the software in exactly the same hardware environment". Hardware environments change as they get older.

4) Software accumulates information during runtime

e.g. a count of elapsed time

e.g. increasing volume of stored data

e.g. memory leak 
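
A minimal sketch of the "count of elapsed time" case in item 4, written in Python. The 32-bit counter width, the 1 ms tick, and all function names are illustrative assumptions; none of them come from a system discussed in this thread. With those assumptions the counter wraps after roughly 49.7 days, so a timeout check that behaves correctly shortly after start-up can give a wrong answer much later, even though the instructions have not changed.

```python
MASK32 = 0xFFFFFFFF  # assumption: the tick counter is a 32-bit unsigned value


def add_ticks(now, delta):
    """Advance the counter, wrapping on overflow as a 32-bit register would."""
    return (now + delta) & MASK32


def expired_naive(now, deadline):
    """Direct comparison: correct only while neither value has wrapped."""
    return now >= deadline


def expired_wrap_safe(now, deadline):
    """Compare via the modular difference, read as a signed 32-bit number.

    Correct as long as the interval being timed is shorter than 2**31 ticks.
    """
    return ((now - deadline) & MASK32) < 0x80000000


def check(start, timeout, elapsed):
    """Return (naive, wrap_safe) verdicts for a timeout set at `start`."""
    deadline = add_ticks(start, timeout)
    now = add_ticks(start, elapsed)
    return expired_naive(now, deadline), expired_wrap_safe(now, deadline)


if __name__ == "__main__":
    # Shortly after start-up: 100 of 500 ticks elapsed, both checks agree.
    print(check(start=1000, timeout=500, elapsed=100))
    # Just before the counter wraps: the naive check reports an expired
    # timeout although only 100 of 500 ticks have elapsed.
    print(check(start=MASK32 - 200, timeout=500, elapsed=100))
```

Whether the late wrong answer counts as "degradation over time" or as a latent design or coding error is exactly the definitional question in this thread; the sketch only shows that the observed behaviour depends on how long the software has been running.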

NB1: In all of these cases I've heard the argument "that's not the software, that's X". Those arguments are only relevant if you can control for X when collecting data for software reliability calculation. Software without an execution environment is a design. It "never fails" in the way that _no_ design fails. When it does fail, it is subject to the same degradation over time as any physical implementation.

NB2: I'm not claiming that failure due to physical degradation is significant compared to failure due to errors in the original instructions. I'm saying that we don't know, and that not knowing becomes a big issue once we've tested to the point of not finding errors in the original instructions. At that point, absent evidence to the contrary, we should be assuming that physical degradation is significant.

Drew

On 4 March 2015 at 12:27, Michael J. Pont <M.Pont at safetty.net> wrote:

Drew,

“The underlying point holds, that software _can_ exhibit degraded performance over time.”

Can you please give me a simple example of what you mean by this.

Thanks,

Michael.
