[SystemSafety] Software reliability (or whatever you would prefer to call it)

Tue Mar 10 09:49:44 CET 2015

Here's a different view on software reliability and an example.

We know that:

1. We /can/ write software that is very well defined and does not exhibit  
any stochastic behaviour.

2. We /can/ also intentionally (or unintentionally) write software that
exhibits unpredictable failure behaviour. Because that behaviour can be
characterized using statistical techniques, it can reasonably be called
stochastic. One way to achieve this is through the use of random number
generators. (1)
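
As a rough illustration of point 2, here is a minimal Python sketch of
software whose failures are driven by a random number generator, together
with a simple statistical characterization of them. The names
flaky_operation and estimate_failure_rate, the 1% failure probability and
the normal-approximation interval are all assumptions made for the
example, not anything taken from a real system.

```python
import random


def flaky_operation(rng, failure_probability=0.01):
    """Hypothetical operation that fails at random with a fixed probability.

    Returns True on success and False on failure.
    """
    return rng.random() >= failure_probability


def estimate_failure_rate(trials, seed=0):
    """Run the operation many times and estimate its failure rate.

    Returns (rate, half_width), where half_width is the half-width of an
    approximate 95% confidence interval (normal approximation, which is
    rough when only a handful of failures are observed).
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    failures = sum(1 for _ in range(trials) if not flaky_operation(rng))
    rate = failures / trials
    half_width = 1.96 * (rate * (1 - rate) / trials) ** 0.5
    return rate, half_width


if __name__ == "__main__":
    rate, half_width = estimate_failure_rate(100_000)
    print(f"estimated failure rate: {rate:.4f} +/- {half_width:.4f}")
```

No single call can be predicted, but the population of calls can be
described statistically, and that is all "stochastic" means here.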

The challenge, as software grows in size and complexity, is the practical
difficulty of writing software (like 1) that is so well defined and
verified that it does not exhibit the stochastic failure behaviour (of 2).

Indeed, at some point on the size/complexity scale, the development and
verification of fully deterministic software becomes a practical
impossibility, and we then have little option but to use some statistical
measure of confidence that we have achieved the goal of no failure.

One example of this that is gaining traction is the PROXIMA EU project,
which is specifically focused on software timing for multi-core
processors. The basic idea is that for very complex hardware/software
systems, it is beyond practical feasibility to understand the worst-case
execution time of the software. ("How can you possibly have
tested/analysed enough inputs, initial states, and interference from
other cores to give a bound which is both accurate and
*practically/economically small enough*?")

The direction in this project is to intentionally produce a system that is
designed to have stochastic timing behaviour at the low level. By doing
so, you can legitimately start to use all kinds of statistical methods
that are not normally available for a digital system.

Therefore, you have a software computation that has a probability of  
failing to produce its result within its allotted time. However, you also  
have a reliable method of computing that probability, which can be well  
below the oft-quoted 10^-9/hour.
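
To make the idea of "computing that probability" concrete, here is a
minimal Python sketch under stated assumptions. It fits a Gumbel
distribution to block maxima of measured execution times and reads off
the probability of exceeding a time budget. Extreme value theory is one
statistical tool commonly used for this kind of tail estimate; this
message does not say which methods the project applies, so treat the
choice as an assumption. The timing model, the function names and all
numbers are hypothetical, and a real analysis would also need to check
that the measurements are independent and identically distributed.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def simulated_execution_time(rng):
    """Hypothetical randomized timing: fixed cost plus random miss penalties."""
    misses = sum(1 for _ in range(50) if rng.random() < 0.1)
    return 1000 + 20 * misses  # cycles


def fit_gumbel(block_maxima):
    """Method-of-moments Gumbel fit; returns (location mu, scale beta)."""
    beta = statistics.stdev(block_maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(block_maxima) - EULER_GAMMA * beta
    return mu, beta


def exceedance_probability(budget, mu, beta):
    """P(block maximum > budget) under the fitted Gumbel model."""
    z = (budget - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny values
    return -math.expm1(-math.exp(-z))


def estimate(budget, runs=20_000, block_size=100, seed=1):
    """Measure `runs` executions, fit block maxima, return the tail estimate."""
    rng = random.Random(seed)
    times = [simulated_execution_time(rng) for _ in range(runs)]
    maxima = [max(times[i:i + block_size]) for i in range(0, runs, block_size)]
    mu, beta = fit_gumbel(maxima)
    return exceedance_probability(budget, mu, beta)


if __name__ == "__main__":
    for budget in (1300, 1600):
        print(budget, estimate(budget))
```

Note that the figure returned is a probability per block of block_size
runs, not a rate per hour; converting between the two depends on how
often the computation executes, which the sketch does not model.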

Ian

(1) [You could also map a partially testable, massive input domain to a
random-number generator, or consider race conditions driven by apparently
randomly timed input data, and the like.]

-- 
Dr Ian Broster, General Manager
Rapita Systems Ltd
Tel: +44 1904 413 945 Mob: +44 7963 469 090
