[SystemSafety] Post Office Horizon System

Fri Jan 8 17:29:25 CET 2021

Derek, 

Yes, of course I agree that detailed tracking of changes to database records is necessary; and a comprehensive audit log allows diagnosis of failures after they have happened. 

My point was narrower: not-visible-in-the-code events are not visible in the code and cannot be easily made visible there. 

-- Michael

> On 7 Jan 2021, at 21:22, Derek M Jones <derek at knosof.co.uk> wrote:
> 
> Michael,
> 
>> If the code is designed on the assumption that a value assigned by statement s to a particular database record variable will remain unchanged until another assignment is executed in the code by statement t, then a fault experience can occur if the variable value is changed during a 'not-visible-in-the-code' event of the kind I mentioned in my email. It is obviously not simple to check for all such possibilities by adding further code (such as assertion checks: quite apart from any other consideration, there may be an infinite regress).
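A minimal sketch of the scenario described above, assuming a hypothetical single-table SQLite database (the `account` table, the branch id and the balances are invented for illustration and have nothing to do with Horizon's actual schema). The application performs statement s, a second connection changes the record, and the application's assumption no longer holds by the time it would reach statement t:

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: one table of branch balances (illustration only).
db_path = os.path.join(tempfile.mkdtemp(), "branch.db")

app = sqlite3.connect(db_path)
app.execute("CREATE TABLE account (branch TEXT PRIMARY KEY, balance INTEGER)")

# Statement s: the application assigns a value to the record.
app.execute("INSERT INTO account VALUES ('B001', 100)")
app.commit()
assumed_balance = 100  # what the application believes after statement s

# A 'not-visible-in-the-code' event: a second connection (standing in for
# another program or a manual patch) changes the same record. Nothing in
# the application's own source text mentions this write.
other = sqlite3.connect(db_path)
other.execute("UPDATE account SET balance = 80 WHERE branch = 'B001'")
other.commit()
other.close()

# Before statement t, the application reads the record back.
(actual_balance,) = app.execute(
    "SELECT balance FROM account WHERE branch = 'B001'"
).fetchone()
app.close()

assumption_holds = actual_balance == assumed_balance
print(assumed_balance, actual_balance, assumption_holds)
```

Reading the application's code alone gives no hint that the stored value can differ from the assumed one; an assertion at this point would detect the change but not explain it.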
> 
> Database applications are an example of multiple programs
> potentially changing the contents of the same database.
> 
> Detailed change tracking requires the underlying database manager
> to maintain a detailed audit log.
> 
> Horizon had a 'fix-mistakes' program for patching the database
> when something went wrong.  It was revealed in later Horizon trials
> that there was no audit log of such changes.
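As an illustration of change tracking done by the database manager itself, here is a minimal sketch using an SQLite trigger. The table names, column names and values are hypothetical, and this is not a description of how Horizon worked. Because the trigger lives in the database, it fires for every update of the column, whichever program or command-line session issues it:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE account (branch TEXT PRIMARY KEY, balance INTEGER);

-- Hypothetical audit table: one row per change to account.balance.
CREATE TABLE audit_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    branch      TEXT,
    old_balance INTEGER,
    new_balance INTEGER
);

-- The trigger runs inside the database for each updated row.
CREATE TRIGGER account_update_audit
AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO audit_log (branch, old_balance, new_balance)
    VALUES (OLD.branch, OLD.balance, NEW.balance);
END;
""")

db.execute("INSERT INTO account VALUES ('B001', 100)")
# A 'fix-mistakes' style patch: the trigger records it automatically.
db.execute("UPDATE account SET balance = 80 WHERE branch = 'B001'")
db.commit()

audit_rows = db.execute(
    "SELECT branch, old_balance, new_balance FROM audit_log"
).fetchall()
print(audit_rows)
```

This sketch records what changed and when, but not who made the change (SQLite has no notion of user accounts), and anyone with enough privileges could drop the trigger or edit the log. A production audit log would need protection against both.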
> 
> In other situations I have seen people type SQL from the
> command line to fix mistakes, and I have done it myself on test
> databases (I have never worked as a dba on a live system).
> 
>> -- Michael
>>> On 7 Jan 2021, at 18:48, Derek M Jones <derek at knosof.co.uk> wrote:
>>> 
>>> Michael,
>>> 
>>>> ... In the Horizon case, can we consider only the likelihood of a 'coding mistake' in the program texts? This, surely, is like analysing a rail crash by examining only the code of the interlocking system's programs. The fault may lie elsewhere.
>>> 
>>> Two things are needed for a fault experience to occur.
>>> 
>>> 1) a mistake in the code,
>>> 
>>> 2) the 'right' input value(s).
>>> 
>>> Nearly all research focuses on (1) because the information is
>>> readily available.
>>> 
>>> The likelihood of the 'right' input values occurring will depend on the
>>> quantity of input values and the variability in these values.
>>> 
>>> There are techniques that can be used to estimate certain kinds of (1),
>>> given information on fault experiences (assumptions are made about the
>>> distribution of (2)):
>>> http://shape-of-code.coding-guidelines.com/2018/03/18/estimating-the-number-of-distinct-faults-in-a-program/
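To give a flavour of this kind of estimate, here is a sketch using the Chao1 estimator from ecology, which gives a lower bound on the number of distinct "species" (here, distinct coding mistakes) from how often each one has been observed. Chao1 is chosen here for illustration and is not claimed to be the exact method in the linked post; the fault labels and report data are invented. The estimator assumes each fault experience is an independent sample, which is an assumption about the distribution of (2):

```python
from collections import Counter


def chao1(experience_counts):
    """Chao1 lower-bound estimate of the total number of distinct faults.

    experience_counts: for each distinct fault seen so far, the number of
    fault experiences attributed to it.
    """
    observed = sum(1 for c in experience_counts if c > 0)
    f1 = sum(1 for c in experience_counts if c == 1)  # seen exactly once
    f2 = sum(1 for c in experience_counts if c == 2)  # seen exactly twice
    if f2 > 0:
        return observed + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when no fault has been seen exactly twice.
    return observed + f1 * (f1 - 1) / 2


# Hypothetical fault reports, each labelled with the mistake that caused it.
reports = ["F1", "F1", "F1", "F2", "F2", "F3", "F4", "F5", "F5", "F6"]
counts = list(Counter(reports).values())

estimate = chao1(counts)
print(len(counts), estimate)
```

With six distinct faults observed, three of them once and two of them twice, the estimate is 6 + 3*3/(2*2) = 8.25, suggesting that at least two or so further mistakes have yet to be experienced.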
>>> 
>>> I don't know of any techniques for estimating (2), and this
>>> looks really difficult.  One possibility is counting users
>>> and trying to estimate the variability in their usage.
>>> 
>>> -- 
>>> Derek M. Jones           Evidence-based software engineering
>>> tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com
>>> _______________________________________________
>>> The System Safety Mailing List
>>> systemsafety at TechFak.Uni-Bielefeld.DE
>>> Manage your subscription: https://lists.techfak.uni-bielefeld.de/mailman/listinfo/systemsafety
> 
> -- 
> Derek M. Jones           Evidence-based software engineering
> tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com