[SystemSafety] Post Office Horizon System

Steve Tockey steve.tockey at construx.com
Mon Apr 26 20:26:34 CEST 2021


PBL,
Thanks for the detailed reply. I appreciate it.

Yes, in this case, there do appear to be a lot of unanswered questions before the root cause(s) is fully understood. So we can’t immediately blame the software. Yet. But the question still remains in the general case: more and more software is being delivered which is leading to material harm to the general public—when will the developers and/or suppliers of such faulty software be held accountable and liable?

Take, for example, the Uber self-driving car that killed Elaine Herzberg. The following is directly from an article summarizing the NTSB report (linked below; underlining added by me for emphasis):

- - - - Quote - - - -

However, "if the perception system changes the classification of a detected object, the tracking history of that object is no longer considered when generating new trajectories," the NTSB reports.

What this meant in practice was that, because the system couldn't tell what kind of object Herzberg and her bike were, the system acted as though she wasn't moving.

From 5.2 to 4.2 seconds before the crash, the system classified Herzberg as a vehicle and decided that she was "static"—meaning not moving—and hence not likely to travel into the car's path. A little later, the system recognized that she was moving but predicted that she would stay in her current lane.

When the system reclassified her as a bicycle 2.6 seconds before impact, the system again predicted that she would stay in her lane—a mistake that's much easier to make if you've thrown out previous location data. At 1.5 seconds before impact, she became an "unknown" object and was once again classified as "static."

It was only at 1.2 seconds before the crash, as she was starting to enter the SUV's lane, that the system realized a crash was imminent.

- - - - unquote - - - -

In spite of the fact that she was clearly moving WRT the vehicle on a course with constant bearing and decreasing range, they repeatedly re-classified her as static.
See: https://arstechnica.com/cars/2019/11/how-terrible-software-design-decisions-led-to-ubers-deadly-2018-crash/
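
To make the "constant bearing, decreasing range" point concrete, here is a minimal Python sketch. It is an illustration only: it is not Uber's code and nothing in it comes from the NTSB report. The function name, the sample positions, and the 2-degree tolerance are all assumptions invented for the example. What it shows is that the check depends on the object's position history, which is exactly what gets thrown away when tracking history is discarded on reclassification.

```python
import math

def on_collision_course(track, bearing_tol_deg=2.0):
    """Hypothetical check: constant bearing, decreasing range.

    track: (x, y) positions of the object relative to the vehicle,
           oldest first. Units are arbitrary but must be consistent.
    bearing_tol_deg: assumed tolerance for "constant" bearing.
    """
    if len(track) < 3:
        # With little or no history there is nothing to compare,
        # so no collision course can be inferred.
        return False

    ranges = [math.hypot(x, y) for x, y in track]
    bearings = [math.degrees(math.atan2(y, x)) for x, y in track]

    # Range must shrink at every step.
    range_decreasing = all(later < earlier
                           for earlier, later in zip(ranges, ranges[1:]))

    # Bearing drift relative to the first sample, wrapped to [-180, 180).
    drift = [((b - bearings[0] + 180.0) % 360.0) - 180.0 for b in bearings]
    bearing_steady = max(abs(d) for d in drift) <= bearing_tol_deg

    return range_decreasing and bearing_steady

# Made-up sample data: an object closing along a fixed line of sight.
full_history = [(40.0, 10.0), (30.0, 7.5), (20.0, 5.0), (10.0, 2.5)]
print(on_collision_course(full_history))       # history kept
print(on_collision_course(full_history[-1:]))  # history discarded
```

With the full history the check can fire; with only the latest position it cannot, whatever the object happens to be called at that moment.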


Or, “The US car company General Motors is recalling more than four million vehicles worldwide due to a software defect linked to at least one death . . . the defect concerns the sensing and diagnostic module. In rare cases it can go into test mode, meaning airbags will not inflate in a crash.”
See: http://www.bbc.com/news/world-us-canada-37321361


Or, “the current software could result in high temperatures on certain transistors and possibly damage them. When it fails, the error forces the car into failsafe mode. Toyota says that in rare circumstances, it could even shut the hybrid system down while the car is being driven”
See: http://www.autoblog.com/2014/02/12/toyota-recalling-1-9m-prius-models-globally/


Or, “The outage was caused by a software coding error in the Colorado facility, and resulted in a loss of 911 service for more than 11 million people for up to six hours. … Although, fortunately, it appears that no one died as a result, the incident – and the flaws it revealed – is simply unacceptable.”
See: http://transition.fcc.gov/Daily_Releases/Daily_Business/2014/db1017/DOC-330012A1.pdf


Or, “A flaw found in a calculator tool used by doctors at GP surgeries has potentially led to a number of patients being erroneously prescribed or denied statins across England. … Due to the unidentified error in the code, it's possible that the risk of CVD was overstated … This could—in turn—have led to mistakes in prescriptions for statins”.
See: http://arstechnica.co.uk/security/2016/05/bug-in-gp-heart-risk-calculatortool-tpp/


Or, “More than 3,000 inmates in Washington state prisons were released early because of a software bug. The glitch caused the computer system to miscalculate the sentence reduction inmates received for good behavior, according to a press statement from the state governor’s office.”
See: http://www.techinsider.io/washington-prisons-software-glitch-2015-12


Or, “The causes of the National Air Traffic Services (NATS) flight control centre system failure in December 2014 that affected 65,000 passengers directly and up to 230,000 indirectly have been revealed in a recently published report … How could an error not tolerated in undergraduate-level programming homework enter software developed by professionals over a decade at a cost approaching a billion pounds?”
See: https://theconversation.com/air-traffic-control-failure-shows-we-need-a-better-approach-to-programming-42496


I could, of course, go on.


There is an organization here in the US called NSPE (National Society of Professional Engineers). From their web site, quote:

“Creating a world where the public can be confident that engineering decisions affecting their lives are made by qualified and ethically accountable professionals.”
See: https://www.nspe.org/

Also from their web site, quote (emphasis theirs):

“NSPE was established in 1934 to realize a simple but vital goal: create an inclusive, nontechnical organization dedicated to the interests of licensed professional engineers, regardless of practice area, that would protect engineers (and the public) from unqualified practitioners, build public recognition for the profession, and stand against unethical practices and inadequate compensation.”
See: https://www.nspe.org/membership/nspe-who-we-are-and-what-we-do

Why aren’t organizations like NSPE up in arms over all of the developed-by-highly-paid-amateurs software that is clearly putting the health, safety, and welfare of the public at significant risk? Like I said, until those that are directly causing material damage get held liable and accountable for their mistakes, we can only expect that nothing will change.


Regards,

— steve




-----Original Message-----
From: Peter Bernard Ladkin <ladkin at causalis.com>
Organization: RVS Bielefeld and Causalis
Date: Saturday, April 24, 2021 at 12:30 AM
To: Steve Tockey <Steve.Tockey at construx.com>
Cc: "systemsafety at lists.techfak.uni-bielefeld.de" <systemsafety at lists.techfak.uni-bielefeld.de>
Subject: Re: [SystemSafety] Post Office Horizon System



On 2021-04-24 04:17, Steve Tockey wrote:
> I’m wondering why nobody seems to be considering holding the programmers who wrote
> that code accountable. Why aren’t those programmers sent to jail for equal time they caused the falsely
> accused? Why don’t those programmers have to pay the reimbursements?

Because they are not the people who wrote the contracts which stated that subpostmasters were liable
to make good any branch bookkeeping shortfall, no matter how that shortfall may have happened. And
they are not the people who indulged in the abuse of process category 2 that sent people to jail
and ruined the livelihood of others. In both cases that would be the company legal department,
wouldn't it?

Had the subpostmaster contracts been different, and the company legal department not been as
aggressive towards their contractors, this might well have been just another complex distributed
system which took five to ten in-service years to debug.  Unsatisfactory UK government or
government-backed large IT projects are not an unknown phenomenon in the UK. Trying to push all the
failures onto the users, and succeeding (until late 2019), is, however, unprecedented.

> As long as programmers who write crap code like that are not held accountable
> for their obvious failures, why would anybody even hope for anything to change
> in how software is developed?

I don't think anybody knows at this stage that the code itself was unusually poor for such a system,
or, if so, why. The system itself was apparently described in a report prepared by the system
auditors Second Sight in 2013 as, in some cases "not fit for purpose". But the system was/is a lot
more than the code. As Michael Jackson has pointed out, there are all sorts of HW and devices
involved. Unless all those interfaces are well understood and monitored (and the traces recorded),
there are all kinds of things that can go wrong that are not necessarily caused by poor programming.

For example, consider phantom transactions. How did those happen? People suspect touch screens that
were physically not reliable, and recorded "touches" that never happened. To figure out that such
things are possible, one needs close cooperation, and transparency, between hardware supplier and
system architects, as well as knowledge of the HW product that may not yet exist, especially if it
is new. How can you attribute any of that to programming? You need good post hoc error logging and
traceability down to the fault. That is a company process, not a programming speciality.

Such a large system needs good technical oversight during development. Ensuring such oversight is a
task for organisational theorists and auditing specialists, not for programmers.

Finally, before the system was deployed, in 1999, the government stopped the pilot project after
£700m had been spent on it. It is not as if everything went swimmingly until deployment. It
is an issue of management and mismanagement of an exceptionally complex IT project. It is not a
matter for the IT supplier and its employees/subcontractors alone.

> Leave the taxpayers out of it. They (we) are completely innocent. Hang those programmers—and their employer—out to dry. That will teach them. For once.

Many people involved feel that the supplier (ICL/Fujitsu) was not the main issue. The behaviour of
the client, Post Office Limited, was much more at issue (see above). That entity went through many
organisational iterations during the time frame of Horizon and, in its current iteration, has
admitted it cannot shoulder the liability arising from the agreed compensation. So in that sense it
has already "hung [itself] out to dry."

However, public-facing Post Offices and the services they offer are socially far too important for
the daily life of millions of people in the UK for POL just to stop doing business. It doesn't just
offer the public-facing services of post/parcel, but is also a channel for many social insurance
transactions (benefits payments and so on) and other government transactions (e.g., road vehicle tax
payment and receipt). It is too important to just stop all that, to fail.

There are various suggestions out there as to how to avoid such disasters. Where there are clear
interfaces, log the transaction-data items which pass through the interfaces. This has been done
with common Internet services since the beginning. Every mail server has a log of what has gone
through it and the handshaking that transpired. It is a matter of a few minutes for a sysadmin to
tell you what happened to your email. Stuff like that is a matter for system design, though, not
programming per se. Another suggestion is strict liability for harm (including financial loss)
resulting through use of such a SW system. Such a regime would surely have caused Horizon system
development to cease in 1999, if not before. But Horizon sort of now works. Would the UK really have
been better off without it for the last twenty years? Not necessarily. But certainly the country
would have been a lot better off without the aggressive attempts to blame the users for all
problems, as the Court of Appeal established yesterday.
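
As a rough illustration of the interface-logging suggestion above, here is a minimal Python sketch. It is not a description of Horizon or of any real mail server. The class name, the interface name "till->branch-ledger", the record fields, and the toy handler are all hypothetical, chosen only for the example. The idea is simply that every data item crossing the interface, in either direction, is appended to a log with a sequence number before and after the call.

```python
import json
import time

class LoggedInterface:
    """Hypothetical wrapper that records everything crossing an interface."""

    def __init__(self, name, handler, sink):
        self.name = name        # label for the interface (assumed)
        self.handler = handler  # the real function behind the interface
        self.sink = sink        # anything with .append(str), e.g. a list
        self.seq = 0

    def _record(self, direction, payload):
        # One JSON line per item, numbered so gaps are detectable.
        self.seq += 1
        self.sink.append(json.dumps({
            "seq": self.seq,
            "ts": time.time(),
            "interface": self.name,
            "dir": direction,
            "payload": payload,
        }, sort_keys=True))

    def call(self, message):
        self._record("in", message)
        try:
            reply = self.handler(message)
        except Exception as exc:
            # Failures are logged too, then re-raised unchanged.
            self._record("error", repr(exc))
            raise
        self._record("out", reply)
        return reply

# Toy usage with made-up data: an in-memory list stands in for the log.
log = []
ledger = LoggedInterface(
    "till->branch-ledger",
    lambda msg: {"ok": True, "amount": msg["amount"]},
    log,
)
ledger.call({"type": "deposit", "amount": 25})
for line in log:
    print(line)
```

In a real system the sink would be durable, append-only storage rather than a list, and deciding what counts as an interface is the system-design question, not a programming one.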

I imagine there are books and books and books full of lessons to be learned over the 25-year history
of this system. But they won't be written because of non-disclosure contracts and proprietary
interests (including those of the state), as well as the personal interests of some formerly "key
players". A public inquiry might manoeuvre around some of these hindrances, but will necessarily
stop short of anything which might point towards malfeasance or culpable negligence of individuals,
unless there is a general amnesty.

PBL

Prof. Peter Bernard Ladkin, Bielefeld, Germany
ClaireTheWhiteRabbit RIP
Tel+msg +49 (0)521 880 7319  www.rvs-bi.de





