[SystemSafety] [External] Re: Post Office Horizon System

Steve Tockey steve.tockey at construx.com
Fri Apr 30 21:56:56 CEST 2021


PBL wrote:

“How would you go about writing a requirements specification for this software?”

First, I would not consider what you are asking about to be functional requirements, so it would not be appropriate even to try to express them in a Semantic Model. Semantic models are for functional requirements only. Instead, I would categorize what you are asking about as part of what I call the “Quality of Service” subset of the nonfunctional requirements. These should then, IMHO, be expressed in an entirely different form.

Second, I think I understand the direction you are trying to go with this, but I suggest that you are still quite far from actually reaching true requirements. Specifically, assuming that a goal of this software is to answer the question, “Given a CRT scan, does this patient have myocarditis?”, a number of critical questions need to be answered. For example:

*) What is the maximum acceptable false positive rate for the software's diagnosis of myocarditis when the patient doesn’t actually have it?
*) What is the maximum acceptable false negative rate for the software's diagnosis of no myocarditis when the patient does actually have it?
*) What is the maximum acceptable rate of “Not sure” diagnoses from the software?
*) What is the maximum acceptable false positive rate for an experienced physician's diagnosis of myocarditis when the patient doesn’t actually have it?
*) What is the maximum acceptable false negative rate for an experienced physician's diagnosis of no myocarditis when the patient does actually have it?
*) What is the maximum acceptable rate of "Not sure" diagnoses for experienced physicians?
*) Is there any rational justification for the software to perform at a different level than experienced physicians?
*) What is the maximum acceptable rate of disagreement between the software’s diagnosis and an experienced physician’s diagnosis of the same CRT scan?
*) When two or more experienced physicians would give conflicting diagnoses for the same CRT scan, what is the software expected to do?

The core issue is how well the software correctly diagnoses myocarditis, so the requirements need to be expressed in terms of this “Quality of Service”. Purely for the sake of argument, assume that:

*) The maximum acceptable false positive rate for an experienced physician's diagnosis of myocarditis when the patient doesn’t actually have it is 0.5%
*) The maximum acceptable false negative rate for an experienced physician's diagnosis of no myocarditis when the patient does actually have it is 0.05%
*) The maximum acceptable rate of “Not sure” diagnoses for experienced physicians is 2%

So we might propose software requirements along these lines:

*) The maximum acceptable false positive rate for the software's diagnosis of myocarditis when the patient doesn’t actually have it shall be 0.5%
*) The maximum acceptable false negative rate for the software's diagnosis of no myocarditis when the patient does actually have it shall be 0.05%
*) The maximum acceptable rate of “Not sure” diagnoses for the software shall be 2%
*) Across CRT scans on which an odd number (at least three) of experienced physicians give conflicting diagnoses, the software’s diagnosis shall match those physicians’ majority diagnosis at least 95% of the time

In other words, this is constraining the software to:

A) Perform at least as well as experienced physicians, and
B) Exhibit a minimum level of agreement with the majority of experienced physicians when those experienced physicians would disagree on the same CRT scan
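
The three rate requirements can be checked mechanically once the results of a verification run have been tallied. What follows is a minimal Python sketch, not a prescribed method: it assumes the illustrative limits above, and it assumes each rate is computed over the relevant reference population (false positives over scans of patients without myocarditis, false negatives over scans of patients with it, “Not sure” over all scans). `Tally`, `QOS_LIMITS`, `observed_rates` and `check_qos` are hypothetical names.

```python
from dataclasses import dataclass

# Illustrative limits from the assumed requirements above, not real clinical figures.
QOS_LIMITS = {
    "false_positive_rate": 0.005,   # 0.5%
    "false_negative_rate": 0.0005,  # 0.05%
    "not_sure_rate": 0.02,          # 2%
}

@dataclass
class Tally:
    """Outcome counts from one verification run (hypothetical structure)."""
    true_pos: int    # reference says myocarditis, software says myocarditis
    false_neg: int   # reference says myocarditis, software says no myocarditis
    pos_unsure: int  # reference says myocarditis, software says "Not sure"
    true_neg: int    # reference says no myocarditis, software says no myocarditis
    false_pos: int   # reference says no myocarditis, software says myocarditis
    neg_unsure: int  # reference says no myocarditis, software says "Not sure"

def observed_rates(t: Tally) -> dict:
    """Point estimates of each rate, using the assumed denominators."""
    positives = t.true_pos + t.false_neg + t.pos_unsure
    negatives = t.true_neg + t.false_pos + t.neg_unsure
    return {
        "false_positive_rate": t.false_pos / negatives,
        "false_negative_rate": t.false_neg / positives,
        "not_sure_rate": (t.pos_unsure + t.neg_unsure) / (positives + negatives),
    }

def check_qos(t: Tally) -> dict:
    """Map each requirement to True if its observed rate is within the limit."""
    rates = observed_rates(t)
    return {name: rates[name] <= limit for name, limit in QOS_LIMITS.items()}

run = Tally(true_pos=979, false_neg=1, pos_unsure=20,
            true_neg=9780, false_pos=30, neg_unsure=190)
print(check_qos(run))
# {'false_positive_rate': True, 'false_negative_rate': False, 'not_sure_rate': True}
```

This compares point estimates only. A single run that happens to come in under a limit does not, by itself, show with any stated confidence that the software meets that limit.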


“How would you go about validating the software against the requirements specification?”

Do you really mean “verifying the software against the requirements specification”? If so, then I would suggest that you need to start with a sufficiently large set of test case CRT scans which have accompanying diagnoses from experienced physicians. This should include multiple CRT scans in each of the following categories:

*) A sufficient number of experienced physicians all agree that the patient does have myocarditis
*) A sufficient number of experienced physicians all agree that the patient does not have myocarditis
*) A sufficient number of experienced physicians all agree that the patient cannot be properly diagnosed from that CRT scan
*) A simple majority of a sufficient number of experienced physicians, but not all of them, agree that the patient does have myocarditis
*) A simple majority of a sufficient number of experienced physicians, but not all of them, agree that the patient does not have myocarditis
*) A simple majority of a sufficient number of experienced physicians, but not all of them, agree that the patient cannot be properly diagnosed from that CRT scan
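
Sorting test scans into these six categories, and measuring the majority-agreement requirement, can be sketched as follows. This is a minimal Python illustration, assuming each scan comes with a list of panel diagnoses drawn from three labels; `LABELS`, `categorize` and `majority_agreement_rate` are hypothetical. Because a physician can also answer “cannot be properly diagnosed”, an odd-sized panel does not by itself guarantee a simple majority (three physicians giving three different answers have none), so the sketch returns `None` in that case.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Hypothetical label set: the three answers a physician (or the software) can give.
LABELS = ("myocarditis", "no myocarditis", "cannot diagnose")

def categorize(panel: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Place one test scan into a category from its panel's diagnoses.

    Returns ("unanimous", label) or ("majority", label), or None when no
    label is chosen by more than half of the panel.
    """
    if not panel:
        raise ValueError("panel must contain at least one diagnosis")
    unknown = set(panel) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    label, votes = Counter(panel).most_common(1)[0]
    if votes == len(panel):
        return ("unanimous", label)
    if 2 * votes > len(panel):  # simple majority: strictly more than half
        return ("majority", label)
    return None  # e.g. a three-way split among three physicians

def majority_agreement_rate(cases: Sequence[Tuple[Sequence[str], str]]) -> float:
    """Fraction of majority-but-not-unanimous scans on which the software's
    diagnosis matches the panel's majority diagnosis."""
    matched = total = 0
    for panel, software_label in cases:
        category = categorize(panel)
        if category is None or category[0] != "majority":
            continue  # unanimous and no-majority scans are not counted here
        total += 1
        if software_label == category[1]:
            matched += 1
    if total == 0:
        raise ValueError("no majority-but-not-unanimous scans in the test set")
    return matched / total

print(categorize(["myocarditis", "myocarditis", "no myocarditis"]))
# ('majority', 'myocarditis')
```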

Further, none of the test case CRT scans in the categories above may also be in the data set that was used to train the DLNN.
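
Part of that rule can be enforced automatically. A minimal Python sketch, assuming each scan is available as its raw file bytes and that a SHA-256 fingerprint is an acceptable notion of “the same scan”; `fingerprint` and `leaked_scans` are hypothetical helpers. It catches only byte-identical copies, so a re-exported or re-compressed copy of a training scan would slip through and needs a different check.

```python
import hashlib
from typing import Iterable, Set

def fingerprint(scan_bytes: bytes) -> str:
    """Identify a scan by the SHA-256 digest of its raw bytes (exact copies only)."""
    return hashlib.sha256(scan_bytes).hexdigest()

def leaked_scans(test_scans: Iterable[bytes], training_scans: Iterable[bytes]) -> Set[str]:
    """Return fingerprints of test scans that also appear in the training set."""
    training = {fingerprint(s) for s in training_scans}
    return {fingerprint(s) for s in test_scans} & training

# A verification run would refuse to proceed unless this set is empty.
print(leaked_scans([b"scan-a", b"scan-b"], [b"scan-c"]))  # set()
```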

Determining what a “sufficiently large set of CRT scans” is, how many CRT scans are needed in each of the categories, and what “a sufficient number of experienced physicians” is necessarily involves a lot of statistics that is well beyond my pay grade. But I expect we should be able to get statistically significant verification that the software is performing at or above each of the required Quality of Service levels.
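
One standard way to make “statistically significant verification” concrete (not necessarily the design a statistician would choose here) is to require that the one-sided upper confidence bound on each error rate lies below its limit. The Python sketch below uses only the standard library and computes the exact (Clopper-Pearson) bound by bisection. It assumes scans are independent trials, that the physicians’ reference diagnosis is taken as the truth, a 95% confidence level, and no correction for testing several requirements at once; the function names are illustrative.

```python
import math

def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if p == 0.0:
        return 1.0 if i == 0 else 0.0
    if p == 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def upper_bound(failures: int, trials: int, confidence: float = 0.95) -> float:
    """Exact (Clopper-Pearson) one-sided upper confidence bound on a failure rate."""
    if failures >= trials:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection: binom_cdf(failures, trials, p) decreases in p
        mid = (lo + hi) / 2
        if binom_cdf(failures, trials, mid) > alpha:
            lo = mid  # a true rate of mid is still consistent with the data
        else:
            hi = mid
    return hi

def zero_failure_sample_size(max_rate: float, confidence: float = 0.95) -> int:
    """Smallest n for which 0 failures in n trials puts the bound at or below max_rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_rate))

print(zero_failure_sample_size(0.005))   # 598 scans without myocarditis, no false positives
print(zero_failure_sample_size(0.0005))  # 5990 scans with myocarditis, no false negatives
print(zero_failure_sample_size(0.02))    # 149 scans, no "Not sure" answers
```

The zero-failure counts are the exact form of the familiar “rule of three” (roughly 3 / rate at 95% confidence). They show where the burden falls: demonstrating the assumed 0.05% false negative requirement needs about six thousand scans from patients who actually have myocarditis with none missed, and more if any are missed.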



Cheers,

— steve



-----Original Message-----
From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de> on behalf of Peter Bernard Ladkin <ladkin at causalis.com>
Organization: RVS Bielefeld and Causalis
Date: Wednesday, April 28, 2021 at 10:24 AM
To: "systemsafety at lists.techfak.uni-bielefeld.de" <systemsafety at lists.techfak.uni-bielefeld.de>
Subject: Re: [SystemSafety] [External] Re: Post Office Horizon System

Steve,

I am certainly not going to dispute that many software functional requirements could be more
carefully specified than they are, and, like you, I believe that in many software developments such
precise requirements specification can help enormously.

Suppose you were writing software to look at CRT scans of people's hearts, and identify myocarditis.
There is (a) a software component which maps pixels to anatomical objects
+ geometry, followed by
(b) a software interpretive component which identifies certain kinds of anomalies in the picture
overlaid with the anatomy derived from (a).

There are two related criteria for the success of this software. The main criterion is that the
subject really does have myocarditis. The second criterion is that the software judgement agrees
with the judgement of an experienced physician that the subject has myocarditis. Most often, it is
the second criterion which is used to determine success, since the first can only be determined with
invasive and medically undesirable procedures.

Tasks (a) and (b) are usually undertaken by a DLNN. How would you go about writing a requirements
specification for this software? How would you go about validating the software against the
requirements specification?

PBL

Prof. Peter Bernard Ladkin, Bielefeld, Germany
ClaireTheWhiteRabbit RIP
Tel+msg +49 (0)521 880 7319  www.rvs-bi.de





