[SystemSafety] When malloc() Never Returns NULL

Pekka Pihlajasaari pekka at data.co.za
Tue May 23 14:28:15 CEST 2023


Derek

While this is clearly an example of a safety critical failure in a software system, blaming the software when the limitation had been removed 13 years previously suggests a limited understanding of the expected life of software. The Microsoft Office support policy comments:
	Using versions of Office that are no longer supported, or using Office on unsupported operating systems, may cause performance and reliability issues over time.

Excel 2007, the first version to support the new file formats with sheets of 2^20 rows, reached the end of extended support on 10 Oct 2017. Newspaper reports referenced in the paper question the file size, but even they assume the 1M row limit is the issue and only mention the older 64k limit in passing.
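
To make the two limits concrete, here is a minimal sketch in C of the kind of check an import step could run before writing records to a sheet. The helper name rows_dropped is hypothetical and the 100,000 figure in the note below is purely illustrative; only the two row limits (2^16 for the legacy format, 2^20 for the newer one) are taken from the discussion above.

```c
/* Worksheet row limits discussed above. */
#define XLS_MAX_ROWS  (1L << 16)   /* 65,536 rows: the older "64k" limit */
#define XLSX_MAX_ROWS (1L << 20)   /* 1,048,576 rows: the "1M" limit     */

/*
 * Hypothetical helper (not from any real tool): returns how many rows
 * would not fit in a sheet with the given limit, or 0 if all rows fit.
 * A caller would treat any non-zero result as an error instead of
 * letting the excess rows disappear silently.
 */
long rows_dropped(long rows_needed, long max_rows)
{
    return rows_needed > max_rows ? rows_needed - max_rows : 0L;
}
```

With an illustrative 100,000 records, rows_dropped(100000L, XLS_MAX_ROWS) gives 34,464 rows lost, while rows_dropped(100000L, XLSX_MAX_ROWS) gives 0.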

Given that the Test and Trace tooling was developed as part of a GBP 208M project in response to the pandemic, this suggests that a lack of governance, not software, is at the root of the failure. Was it too much to ask to incorporate conservative requirements that exceeded the number of patients documented in the most recent similar pandemic in 1918?

The tragedy is regrettable, but it is more an organisational concern than one attributable to a limitation in the supporting software. The paper, in contrast, is a good example of the kind of analysis normally carried out only in aircraft accident investigations, where negative outcomes form the basis for understanding.

Should this have been developed using a hardened OS with formal methods, or would some adult supervision during requirements capture have sufficed? Realistically, regular tools are frequently used in safety critical applications with elevated risk, and when negative consequences do not occur the decision maker is frequently praised for their frugality.

Regards
Pekka Pihlajasaari
--
pekka at data.co.za	Data Abstraction (Pty) Ltd	+27 11 484 9664
--
https://learn.microsoft.com/en-us/lifecycle/products/microsoft-office-excel-2007
https://en.wikipedia.org/wiki/Microsoft_Excel
https://www.theguardian.com/politics/2020/oct/05/how-excel-may-have-caused-loss-of-16000-covid-tests-in-england


-----Original Message-----
From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de> On Behalf Of Derek M Jones
Sent: Tuesday, May 23, 2023 11:28 AM
To: systemsafety at lists.techfak.uni-bielefeld.de
Subject: Re: [SystemSafety] When malloc() Never Returns NULL

David,

A more recent example.
Using a spreadsheet with a 64K limit on the number of rows is not a good idea when dealing with a potential sample size in the 100K+ range:
https://warwick.ac.uk/fac/soc/economics/research/centres/cage/publications/workingpapers/2020/does_contact_tracing_work_quasi_experimental_evidence_from_an_excel_error_in_england/

> The USS Yorktown is a good example, thank you for the reminder. I agree that one can underestimate the criticality of some applications.
> 
> Best regards,
> David Mentré
> 
>> On 11 May 2023 at 20:58, Derek M Jones <derek at knosof.co.uk> wrote:
>>
>> David,
>>
>>> How is it relevant to System Safety (topic of this list)? This paper 
>>> is interesting but as far as I know, safety critical programs are 
>>> not executed on generic OS mentioned by this paper but real-time OS 
>>> or bare metal. Moreover, such programs
>>
>> The divide-by-zero error on the USS Yorktown springs to mind
>> https://medium.com/dataseries/when-smart-ships-divide-by-zer0-uss-yorktown-4e53837f75b2
>>
>>> would never do dynamic memory allocation or only at program startup. In my view, the recommendations of this paper (in particular using x family functions that assume allocation always succeeds or terminate the application) are not valid in a safety critical context: the handling of memory allocation failure should be considered and handled properly.
>>
>> I continue to be surprised by the complexity of hardware/software 
>> being used in safety related applications.
>>
>> Every now and again something considered as not safety critical 
>> fails, and an unnoticed dependency suddenly appears.
>>
>> I know of (not used in safety critical, as far as I know) programs 
>> that handle malloc returning NULL by sensibly closing things down.  
>> The idea that malloc failing might result in the cleanup never 
>> occurring, because the process is killed by the OS, is something relatively new.
>>
>>> Best regards,
>>> David Mentré
>>>>> On 11 May 2023 at 12:58, Derek M Jones <derek at knosof.co.uk> wrote:
>>>>
>>>> All,
>>>>
>>>> Coding guidelines have been telling developers to check the return 
>>>> value of malloc forever.
>>>>
>>>> It certainly used to make a difference, but it looks as if 
>>>> out-of-memory is becoming a thing of the past, at least on the 
>>>> desktop.
>>>>
>>>> "When malloc() Never Returns NULL -- Reliability as an Illusion"
>>>> https://arxiv.org/abs/2208.08484
>>>>
>>>> -- 
>>>> Derek M. Jones           Evidence-based software engineering
>>>> blog:https://shape-of-code.com 


