[SystemSafety] Technical information on Airbus A320 recall?

Mon Dec 1 11:30:01 CET 2025

Thanks for all of that history, Peter... really interesting!

This list may be interested to note that ISO/IEC JTC1/SC7 Software & Systems Engineering has recently established Working Group 30 on System Resilience, to try and de-silo some of the considerations you highlight!

Andrew

-----Original Message-----
From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de> On Behalf Of Prof. Dr. Peter Bernard Ladkin
Sent: 01 December 2025 08:46
To: systemsafety at lists.techfak.uni-bielefeld.de
Subject: Re: [SystemSafety] Technical information on Airbus A320 recall?

Les,

not being a computer scientist you may not be aware that fault tolerance has for many decades been a major theme in computer science. The IEEE International Symposium on Fault-Tolerant Computing (FTCS) started in 1971, 54 years ago. There is IFIP Working Group 10.4 on Dependability and Fault Tolerance which has been running, as far as I know, for about as long. Jean-Claude Laprie was Chair for many years. Brian Randell and colleagues at Newcastle University established the computer science department there, I believe the first in the UK, as a major centre for research into fault tolerance (you may have heard of "recovery blocks"?). Brian and Tom Anderson were members of IFIP WG 10.4 for many, many years (maybe still are?). IEEE has a Technical Committee on Dependability and Fault Tolerance, but its "flagship" conference is now DSN rather than FTCS.

IFIP WG 10.4 was, I believe, the first organisation to understand that dependability of digital systems meant rather more than just reliability. Their first terminology was published in 1992 in five languages by Springer Verlag. Safety and (what was then called) security (which I now prefer to call cybersecurity) were considered by them to be dependability attributes, for very good reason.

The IEC, by contrast, considers neither safety nor cybersecurity part of dependability. TC 56 is Dependability. Digital-system safety in the IEC resides with SC 65A, the "Safety Aspects" subcommittee (used to be the "System Aspects" SC) of TC 65, Industrial-process control, measurement and automation. Industrial-process cybersecurity resides in TC 65, although there is a movement to make their cybersecurity standards more widely applicable (called a "horizontal" function), as SC 65A's safety standard IEC 61508 is (many of us are sceptical about this move).

So there is a fair amount of silo-ing in the international organisations trying to define/capture/explicate the state of the art in digital systems dependability, even just in the computer-science area.

There are a couple of sources of faults/failures that "come out of nowhere", which weren't paid so much attention by FT types 30 years ago, but have in the succeeding period increased substantially in importance. SEEs are one. People dealing with spacecraft routinely protect against SEEs that may be caused by alpha particles. Protecting against alpha particles is relatively easy compared with protecting against the derivatives which occur when these alpha particles interact with the earth's atmosphere (called cosmic rays). Then there are Byzantine faults and failures. Algorithmically resolving Byzantine failures deterministically is known (from Lamport's first paper on the subject) to be computationally expensive, but there are some network architectures that mitigate their occurrence (Kevin Driscoll, who is on this list, is the foremost expert on occurrences "in the wild").
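
To give a rough feel for "computationally expensive": the sketch below counts the messages exchanged by the classic recursive oral-messages scheme, usually written OM(m), for tolerating m Byzantine nodes among n >= 3m + 1. It is only an illustration under that assumption, not a description of any avionics architecture; the function name om_message_count is made up for this example.

```python
from math import prod


def om_message_count(n: int, m: int) -> int:
    """Count messages sent by recursive oral-messages agreement OM(m).

    n is the total number of nodes, m the number of Byzantine nodes to
    tolerate. Hypothetical helper, for illustration only.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    if n < 3 * m + 1:
        raise ValueError("oral-message agreement needs n >= 3m + 1 nodes")
    # Recursion level k contributes (n-1)(n-2)...(n-k-1) messages:
    # every receiver at one level relays to all remaining nodes at the next.
    return sum(prod(n - j for j in range(1, k + 2)) for k in range(m + 1))


if __name__ == "__main__":
    # Smallest admissible system size for each fault count m.
    for m in range(1, 5):
        n = 3 * m + 1
        print(f"m={m} n={n} messages={om_message_count(n, m)}")
```

The count grows roughly as n to the power m + 1, which is one reason architectural mitigation is attractive compared with purely algorithmic resolution.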

On 2025-11-30 22:54 , Les Chambers wrote:
> ... I'm surprised
> that this could happen in aviation, which is typically the gold 
> standard in Safety-Critical systems design.

And fault-tolerant digital design. The circumstance that is flabbergasting everyone is, I think, that they got it right, developed the system further, and got it wrong (whoever "they" is). That is usually not the way industrial progress works. (Thales, the manufacturer of the ELAC, apparently told Reuters that "the functionality in question is supported by software that is not under Thales' responsibility".
https://www.reuters.com/business/aerospace-defense/airbus-a320-repairs-must-be-before-next-flight-bulletin-shows-2025-11-28/
)

A few more details on the incident: "JetBlue Flight 1230, operating from Cancún International Airport (CUN) to Newark Liberty International Airport (EWR), experienced an uncommanded drop in altitude approximately one hour after departure. The aircraft, registered N605JB , rapidly lost about 14,500 feet in five minutes, followed by another 12,200 feet in the next five minutes. The crew diverted to Tampa International Airport (TPA) and landed at approximately 1420 local time." 
(from https://avgeekery.com/airbus-a320-emergency-airworthiness-directive/ ). I have no experience with this site and thus don't know how reliable this account can be presumed to be. But that must have been pretty harrowing for the crew: the incident played out over ten minutes and they apparently weren't able to counter it.

PPRuNe probably has a lot more, but this weekend (and into today) I just couldn't face the high noise-to-signal ratio.

PBL

Prof. i.R. Dr. Peter Bernard Ladkin, Bielefeld, Germany www.rvs-bi.de

_______________________________________________
The System Safety Mailing List
systemsafety at TechFak.Uni-Bielefeld.DE
Manage your subscription: https://lists.techfak.uni-bielefeld.de/mailman/listinfo/systemsafety