[SystemSafety] Technical information on Airbus A320 recall?

Prof. Dr. Peter Bernard Ladkin ladkin at techfak.de
Mon Dec 1 09:45:44 CET 2025


Les,

not being a computer scientist, you may not be aware that fault tolerance has for many decades been a 
major theme in computer science. The IEEE International Symposium on Fault-Tolerant Computing (FTCS) 
started in 1971, 54 years ago. There is IFIP Working Group 10.4 on Dependability and Fault Tolerance 
which has been running, as far as I know, for about as long. Jean-Claude Laprie was Chair for many 
years. Brian Randell and colleagues at Newcastle University established the computer science 
department there, I believe the first in the UK, as a major centre for research into fault tolerance 
(you may have heard of "recovery blocks"?). Brian and Tom Anderson were members of IFIP WG 10.4 for 
many, many years (maybe still are?). IEEE has a Technical Committee on Dependability and Fault 
Tolerance, but its "flagship" conference is now DSN rather than FTCS.

IFIP WG 10.4 was, I believe, the first organisation to understand that dependability of digital 
systems meant rather more than just reliability. Their first terminology was published in 1992 in 
five languages by Springer Verlag. Safety and (what was then called) security (which I now prefer to 
call cybersecurity) were considered by them to be dependability attributes, for very good reason.

The IEC, by contrast, considers neither safety nor cybersecurity part of dependability. TC 56 is 
Dependability. Digital-system safety in the IEC resides with SC 65A, the "Safety Aspects" 
subcommittee (used to be the "System Aspects" SC) of TC 65, Industrial-process control, measurement 
and automation. Industrial-process cybersecurity resides in TC 65, although there is a movement to 
make their cybersecurity standards more widely applicable (called a "horizontal" function), as SC 
65A's safety standard IEC 61508 is (many of us are sceptical about this move).

So there is a fair amount of silo-ing in the international organisations trying to 
define/capture/explicate the state of the art in digital systems dependability, even just in the 
computer-science area.

There are a couple of sources of faults/failures that "come out of nowhere", which weren't paid so 
much attention by FT types 30 years ago, but have in the succeeding period increased substantially 
in importance. SEEs are one. People dealing with spacecraft routinely protect against SEEs that may 
be caused by alpha particles. Protecting against alpha particles is relatively easy compared with 
protecting against the derivatives which occur when these alpha particles interact with the Earth's 
atmosphere (called cosmic rays). Then there are Byzantine faults and failures. Algorithmically 
resolving Byzantine failures deterministically is known (from Lamport's first paper on the subject) 
to be computationally expensive, but there are some network architectures that mitigate their 
occurrence (Kevin Driscoll, who is on this list, is the foremost expert on occurrences "in the wild").
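
To give a rough feel for that cost, here is a minimal sketch that counts the messages exchanged by the oral-messages algorithm OM(m) of Lamport, Shostak and Pease, which tolerates m Byzantine nodes among n >= 3m + 1. The function name om_message_count is mine, and counting messages is only one way of reading "computationally expensive"; it is an illustration, not a statement about any particular avionics architecture.

```python
def om_message_count(n: int, m: int) -> int:
    """Count messages sent by OM(m) with n nodes (1 commander, n - 1 lieutenants).

    Hypothetical helper for illustration only.
    OM(0): the commander sends its value to each of the n - 1 lieutenants.
    OM(m): the commander sends to n - 1 lieutenants, then each lieutenant
           acts as commander of OM(m - 1) among the remaining n - 1 nodes.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    if n < 3 * m + 1:
        raise ValueError("OM(m) needs at least 3m + 1 nodes")
    if m == 0:
        return n - 1
    return (n - 1) + (n - 1) * om_message_count(n - 1, m - 1)


if __name__ == "__main__":
    # Smallest admissible system for each number of tolerated faulty nodes.
    for m in range(0, 4):
        n = 3 * m + 1
        print(f"m={m} n={n} messages={om_message_count(n, m)}")
```

With the smallest admissible n for each m, the count goes from 9 messages (n=4, m=1) to 156 (n=7, m=2) to 3609 (n=10, m=3), which is the kind of growth that makes architectural mitigation attractive.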

On 2025-11-30 22:54 , Les Chambers wrote:
> ... I'm surprised
> that this could happen in aviation, which is typically the gold standard in
> Safety-Critical systems design.

And fault-tolerant digital design. The circumstance that is flabbergasting everyone is, I think, 
that they got it right, developed the system further, and got it wrong (whoever "they" is). That is 
usually not the way industrial progress works. (Thales, the manufacturer of the ELAC, apparently 
told Reuters that "the functionality in question is supported by software that is not under Thales' 
responsibility". 
https://www.reuters.com/business/aerospace-defense/airbus-a320-repairs-must-be-before-next-flight-bulletin-shows-2025-11-28/ 
)

A few more details on the incident: "JetBlue Flight 1230, operating from Cancún International 
Airport (CUN) to Newark Liberty International Airport (EWR), experienced an uncommanded drop in 
altitude approximately one hour after departure. The aircraft, registered N605JB, rapidly lost 
about 14,500 feet in five minutes, followed by another 12,200 feet in the next five minutes. The 
crew diverted to Tampa International Airport (TPA) and landed at approximately 1420 local time." 
(from https://avgeekery.com/airbus-a320-emergency-airworthiness-directive/). I have no experience with 
this site and thus don't know how reliable this account can be presumed to be. But that must have 
been pretty harrowing for the crew -- the incident played out over ten minutes and they apparently 
weren't able to counter it.

PPRuNe probably has a lot more, but this weekend (and into today) I just couldn't face the high 
noise-to-signal ratio.

PBL

Prof. i.R. Dr. Peter Bernard Ladkin, Bielefeld, Germany
www.rvs-bi.de





