[SystemSafety] Technical information on Airbus A320 recall?
Prof. Dr. Peter Bernard Ladkin
ladkin at techfak.de
Mon Dec 1 09:45:44 CET 2025
Les,
not being a computer scientist, you may not be aware that fault tolerance has for many decades been a
major theme in computer science. The IEEE International Symposium on Fault-Tolerant Computing (FTCS)
started in 1971, 54 years ago. There is IFIP Working Group 10.4 on Dependability and Fault Tolerance
which has been running, as far as I know, for about as long. Jean-Claude Laprie was Chair for many
years. Brian Randell and colleagues at Newcastle University established the computer science
department there, I believe the first in the UK, as a major centre for research into fault tolerance
(you may have heard of "recovery blocks"?). Brian and Tom Anderson were members of IFIP WG 10.4 for
many, many years (maybe still are?). IEEE has a Technical Committee on Dependability and Fault
Tolerance, but its "flagship" conference is now DSN rather than FTCS.
IFIP WG 10.4 was, I believe, the first organisation to understand that dependability of digital
systems meant rather more than just reliability. Their first terminology was published in 1992 in
five languages by Springer Verlag. Safety and (what was then called) security (which I now prefer to
call cybersecurity) were considered by them to be dependability attributes, for very good reason.
The IEC, by contrast, considers neither safety nor cybersecurity to be part of dependability. TC 56 is
Dependability. Digital-system safety in the IEC resides with SC 65A, the "Safety Aspects"
subcommittee (used to be the "System Aspects" SC) of TC 65, Industrial-process control, measurement
and automation. Industrial-process cybersecurity resides in TC 65, although there is a movement to
make their cybersecurity standards more widely applicable (called a "horizontal" function), as SC
65A's safety standard IEC 61508 is (many of us are sceptical about this move).
So there is a fair amount of silo-ing in the international organisations trying to
define/capture/explicate the state of the art in digital systems dependability, even just in the
computer-science area.
There are a couple of sources of faults/failures that "come out of nowhere", which weren't paid so
much attention by FT types 30 years ago, but have in the succeeding period increased substantially
in importance. SEEs are one. People dealing with spacecraft routinely protect against SEEs that may
be caused by alpha particles. Protecting against alpha particles is relatively easy compared with
protecting against the derivatives which occur when these alpha particles interact with the earth's
atmosphere (called cosmic rays). Then there are Byzantine faults and failures. Algorithmically
resolving Byzantine failures deterministically is known (from Lamport's first paper on the subject)
to be computationally expensive, but there are some network architectures that mitigate their
occurrence (Kevin Driscoll, who is on this list, is the foremost expert on occurrences "in the wild").
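To give a rough feel for "computationally expensive", here is a minimal sketch that counts messages.
It assumes the recursive "oral messages" algorithm OM(m) from Lamport, Shostak and Pease's
"Byzantine Generals" paper, which needs at least 3m+1 participants to tolerate m traitors; that may
not be the exact formulation meant above. The function names are made up for illustration.

```python
def min_generals(m: int) -> int:
    """Smallest number of participants for which OM(m) tolerates m traitors (3m + 1)."""
    return 3 * m + 1


def om_message_count(n: int, m: int) -> int:
    """Total messages sent by OM(m) among n participants (one commander, n-1 lieutenants).

    Level 0: the commander sends n-1 messages.
    Level k: every message received at level k-1 is relayed to the n-1-k
    participants not yet involved in that relay chain.
    """
    total = 0
    term = 1
    for k in range(m + 1):
        term *= n - 1 - k  # messages sent at recursion level k
        total += term
    return total


if __name__ == "__main__":
    # Message count at the minimum configuration for m = 1..4 traitors.
    for m in range(1, 5):
        n = min_generals(m)
        print(f"m={m}  n={n}  messages={om_message_count(n, m)}")
```

The count grows roughly as n to the power m+1, which is why architectural mitigation tends to be
more attractive than resolving everything algorithmically.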
On 2025-11-30 22:54 , Les Chambers wrote:
> ... I'm surprised
> that this could happen in aviation, which is typically the gold standard in
> Safety-Critical systems design.
And fault-tolerant digital design. The circumstance that is flabbergasting everyone is, I think,
that they got it right, developed the system further, and got it wrong (whoever "they" is). That is
usually not the way industrial progress works. (Thales, the manufacturer of the ELAC, apparently
told Reuters that "the functionality in question is supported by software that is not under Thales'
responsibility".
https://www.reuters.com/business/aerospace-defense/airbus-a320-repairs-must-be-before-next-flight-bulletin-shows-2025-11-28/
)
A few more details on the incident: "JetBlue Flight 1230, operating from Cancún International
Airport (CUN) to Newark Liberty International Airport (EWR), experienced an uncommanded drop in
altitude approximately one hour after departure. The aircraft, registered N605JB, rapidly lost
about 14,500 feet in five minutes, followed by another 12,200 feet in the next five minutes. The
crew diverted to Tampa International Airport (TPA) and landed at approximately 1420 local time."
from https://avgeekery.com/airbus-a320-emergency-airworthiness-directive/
I have no experience with this site and thus don't know how reliable this account can be presumed
to be. But that must have been pretty harrowing for the crew -- the incident played out over ten
minutes and they apparently weren't able to counter it.
PPRuNe probably has a lot more, but this weekend (and into today) I just couldn't face the high
noise-to-signal ratio.
PBL
Prof. i.R. Dr. Peter Bernard Ladkin, Bielefeld, Germany
www.rvs-bi.de