[SystemSafety] Technical information on Airbus A320 recall?

Mon Dec 1 11:18:34 CET 2025

While Single Event Upsets (SEUs) originate in hardware, they can be
mitigated through hardware design, software design and/or system design.
For example, when I worked on the Airbus A380 Landing Gear Extension and
Retraction System (LGERS), the main mitigation against SEUs was the fact
that LGERS is a multiple redundant system, meaning that an SEU in one
channel will not affect the other channels. However, we were also required
to design the software so that we kept three copies of critical data (one
of the copies was the one's complement of the other two). This meant that
we were able to detect an SEU corrupting one of the copies and restore the
correct value from one of the other two copies. We recorded the number of
data corruptions in NVRAM. It would be interesting to know how often SEUs
have occurred in practice - I expect that SEUs occur quite often,
especially at altitude.
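The triple-copy scheme described above can be sketched in Python as follows. Everything specific here is an illustrative assumption, not a detail of the LGERS software: the 16-bit word size, the hypothetical `ProtectedWord` class, the 2-out-of-3 majority vote, and the in-memory counter standing in for the NVRAM corruption log.

```python
"""Sketch: critical data stored three times, one copy as its one's complement.

Assumptions (not from the LGERS design): 16-bit words, a 2-out-of-3 vote,
and a plain counter standing in for the corruption count kept in NVRAM.
"""

WORD_MASK = 0xFFFF  # assumed 16-bit data word


class ProtectedWord:
    """Hypothetical container holding three copies of one critical value."""

    def __init__(self, value: int) -> None:
        self.corruptions = 0  # stand-in for the NVRAM corruption counter
        self.write(value)

    def write(self, value: int) -> None:
        value &= WORD_MASK
        self.a = value
        self.b = value
        self.c = ~value & WORD_MASK  # third copy stored as one's complement

    def read(self) -> int:
        a, b = self.a, self.b
        c = ~self.c & WORD_MASK  # undo the complement before comparing
        if a == b == c:
            return a
        # At least one copy disagrees: take the 2-out-of-3 majority.
        if a == b or a == c:
            good = a
        elif b == c:
            good = b
        else:
            raise RuntimeError("no two copies agree; value unrecoverable")
        self.corruptions += 1
        self.write(good)  # scrub: restore all three copies from the majority
        return good


if __name__ == "__main__":
    w = ProtectedWord(0x1234)
    w.b ^= 0x0100  # simulate an SEU flipping one bit in the second copy
    print(hex(w.read()), w.corruptions)  # prints: 0x1234 1
```

A real implementation would also have to decide how often to scrub and what to do when no two copies agree (this sketch simply raises an exception).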

Both the FAA and EASA have required avionic systems to mitigate against
SEUs for a long time. EASA Cert Memo CM-AS-004 defines the certification
considerations concerning Single Event Effects (SEE). RTCA DO-248C/EUROCAE
ED-94C DP #21 provides clarification on SEU as it relates to software. DP
#21 suggests protection mechanisms such as parity, cyclic redundancy codes,
Hamming codes and storing triple versions of critical data.
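Of the mechanisms listed, a single parity bit can only detect an odd number of flipped bits, whereas a Hamming code can also locate and correct a single flipped bit. The Python sketch below is a textbook Hamming(7,4) code, not anything prescribed by DP #21; the function names are hypothetical. Production EDAC memory generally protects wider words and adds an overall parity bit (SEC-DED) so that double-bit errors are detected rather than silently miscorrected.

```python
"""Sketch: textbook Hamming(7,4) single-error-correcting code.

Bit positions are numbered 1..7; parity bits sit at positions 1, 2 and 4,
data bits at positions 3, 5, 6 and 7.
"""


def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (list index 0 = position 1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0 unused so indices match bit positions 1..7
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # positions whose index has bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # positions whose index has bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # positions whose index has bit 2 set
    return code[1:]


def hamming74_decode(bits: list[int]) -> tuple[int, int]:
    """Return (data nibble, corrected position, or 0 if no error was found)."""
    code = [0] + list(bits)  # re-align so code[p] is bit position p
    syndrome = 0
    for p in range(1, 8):
        if code[p]:
            syndrome ^= p  # XOR of positions holding a 1; zero for a valid codeword
    if syndrome:
        code[syndrome] ^= 1  # a single flipped bit makes the syndrome equal its position
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    cw = hamming74_encode(11)
    cw[5] ^= 1  # simulate an SEU at bit position 6
    print(hamming74_decode(cw))  # prints: (11, 6)
```

Two flipped bits in the same codeword would produce a non-zero syndrome pointing at the wrong position, which is why the extra SEC-DED parity bit matters in practice.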

My understanding is that the vulnerability that resulted in the
in-flight upset was introduced in version L104 and that it can be avoided
by reverting to version L103+. I'm guessing that the software developer
introduced some new functionality in L104 but failed to protect critical
data required to implement the new functionality. The fact that L104
resulted in an in-flight upset soon after it was introduced suggests that SEUs
are relatively common and are usually mitigated by hardware, software
and/or system design. I expect that the supplier is developing a new
software version that will re-introduce the new functionality but protect
critical data against SEUs.

Yours,

Dewi Daniels | Director | Software Safety Limited

Telephone +44 7968 837742 | Email dewi.daniels at software-safety.com

Software Safety Limited is a company registered in England and Wales.
Company number: 9390590. Registered office: Fairfield, 30F Bratton Road,
West Ashton, Trowbridge, United Kingdom BA14 6AZ

On Mon, 1 Dec 2025 at 08:47, Prof. Dr. Peter Bernard Ladkin
<ladkin at techfak.de> wrote:

> Les,
>
> not being a computer scientist you may not be aware that fault tolerance
> has for many decades been a major theme in computer science. The IEEE
> International Symposium on Fault-Tolerant Computing (FTCS) started in
> 1971, 54 years ago. There is IFIP Working Group 10.4 on Dependability
> and Fault Tolerance, which has been running, as far as I know, for about
> as long. Jean-Claude Laprie was Chair for many years. Brian Randell and
> colleagues at Newcastle University established the computer science
> department there, I believe the first in the UK, as a major centre for
> research into fault tolerance (you may have heard of "recovery
> blocks"?). Brian and Tom Anderson were members of IFIP WG 10.4 for many,
> many years (maybe still are?). IEEE has a Technical Committee on
> Dependability and Fault Tolerance, but its "flagship" conference is now
> DSN rather than FTCS.
>
> IFIP WG 10.4 was, I believe, the first organisation to understand that
> dependability of digital systems meant rather more than just
> reliability. Their first terminology was published in 1992 in five
> languages by Springer Verlag. Safety and (what was then called) security
> (which I now prefer to call cybersecurity) were considered by them to be
> dependability attributes, for very good reason.
>
> The IEC, by contrast, considers neither safety nor cybersecurity part of
> dependability. TC 56 is Dependability. Digital-system safety in the IEC
> resides with SC 65A, the "Safety Aspects" subcommittee (it used to be
> the "System Aspects" SC) of TC 65, Industrial-process measurement,
> control and automation. Industrial-process cybersecurity resides in
> TC 65, although there is a movement to make their cybersecurity
> standards more widely applicable (called a "horizontal" function), as
> SC 65A's safety standard IEC 61508 is (many of us are sceptical about
> this move).
>
> So there is a fair amount of silo-ing in the international organisations
> trying to define/capture/explicate the state of the art in digital
> systems dependability, even just in the computer-science area.
>
> There are a couple of sources of faults/failures that "come out of
> nowhere", which weren't paid so much attention by FT types 30 years ago,
> but have in the succeeding period increased substantially in importance.
> SEEs are one. People dealing with spacecraft routinely protect against
> SEEs that may be caused by alpha particles. Protecting against alpha
> particles is relatively easy compared with protecting against the
> secondary particles produced when these particles interact with the
> earth's atmosphere (secondary cosmic rays). Then there are Byzantine
> faults and failures. Algorithmically resolving Byzantine failures
> deterministically is known (from Lamport's first paper on the subject)
> to be computationally expensive, but there are some network
> architectures that mitigate their occurrence (Kevin Driscoll, who is on
> this list, is the foremost expert on occurrences "in the wild").
>
> On 2025-11-30 22:54, Les Chambers wrote:
> > ... I'm surprised that this could happen in aviation, which is
> > typically the gold standard in Safety-Critical systems design.
>
> And fault-tolerant digital design. The circumstance that is
> flabbergasting everyone is, I think, that they got it right, developed
> the system further, and got it wrong (whoever "they" is). That is
> usually not the way industrial progress works. (Thales, the manufacturer
> of the ELAC, apparently told Reuters that "the functionality in question
> is supported by software that is not under Thales' responsibility".
> https://www.reuters.com/business/aerospace-defense/airbus-a320-repairs-must-be-before-next-flight-bulletin-shows-2025-11-28/
> )
>
> A few more details on the incident: "JetBlue Flight 1230, operating from
> Cancún International Airport (CUN) to Newark Liberty International
> Airport (EWR), experienced an uncommanded drop in altitude approximately
> one hour after departure. The aircraft, registered N605JB, rapidly lost
> about 14,500 feet in five minutes, followed by another 12,200 feet in
> the next five minutes. The crew diverted to Tampa International Airport
> (TPA) and landed at approximately 1420 local time."
> (from https://avgeekery.com/airbus-a320-emergency-airworthiness-directive/)
> I have no experience with this site and thus don't know how reliable
> this account can be presumed to be. But that must have been pretty
> harrowing for the crew: the incident played out over ten minutes and
> they apparently weren't able to counter it.
>
> PPRuNe probably has a lot more, but this weekend (and into today) I just
> couldn't face the high noise-to-signal ratio.
>
> PBL
>
> Prof. i.R. Dr. Peter Bernard Ladkin, Bielefeld, Germany
> www.rvs-bi.de