[SystemSafety] NTSB report on Boeing 787 APU battery fire at Boston Logan

Peter Bernard Ladkin ladkin at rvs.uni-bielefeld.de
Thu Dec 4 14:42:39 CET 2014


Has been published at http://www.ntsb.gov/doclib/reports/2014/AIR1401.pdf

There was an NYT article yesterday:
http://www.nytimes.com/2014/12/02/business/report-on-boeing-787-dreamliner-batteries-assigns-some-blame-for-flaws.html


Just the summary of the NTSB report is astonishing in itself! Keep in mind this is a
2.2 kWh energy storage device (75 Ah at a nominal 29.6 V). When such a device decides to go, it can
get rid of all that energy in a relatively short space of time. But apparently the regulator didn't
think so. Before.
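
The 2.2 kWh figure follows directly from the capacity and nominal voltage quoted above. A minimal sketch of that arithmetic (the function name is hypothetical and exists only for this illustration):

```python
# Illustrative arithmetic only; stored_energy_kwh is a hypothetical helper.
def stored_energy_kwh(capacity_ah: float, nominal_voltage_v: float) -> float:
    """Nominal stored energy in kWh from capacity (Ah) and nominal voltage (V)."""
    return capacity_ah * nominal_voltage_v / 1000.0


energy_kwh = stored_energy_kwh(75.0, 29.6)  # figures quoted in the text
energy_mj = energy_kwh * 3.6                # 1 kWh = 3.6 MJ

print(f"{energy_kwh:.2f} kWh")  # 2.22 kWh
print(f"{energy_mj:.1f} MJ")    # 8.0 MJ
```

This is the nominal stored energy only; it says nothing about how fast the energy is released in a thermal runaway.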

Pp viii-ix of the Report contain a summary of the conclusions. It makes clear an astonishingly
superficial grasp of the technology on the part of Boeing and the FAA. The manufacturer's
processes allowed FOD and improper cell winding without having effective detection methods in place.

My remarks are in square brackets.

[begin quote NTSB]

The NTSB identified the following safety issues as a result of this incident investigation:

* Cell internal short circuiting and the potential for thermal runaway of one or more battery
cells, fire, explosion, and flammable electrolyte release. This incident involved an
uncontrollable increase in temperature and pressure (thermal runaway) of a single APU battery cell
as a result of an internal short circuit and the cascading thermal runaway of the other seven
cells within the battery. This type of failure was not expected .....

[Huh? How could one not anticipate internal short circuits? How could one not anticipate thermal
runaway from an internal short circuit? Answer: this was an assumption derived from a single
nail-penetration test.]

..... Boeing’s analysis of the main and APU battery did not consider the possibility that
cascading thermal runaway of the battery could occur as a result of a cell internal short circuit.

[This is an astonishing assertion, but appears to be well justified in the NTSB analysis. How do
you miss such an obvious phenomenon? I guess it's an example of groupthink. The NTSB notes
the lack of effective traceability in the system safety assessment, pp73ff.]

* Cell manufacturing defects and oversight of cell manufacturing processes. After the incident,
the NTSB visited GS Yuasa’s production facility ... NTSB identified several concerns, including
foreign object debris (FOD) generation during cell welding operations and a postassembly
inspection process that could not reliably detect manufacturing defects, such as FOD and
perturbations (wrinkles) in the cell windings, which could lead to internal short circuiting. In
addition, the FAA’s oversight of Boeing, Boeing’s oversight of Thales, and Thales’ oversight of GS
Yuasa did not ensure that the cell manufacturing process was consistent with established industry
practices.

[That is, the manufacturer of extremely powerful Li-ion secondary batteries was not using
"established industry practices". Not only that, but its quality-control processes were flawed.
And this after all those public claims about careful oversight. ]

* Thermal management of large-format lithium-ion batteries. Testing performed during the
investigation showed that localized heat generated inside a 787 main and APU battery during
maximum current discharging exposed a cell to high-temperature conditions. Such conditions could
lead to an internal short circuit and cell thermal runaway. As a result, thermal protections
incorporated in large-format lithium-ion battery designs need to account for all sources of
heating in the battery during the most extreme charge and discharge current conditions.

[Well, yes. That is or should be routine safety analysis and factor mitigation. But both the
manufacturer's FMEA and the FAA requirements of the system safety assessment seem to have been
lacking; see below.]

* Insufficient guidance for manufacturers to use in determining and justifying key assumptions in
safety assessments. Boeing’s EPS safety assessment for the 787 main and APU battery included an
underlying assumption that the effect of an internal short circuit within a cell would be limited
to venting of only that cell without fire. However, the assessment did not explicitly discuss this
key assumption or provide the engineering rationale and justifications to support the assumption.
.....Boeing’s assumption was incorrect.....
..... Boeing and FAA reviews of the EPS safety assessment did not reveal that the assessment had
not (1) considered the most severe effects of a cell internal short circuit and (2) included
requirements to mitigate related risks.

* Insufficient guidance for FAA certification engineers to use during the type certification
process to ensure compliance with applicable requirements. During the 787 certification process,
the FAA did not recognize that cascading thermal runaway of the battery could occur as a result of
a cell internal short circuit.

[This is *really* hard to fathom!]

[end quote]

The manufacturing-line defects were quite straightforward. It astonishes me that the NTSB was able
to observe "perturbations" in electrode/separator strips being wound during their inspection, and
that defects such as these were not discovered by the manufacturer's own quality control (CT
scanning of the results, which was too coarse to detect the kind of FOD that might well have got in,
or even the perturbations the NTSB found), because these things don't appear to be subtle.

The manufacturer's FMEA was apparently based upon in-service data of 14,000 cells of a similar
design to the LVP65.

On p68 we read:

[begin quote]

Boeing and Thales performed preliminary and final EPS safety assessments, which included fault
tree analyses, FMEAs, and failure rate data provided by GS Yuasa. These assessments considered
internal short circuit failures but were developed with the underlying assumption that the most
severe effect of an internal short circuit within a cell would be limited to venting of only that
cell without fire and propagation to other cells. Thus, the potential for an internal short
circuit to lead to multiple-cell or battery thermal runaway with venting, electrolyte leakage,
excessive heat, and fire was not analyzed in the safety assessment.

[end quote]

In other words, the FMEA contained an inadequate "E" part - internal short circuits leading to
thermal runaway apparently didn't occur as an effect of a failure. Why not?

The FMEA is talked about on pp49-51, in Section 1.7.3 System Safety Assessment:

[begin quote]

Boeing’s FMEA was based on information contained within GS Yuasa’s FMEA, which GS Yuasa developed
with assistance from Boeing and Thales. GS Yuasa’s FMEA included a calculation of a representative
failure rate for the LVP65 cell. This calculation was based on in-service data from about 14,000
existing large-scale industrial lithium-ion cells manufactured by GS Yuasa, which had a similar
design and manufacturing process as the LVP65 cell. GS Yuasa’s FMEA indicated that none of the
industrial cells had experienced any failures, including venting, electrolyte release, or rupture
of a vent disc. (GS Yuasa’s FMEA did not include an analysis of usage and environmental
similarities between the industrial cells and the LVP65 cells or a discussion of the hazardous
effects of a lithium-ion cell failure, including overheating or venting.)

[end quote]

So they did an FMEA using data from cells, none of which had failed. Looks good so far! Perfect
manufacturing! But then there was a Nov 2006 fire at Securaplane, which makes the battery charging
system (BCS). Investigation put this down to a cell-internal short, and overcharging of at least
one other cell (Note 81, p43). There was also a thermal runaway in an APU battery on July 7, 2009
(Note 82, p43). Both of these incidents vitiate the assumptions made in the system safety
assessment that thermal runaway was not a possible effect, but apparently nobody at Boeing or the
FAA noticed. In other words, the SSA was not revisited as a result of these two incidents.
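
To put a rough number on what zero failures in about 14,000 cells can and cannot show, here is a minimal sketch of the standard zero-failure binomial bound. It rests on assumptions that are not in the report: each cell is treated as one independent trial with the same failure probability, and the function name and the 95% confidence level are chosen only for illustration.

```python
# Illustrative sketch; zero_failure_upper_bound is a hypothetical helper.
# Assumption (not from the report): cells are independent, identical trials.
def zero_failure_upper_bound(n_units: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-unit failure probability after
    observing zero failures in n_units trials.

    Solves (1 - p) ** n_units = 1 - confidence for p.
    """
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_units)


n_cells = 14_000                      # "about 14,000" cells, per the quoted FMEA
exact = zero_failure_upper_bound(n_cells)
rule_of_three = 3.0 / n_cells         # common approximation to the 95% bound

print(f"exact 95% bound: {exact:.2e}")
print(f"rule of three:   {rule_of_three:.2e}")
```

Under those assumptions, a clean record supports only a bound of about 2 in 10,000 per cell, not a failure probability of zero. It also says nothing about the LVP65 in aircraft service unless usage and environment are comparable, which is exactly the analysis the NTSB says GS Yuasa's FMEA did not include.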

Yes, a lack of joined-up thinking. In some sense, this was known to be a problem with the heavily
outsourced/subcontracted 787 project - one might even guess that "ensuring joined-up thinking" is
THE big challenge with such efforts. Recall the cable-bundle mismatching that occurred on the
A380, which if I remember rightly was partly put down to different Airbus plants using different
versions of the CAD tool CATIA. But this lack of joined-up thinking went beyond the manufacturer
(on this project more a systems integrator) to include in the 787 case the regulator as well!

A significant piece of information concerning aircraft safety assessment is contained in Note 86, p44:

[begin quote]

The FAA did not consider the 787 battery to be a critical component because the Seattle Aircraft
Certification Office (which was responsible for the airplane’s certification) regarded the battery
as a redundant system. ......

[end quote]

You are only "critical" according to airworthiness regulations if you're a single point of failure,
and you only get selected for top scrutiny if you are manufacturing a "critical" component. There is
an obvious argument here for a notion of criticality referring to the severity of consequences of
(faulty or otherwise) behavior.

In any case, that won't help if the FMEA/FHA is faulty and doesn't indicate any effect greater than
a single smoky cell.

Once again, it seems that faulty safety assessment, in this case (again) an obviously inadequate
FMEA, played a significant role, despite the presence of incidents contradicting the analysis.

(There are people here who have heard me say enough times that I haven't seen an FMEA I can't fault.
There are plenty of other people on this list who can likely say that also. Now it's the NTSB's turn to
say it, even if discreetly.)

PBL


Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de





