[System Safety] FOSDEM talk by Paul Sherwood
Paul Sherwood
paul.sherwood at codethink.co.uk
Tue Feb 11 11:43:38 CET 2025
Phil,
Thank you for sharing your thinking, please see comments inline below
On 2025-02-10 19:09, Phil Koopman wrote:
> Hot take: felt more like a sales pitch than a really concrete
> explanation.
I confess, I've done quite a lot of sales, and I feel that this topic
needs some selling, given the state of things.
> Old ways dismissed based more on "nobody
> wants to do it" and "new is better". I get why that argument has
> appeal, and surely some (not all) people don't do the old ways as well
> as they should.
Agreed, but as I said in the talk, the key point is that "we are not in
Kansas anymore". The "old ways" served well for microcontrollers, but some
of them do not scale to the multicore microprocessor world we are now in.
> But it was pretty light on why "new will be sufficient
> for acceptable safety" vs. deciding that doing the new ways really
> well will automatically be fine.
My understanding (I may be wrong) is that the accepted approach to "what
is sufficient" is something along the lines of "we put suitably
qualified and experienced people on the job and they tell us when they
are finished".
That does not satisfy me (nor you, I expect) when we consider some of
the projects folks are trying to deliver now (e.g. autonomous vehicles +
SDV), where I would argue that there are very few people who are
suitably qualified and experienced.
A key innovation in the TSF approach is that we are expressly
identifying things to measure, and then expecting folks to use the
**measurements** to inform decisions about what is sufficient.
> I just recently saw a live talk
> that had a lot of the same arguments and was similarly left with more
> questions than answers on this topic. Maybe those in the trenches on
> the new technology have a different viewpoint. To be sure, this is not
> me trying to die on the hill of the old ways, but rather expressing
> what I think is appropriate caution on jumping to new system design
> approaches on life critical systems.
I confirm that the view from the trenches is different, but I absolutely
agree with the need for caution, hence I appreciate your comments. I've
been on this journey since 2016, and am doing my best to engage with
experts and to ensure that we properly understand and manage the risks.
> We can justify the old way in hindsight in that it seems to work, even
> if we struggle to rigorously explain why. Do we want to jump to a new
> way without understanding why it is expected to work and spend decades
> of mishaps climbing the hill to getting to where it really needs to
> be? Or is there a way to have some confidence about it before then?
That is exactly what we are working on. First we had to categorically
establish that software in these new systems exhibits random behaviour,
and then show that we can apply statistical techniques to model failure
rates with confidence intervals. This clearly requires a "new way",
since we find one of the fundamental assumptions of ISO 26262 to be
invalid for our target systems.
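To give a flavour of the kind of statistical treatment I mean (a minimal
sketch only, not the TSF method itself; the constant-failure-rate /
Poisson assumption and the numbers are mine for illustration), a failure
rate and its confidence interval can be estimated from observed failures
over test exposure:

    # failure_rate_ci.py - illustrative sketch only
    from scipy.stats import chi2

    def failure_rate_ci(failures, hours, confidence=0.95):
        """Point estimate and exact (chi-square) confidence interval for
        a constant failure rate, assuming failures arrive as a Poisson
        process over the observed exposure."""
        alpha = 1.0 - confidence
        rate = failures / hours
        lower = chi2.ppf(alpha / 2, 2 * failures) / (2 * hours) if failures else 0.0
        upper = chi2.ppf(1 - alpha / 2, 2 * (failures + 1)) / (2 * hours)
        return rate, lower, upper

    # e.g. 3 failures observed in 10,000 hours of representative operation
    print(failure_rate_ci(3, 10_000))

The point is not this particular formula, of course, but that "sufficiently
safe" becomes a number with an uncertainty attached, which can then be
argued about and revisited as more evidence arrives.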
> Assuming they make good on everything they plan, what would help me a
> lot is understanding the basis for the safety case for their approach
> in a general sense. What parts of assurance does it provide, and what
> parts does it not provide? What underlying assumptions does it make?
For the moment we are focusing on the overall need for establishing
trust (i.e. safety is a subset of our goal), which leads us to start
with the need for:
a) an evidence-based approach
b) measurement of confidence
For safety specifically, we are mapping our argument for compliance with
IEC 61508 using the Trustable Software Framework. I'm unable to share
more details in public at this stage, but see my comment below re your
2022 paper.
> As a simple example, supply chain attacks will be a huge issue
> compared to a proprietary OS + custom application. This is not news to
> them, but is an example of the type of new challenge that might
> surprise us with a mishap news headline.
While I agree that supply chain attacks are a huge issue (and to be
clear, three out of the six Trustable Tenets focus on this topic), I
think you should be aware that this applies equally to systems running
proprietary software. Most so-called proprietary systems these days
involve a great deal of open source **anyway**, but this may be
conveniently overlooked (either accidentally or on purpose) during
safety analysis.
Further, I believe there is some consensus (outside the safety
community) that actively maintained open source tends to be more secure
than proprietary software.
The "proprietaty OS + custom application" may appear at first to be more
secure because there is less visibility of exploits, but this is really
just "security by obscurity" which according to wikipedia is
"discouraged and not recommended by standards bodies".
> Discussion here is fine, but this is not something we are likely to
> resolve in an e-mail chain. This is a whole discussion that needs to
> happen over many years in many forums.
Totally agree - but we have made a start :)
And actually discussions here have already helped, since I joined the
list in 2018.
> And it is not just about FOSS
> in general. There are these concerns plus additional specific concerns
> about using machine learning technology, tool chains and libraries as
> well, in which we already have multi-ton machines hurtling down public
> roads by companies who have decided (most, but not all of them) that
> core safety standards -- including ones specifically for their
> technology -- are irrelevant because they stifle innovation.
I agree with your concerns, but I have not said (and do not believe)
that standards are irrelevant. The current business model for standards
is broken imo, and the lack of general visibility discourages both
adoption and improvement.
> We might not be happy with old-school safety practices, but there are
> lessons there learned the hard way over decades. We should be
> reluctant to throw them out wholesale without taking some time to
> figure out what lessons we need to learn with the new approaches.
Again I agree. I myself have some decades of lessons learned (e.g. I
wrote the first version of "The Software Commandments" in 1996 [1]) and
I'm still doing my best to learn.
> My own take on this topic in a somewhat more abstract discussion is
> here, in which I point out the gaps in understanding how/why current
> approaches work and how we might close those gaps going forward for
> any approach:
>
> * Johansson, R. & Koopman, P., "Continuous Learning Approach to
> Safety Engineering [2]," Critical Automotive Applications: Robustness
> & Safety / CARS at EDCC2022.
Thanks for sharing this paper. I agree with most of the recommendations
therein, and believe that we are actually implementing them with our
customers. In fact I would suggest that applying TSF should explicitly
guide practitioners along the path you describe (including SPIs, ongoing
data collection, a DevOps-style continuous learning approach to lifecycle
safety, etc.).
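To make that a little more concrete (again purely illustrative; the
indicator, threshold and field names below are hypothetical, not taken
from your paper or from TSF), an SPI fed by ongoing field data collection
might be recomputed on every data ingest:

    # spi_check.py - illustrative sketch only
    from dataclasses import dataclass

    @dataclass
    class FieldBatch:
        operating_hours: float   # exposure reported in this batch
        precursor_events: int    # e.g. watchdog resets seen in the field

    def spi_violated(batches, threshold_per_hour=1e-4):
        """Recompute a simple SPI (precursor events per operating hour)
        across all batches and flag whether it exceeds the agreed
        threshold."""
        hours = sum(b.operating_hours for b in batches)
        events = sum(b.precursor_events for b in batches)
        rate = events / hours if hours else 0.0
        return rate > threshold_per_hour, rate

    # run on every data ingest; a violation triggers re-assessment
    print(spi_violated([FieldBatch(50_000, 3), FieldBatch(42_000, 1)]))

A violation would then trigger re-assessment immediately, rather than
waiting for the next scheduled review.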
br
Paul
[1] https://www.codethink.co.uk/commandments.html