[System Safety] FOSDEM talk by Paul Sherwood
Paul Sherwood
paul.sherwood at codethink.co.uk
Tue Feb 11 11:43:38 CET 2025
Phil,
Thank you for sharing your thinking, please see comments inline below
On 2025-02-10 19:09, Phil Koopman wrote:
> Hot take: felt more like a sales pitch than a really concrete
> explanation.
I confess, I've done quite a lot of sales, and I feel that this topic
needs some selling, given the state of things.
> Old ways dismissed based more on "nobody
> wants to do it" and "new is better". I get why that argument has
> appeal, and surely some (not all) people don't do the old ways as well
> as they should.
Agreed, but as I said in the talk, the key point is that "we are not in
Kansas anymore". The "old ways" served well for microcontrollers, but some
of them do not scale to the multicore microprocessor world we are now in.
> But it was pretty light on why "new will be sufficient
> for acceptable safety" vs. deciding that doing the new ways really
> well will automatically be fine.
My understanding (I may be wrong) is that the accepted approach to "what
is sufficient" is something along the lines of "we put suitably
qualified and experienced people on the job and they tell us when they
are finished".
That does not satisfy me (nor you, I expect) when we consider some of
the projects folks are trying to deliver now (e.g. autonomous vehicles +
SDV), where I would argue that there are very few people who are
suitably qualified and experienced.
A key innovation in the TSF approach is that we are expressly
identifying things to measure, and then expecting folks to use the
**measurements** to inform decisions about what is sufficient.
> I just recently saw a live talk
> that had a lot of the same arguments and was similarly left with more
> questions than answers on this topic. Maybe those in the trenches on
> the new technology have a different viewpoint. To be sure, this is not
> me trying to die on the hill of the old ways, but rather expressing
> what I think is appropriate caution on jumping to new system design
> approaches on life critical systems.
I confirm that the view from the trenches is different, but I absolutely
agree with the need for caution, hence I appreciate your comments. I've
been on this journey since 2016, and am doing my best to engage with
experts and to ensure that we properly understand and manage the risks.
> We can justify the old way in hindsight in that it seems to work, even
> if we struggle to rigorously explain why. Do we want to jump to a new
> way without understanding why it is expected to work and spend decades
> of mishaps climbing the hill to getting to where it really needs to
> be? Or is there a way to have some confidence about it before then?
That is exactly what we are working on. First we had to categorically
establish that software in these new systems exhibits random behaviour,
and then show that we can apply statistical techniques to model failure
rates with confidence intervals. This clearly requires a "new way",
since we find one of the fundamental assumptions of ISO 26262 to be
invalid for our target systems.
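To give a flavour of the kind of statistical treatment I mean (a minimal
sketch only, not the TSF method itself; the constant-failure-rate /
Poisson assumption and the numbers are mine for illustration), a failure
rate and its confidence interval can be estimated from observed failures
over test exposure:

    # failure_rate_ci.py - illustrative sketch only
    from scipy.stats import chi2

    def failure_rate_ci(failures, hours, confidence=0.95):
        """Point estimate and exact (chi-square) confidence interval for
        a constant failure rate, assuming failures arrive as a Poisson
        process over the observed exposure."""
        alpha = 1.0 - confidence
        rate = failures / hours
        lower = chi2.ppf(alpha / 2, 2 * failures) / (2 * hours) if failures else 0.0
        upper = chi2.ppf(1 - alpha / 2, 2 * (failures + 1)) / (2 * hours)
        return rate, lower, upper

    # e.g. 3 failures observed in 10,000 hours of representative operation
    print(failure_rate_ci(3, 10_000))

The point is not this particular formula, of course, but that "sufficiently
safe" becomes a number with an uncertainty attached, which can then be
argued about and revisited as more evidence arrives.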
> Assuming they make good on everything they plan, what would help me a
> lot is understanding the basis for the safety case for their approach
> in a general sense. What parts of assurance does it provide, and what
> parts does it not provide? What underlying assumptions does it make?
For the moment we are focusing on the overall need for establishing
trust (i.e. safety is a subset of our goal), which leads us to start
with the need for:
a) an evidence-based approach
b) measurement of confidence
For safety specifically, we are mapping our argument for compliance with
IEC 61508 using the Trustable Software Framework. I'm unable to share
more details in public at this stage, but see my comment below re your
2022 paper.
> As a simple example, supply chain attacks will be a huge issue
> compared to a proprietary OS + custom application. This is not news to
> them, but is an example of the type of new challenge that might
> surprise us with a mishap news headline.
While I agree that supply chain attacks are a huge issue (and to be
clear, three out of the six Trustable Tenets focus on this topic), I
think you should be aware that this applies equally to systems running
proprietary software. Most so-called proprietary systems these days
involve a great deal of open source **anyway**, but this may be
conveniently overlooked (either accidentally or on purpose) during
safety analysis.
Further, I believe there is some consensus (outside the safety
community) that actively maintained open source tends to be more secure
than proprietary software.
The "proprietaty OS + custom application" may appear at first to be more
secure because there is less visibility of exploits, but this is really
just "security by obscurity" which according to wikipedia is
"discouraged and not recommended by standards bodies".
> Discussion here is fine, but this is not something we are likely to
> resolve in an e-mail chain. This is a whole discussion that needs to
> happen over many years in many forums.
Totally agree - but we have made a start :)
And actually discussions here have already helped, since I joined the
list in 2018.
> And it is not just about FOSS
> in general. There are these concerns plus additional specific concerns
> about using machine learning technology, tool chains and libraries as
> well, in which we already have multi-ton machines hurtling down public
> roads by companies who have decided (most, but not all of them) that
> core safety standards -- including ones specifically for their
> technology -- are irrelevant because they stifle innovation.
I agree with your concerns, but I have not said (and do not believe)
that standards are irrelevant. The current business model for standards
is broken imo, and the lack of general visibility discourages both
adoption and improvement.
> We might not be happy with old-school safety practices, but there are
> lessons there learned the hard way over decades. We should be
> reluctant to throw them out wholesale without taking some time to
> figure out what lessons we need to learn with the new approaches.
Again I agree. I myself have some decades of lessons learned (e.g. I
wrote the first version of "The Software Commandments" in 1996 [1]) and
I'm still doing my best to learn.
> My own take on this topic in a somewhat more abstract discussion is
> here, in which I point out the gaps in understanding how/why current
> approaches work and how we might close those gaps going forward for
> any approach:
>
> * Johansson, R. & Koopman, P., "Continuous Learning Approach to
> Safety Engineering [2]," Critical Automotive Applications: Robustness
> & Safety / CARS at EDCC2022.
Thanks for sharing this paper. I agree with most of the recommendations
therein, and believe that we are actually implementing them with our
customers. In fact I would suggest that applying TSF should explicitly
guide practitioners along the path you describe (including SPIs, ongoing
data collection, a DevOps-style continuous learning approach to lifecycle
safety, etc.).
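To make that a little more concrete (again purely illustrative; the
indicator, threshold and field names below are hypothetical, not taken
from your paper or from TSF), an SPI fed by ongoing field data collection
might be recomputed on every data ingest:

    # spi_check.py - illustrative sketch only
    from dataclasses import dataclass

    @dataclass
    class FieldBatch:
        operating_hours: float   # exposure reported in this batch
        precursor_events: int    # e.g. watchdog resets seen in the field

    def spi_violated(batches, threshold_per_hour=1e-4):
        """Recompute a simple SPI (precursor events per operating hour)
        across all batches and flag whether it exceeds the agreed
        threshold."""
        hours = sum(b.operating_hours for b in batches)
        events = sum(b.precursor_events for b in batches)
        rate = events / hours if hours else 0.0
        return rate > threshold_per_hour, rate

    # run on every data ingest; a violation triggers re-assessment
    print(spi_violated([FieldBatch(50_000, 3), FieldBatch(42_000, 1)]))

A violation would then trigger re-assessment immediately, rather than
waiting for the next scheduled review.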
br
Paul
[1] https://www.codethink.co.uk/commandments.html