[SystemSafety] Component Reliability and System Safety

Paul Sherwood paul.sherwood at codethink.co.uk
Mon Sep 17 11:06:32 CEST 2018


Peter,
thank you for your comments. Please see my continued attempts to flog 
the dog below...

On 2018-09-17 08:29, Peter Bernard Ladkin wrote:
> On 2018-09-14 15:52 , Paul Sherwood wrote:
>> On 2018-09-14 08:03, Peter Bernard Ladkin wrote:
>> <snip>
>>>>> [Paul Sherwood, I think] Why is MISRA C still considered relevant 
>>>>> to system safety in 2018?
>>> 
>>> (Banal question? Banal answer!)
>> 
>> I'm sorry you consider my question banal.
> 
> Do recall I also described my answer as banal, so I can hardly have
> meant it pejoratively.

I'm not convinced that is true; however, I'll give you the benefit of 
the doubt.

> Here are two more answers, framed in such a way as to allow them not
> to seem banal:
> 
> a. SW meeting dependable conditions of some sort will necessarily have
> non-local properties, as well
> as non-syntactic properties. That alone entails coding standards if
> you are going to do it right.

As with many recommendations in the real world, that may be true in some 
circumstances.

But software is a very big field. It seems to me that most of the 
software we rely on these days was developed without following any 
coding standard at all, let alone MISRA C.

> b. Anyone who wants to maintain SW beyond the involvement of the
> programmer who wrote it needs to
> have the program written according to some coding standard.

Again, possibly. But other approaches are also available.

We could insist that the software be developed in Haskell, or Rust, or 
some other technology that provides a higher level of control over 
code creation.

We could insist on only using expert programmers. Or insist on static 
and dynamic analysis in CI/CD pipelines (see the sketch below). Or, 
as others have said, raise the bar on code inspection/review.

We could even insist on pair programming, complexity metrics, test 
coverage metrics, documentation metrics and so on, but I have little 
confidence in those.

<snip>
> Those were the days.

Hmmm. My whole question was specifically about 2018. While I do find 
the historical anecdotes interesting, they're not really relevant here.

> Coding standards ipso facto aren't a panacea. They also have to be
> pertinent to the task.

Coding standards can actually be counter-productive, for example if

- they are wrong and/or incomplete, while creating the impression of 
correctness and sufficiency
- they are used when they shouldn't be

This latter point is exactly the reason for my original question.

> To compound a perceived indiscretion further, I suggest it is also
> banal to ask why component
> reliability is important for dependable systems.

dependable != safe

and

property(system) is not necessarily a function of 
[other-property(component)]

For example, every component can meet its reliability targets while 
the assembled system is still unsafe, because hazards can emerge from 
the interactions between components rather than from any one part.

>  I would go further - it is important for any system
> which is not deliberately built to subvert the purposes of the client.

Sorry, I don't understand this comment at all.

<snip>
> A question. What important safety properties of a bicycle are *not*
> reducible to component reliability?

For simple systems, where the safety mechanisms are expressly 
mechanical, reliability obviously matters.

And reliability is an extremely important property in other systems 
too, of course, for its own sake.

But for **safety** of complex systems, I'm guessing that current best 
practice must involve designing-in safety from multiple directions, with 
failsafes, redundancy and/or similar?

Presumably the architectural-level safety considerations must include 
the **expectation of failure in components**, and lead to designs 
which mitigate the expected (bound-to-happen) failures, to satisfy 
safety goals?
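
As a toy illustration of that expectation-of-failure mindset, here is 
a sketch of my own (not taken from any standard or real system): a 
2-out-of-3 voter, which accepts a sensor reading only when two 
independent channels agree within a tolerance, and otherwise refuses 
to guess so that the caller can fall back to a safe state:

    #include <stdbool.h>
    #include <stdint.h>

    /* 2-out-of-3 voter. Assumes readings sit in a bounded sensor
       range; 64-bit intermediates avoid overflow on subtraction. */
    bool vote_2oo3(int32_t a, int32_t b, int32_t c,
                   int32_t tol, int32_t *out)
    {
        int64_t d_ab = (int64_t)a - b;
        int64_t d_ac = (int64_t)a - c;
        int64_t d_bc = (int64_t)b - c;

        if (d_ab < 0) { d_ab = -d_ab; }
        if (d_ac < 0) { d_ac = -d_ac; }
        if (d_bc < 0) { d_bc = -d_bc; }

        if (d_ab <= tol) {
            *out = (int32_t)(((int64_t)a + b) / 2);
            return true;
        }
        if (d_ac <= tol) {
            *out = (int32_t)(((int64_t)a + c) / 2);
            return true;
        }
        if (d_bc <= tol) {
            *out = (int32_t)(((int64_t)b + c) / 2);
            return true;
        }
        return false;   /* no quorum: report failure, fail safe */
    }

The point is that the design assumes any single channel can and will 
fail, and makes that failure detectable, rather than hoping every 
channel is perfectly reliable.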

If our safety depends on the reliable behaviour of even a small program 
on (say) a modern multi-core microprocessor interacting with other 
pieces of software in other devices, I think "we are lost" again.

I'm worrying about autonomous vehicles and other systems of similar 
complexity. As I understand it most of the software in these systems 
won't even be written in C, let alone following MISRA C rules.

>> I mentioned your comment to an eminent friend (who has had
>> to deal with the human fallout from multiple accidents) and he said 
>> "There are no banal questions
>> about safety. Anyone asking questions and interested in safety is to 
>> be applauded."
> 
> Really? Questions have an audience. It makes a lot of sense to discuss
> the reasons for coding
> standards with first- or second-semester computer science students who
> have never written a serious
> program used by others, just as it makes a lot of sense to discuss the
> following questions with
> children:
> 
> Why does a bicycle have brakes?
> Why do you look to see if traffic is coming before you cross the road?
> Why is there a rule to drive on a fixed half of a road?
> Why are there speed limits on roads with mixed traffic and crossing 
> traffic?
> Why are there speed limits on roads with limited forward visibility?
> 
> However, when prefaced with "dear fellow safety professionals", one
> might consider them banal.

I'm not a "safety professional".

However, I am relatively experienced in large-scale software, and (as 
you can see) I'm struggling to understand how 'safety professionals' 
can advocate applying principles from mechanical reliability 
engineering, plus "things we learned on microcontroller-scale 
projects several decades ago", to complex software-intensive systems 
in 2018.

> Similarly, those who have never flown an airplane may wonder why
> checklists are used for
> configuration for key phases of flight such as landing. Once you have
> flown an airplane and learned
> a little of what happens to others who fly, it becomes banal.

Fair point, but a little off topic imo.

>>> Because many people use C for
>>> programming small embedded systems and
>>> adhering to MISRA C coding guidelines enables the use of static
>>> analysis tools which go some way
>>> (but not all the way) to showing that the code does what you have 
>>> said
>>> you want it to do.
>> 
>> Those people could **just** use static analysis tools, and get the 
>> same benefit.
> 
> Not so in general. Static analysis tools geared towards a specific
> coding standard are usually far
> more effective than those which are not. Consider SPARK and SCADE.

I agree that static analysers are not enough in general, but against 
your specific answer, I believe my statement holds.

> Also consider the project
> mentioned above which exercised the Sun HW. The aerospace manufacturer
> paid (lots) for bespoke
> analysis. There was a reason for that.

I'm sorry but I remain unconvinced that the lessons from the 90s are 
still relevant.

I've seen the reboot screens of infotainment systems on several 
commercial aeroplanes - generally a version of u-boot and a Red Hat 
Linux from some decades prior to the time of the crash/reboot. I'm 
hoping that these systems are not connected to the same network as 
the instrumentation and controllers, but I'm also wondering how 
safety is assured when

a) passengers have been told for years not to use personal devices, 
'because safety' (probably nonsense, I know)
b) some planes now expressly provide internet facilities for passenger 
devices

In automotive I know that some user-facing (and even internet-facing) 
systems *do* sit on the CAN bus, alongside multiple 
subsystems/components which are (presumed safe because they were) 
developed in accordance with MISRA C guidelines.

br
Paul



