[SystemSafety] Component Reliability and System Safety
Paul Sherwood
paul.sherwood at codethink.co.uk
Mon Sep 17 11:06:32 CEST 2018
Peter,
thank you for your comments. Please see my continued attempts to flog
the dog below...
On 2018-09-17 08:29, Peter Bernard Ladkin wrote:
> On 2018-09-14 15:52 , Paul Sherwood wrote:
>> On 2018-09-14 08:03, Peter Bernard Ladkin wrote:
>> <snip>
>>>>> [Paul Sherwood, I think] Why is MISRA C still considered relevant
>>>>> to system safety in 2018?
>>>
>>> (Banal question? Banal answer!)
>>
>> I'm sorry you consider my question banal.
>
> Do recall I also described my answer as banal, so I can hardly have
> meant it pejoratively.
I'm not convinced that is true, however I'll give you the benefit of the
doubt.
> Here are two more answers, framed in such a way as to allow them not
> to seem banal:
>
> a. SW meeting dependable conditions of some sort will necessarily have
> non-local properties, as well
> as non-syntactic properties. That alone entails coding standards if
> you are going to do it right.
As with many recommendations in the real world, that may be true in some
circumstances.
But software is a very big field. It seems to me that most of the
software we are relying on these days was developed without following
coding standards in general, let alone MISRA C.
> b. Anyone who wants to maintain SW beyond the involvement of the
> programmer who wrote it needs to
> have the program written according to some coding standard.
Again, possibly. But other approaches are also available.
We could insist that the software be developed in Haskell, or Rust, or
some other technology that provides a higher level of control over the
code creation.
We could insist on only using expert programmers. Or insist on static
and dynamic analysis in CI/CD pipelines. Or as others have said, raise
the bar on code inspection/review.
We could even insist on pair programming, complexity metrics, test code
coverage metrics, documentation metrics and so on, but I have little
confidence in those.
<snip>
> Those were the days.
Hmmm. My whole question was specifically targeting 2018. While I do find
the historical anecdotes interesting, they're not really relevant.
> Coding standards ipso facto aren't a panacea. They also have to be
> pertinent to the task.
Coding standards can actually be counter-productive, for example if
- they are wrong and/or incomplete, while creating the impression of
correctness and sufficiency
- they are used when they shouldn't be
This latter point is exactly the reason for my original question.
> To compound a perceived indiscretion further, I suggest it is also
> banal to ask why component
> reliability is important for dependable systems.
dependable != safe
and
a property of the system (e.g. safety) is not necessarily a function of
a different property of its components (e.g. reliability).
> I would go further - it is important for any system
> which is not deliberately built to subvert the purposes of the client.
Sorry, I don't understand this comment at all.
<snip>
> A question. What important safety properties of a bicycle are *not*
> reducible to component reliability?
For simple systems, where the safety mechanisms are expressly
mechanical, reliability obviously matters.
And reliability is an extremely important property in other systems too,
of course, for its own sake.
But for **safety** of complex systems, I'm guessing that current best
practice must involve designing-in safety from multiple directions, with
failsafes, redundancy and/or similar?
Presumably the architectural-level safety considerations must include
the **expectation of failure in components**, and lead to designs which
mitigate expected (bound-to-happen) failures, to satisfy safety goals?
If our safety depends on the reliable behaviour of even a small program
on (say) a modern multi-core microprocessor interacting with other
pieces of software in other devices, I think "we are lost" again.
I'm worrying about autonomous vehicles and other systems of similar
complexity. As I understand it most of the software in these systems
won't even be written in C, let alone following MISRA C rules.
>> I mentioned your comment to an eminent friend (who has had
>> to deal with the human fallout from multiple accidents) and he said
>> "There are no banal questions
>> about safety. Anyone asking questions and interested in safety is to
>> be applauded."
>
> Really? Questions have an audience. It makes a lot of sense to discuss
> the reasons for coding
> standards with first- or second-semester computer science students who
> have never written a serious
> program used by others, just as it makes a lot of sense to discuss the
> following questions with
> children:
>
> Why does a bicycle have brakes?
> Why do you look to see if traffic is coming before you cross the road?
> Why is there a rule to drive on a fixed half of a road?
> Why are there speed limits on roads with mixed traffic and crossing
> traffic?
> Why are there speed limits on roads with limited forward visibility?
>
> However, when prefaced with "dear fellow safety professionals", one
> might consider them banal.
I'm not a "safety professional".
However I am relatively experienced in large scale software, and (as you
can see) I'm struggling to understand how 'safety professionals' can
advocate the application of principles from mechanical reliability
engineering, plus "things we learned on microcontroller-scale projects
several decades ago" to complex software-intensive systems in 2018.
> Similarly, those who have never flown an airplane may wonder why
> checklists are used for
> configuration for key phases of flight such as landing. Once you have
> flown an airplane and learned
> a little of what happens to others who fly, it becomes banal.
Fair point, but a little off topic imo.
>>> Because many people use C for
>>> programming small embedded systems and
>>> adhering to MISRA C coding guidelines enables the use of static
>>> analysis tools which go some way
>>> (but not all the way) to showing that the code does what you have
>>> said
>>> you want it to do.
>>
>> Those people could **just** use static analysis tools, and get the
>> same benefit.
>
> Not so in general. Static analysis tools geared towards a specific
> coding standard are usually far
> more effective than those which are not. Consider SPARK and SCADE.
I agree that static analysers are not enough in general, but as regards
your specific answer, I believe my statement holds.
> Also consider the project
> mentioned above which exercised the Sun HW. The aerospace manufacturer
> paid (lots) for bespoke
> analysis. There was a reason for that.
I'm sorry but I remain unconvinced that the lessons from the 90s are
still relevant.
I've seen the reboot screens of infotainment systems on several
commercial aeroplanes - generally a version of u-boot and a Red Hat
Linux from some decades prior to the time of the crash/reboot. I'm
hoping that these systems are not connected to the same network as the
instrumentation and controllers, but also I'm wondering how safety is
assured when
a) passengers have been told for years not to use personal devices,
'because safety' (probably nonsense, I know)
b) some planes now expressly provide internet facilities for passenger
devices
In automotive I know that some user-facing (and even internet-facing)
systems *do* sit on the CAN bus, alongside multiple
subsystems/components which are (presumed safe because they were)
developed in accordance with MISRA C guidelines.
br
Paul