[SystemSafety] Collected stopgap measures

Peter Bernard Ladkin ladkin at causalis.com
Sat Nov 3 13:00:43 CET 2018



On 2018-11-03 11:13 , Paul Sherwood wrote:
> 
> On 2018-11-02 11:24, Peter Bernard Ladkin wrote:
>> ....
>> If you are producing commercial safety-related software, non-medical,
>> non-defence,
>> non-civil-aerospace, in GB then HSE requires in effect that you are
>> able to show compliance with IEC
>> 61508 Part 3, known as IEC 61508-3:2010. A couple of us here are on
>> the IEC Maintenance Team for
>> this standard, and one of us is a Director of HSE. So this thread is
>> talking to some direct experience.
> 
> The key escape clause in your words is "in effect".  It's not clear to me that the applicable laws
> require compliance with IEC 61508 at all.

There is no "escape clause". I am reporting what I have been told.

Laws are one thing. Reducing risks ALARP is the legal principle, from 1949. Getting prosecuted by
HSE is another thing. HSE said on the predecessor to this list that their criterion for prosecution
is whether or not the organisation complied with IEC 61508 in developing whatever system.

Obviously, you can be prosecuted and declared not guilty. Equally, your system can cause damage
without you necessarily being prosecuted.

> the UK position boils down to
> demonstrating that risks have been reduced SFAIRP. 

That is established UK law from 1949.

> From my limited exposure to the IEC documents, they seem to be of little use to me as a duty-holder
> attempting to wrestle with connected software-intensive systems at scale. 

Right. There is little attempt to prescribe how you go about building a safety-related system (that
is a common complaint, in fact, from new entrants). But the standard does tell you what you have to
document and what you have to assure. Even if it does so in a way which you might consider obscure
and convoluted. And even if there are gaps (which there are).

> In fact I'd go so far as
> to say that the standards seem to be dangerously misleading.

You suggest that IEC 61508 is leading people to do things which can result in harm? How so?

>> A risk analysis must be performed (hazard identification, hazard
>> analysis - basically the
>> assignation of a severity to each hazard, and some estimate of
>> likelihood, then risk assessment, the
> 
> OK, so we're already off down a track that doesn't work out very well in practice - humans are awful
> at estimating likelihoods, for example.
"Humans" have been performing this kind of risk analysis in civil aerospace for 80 years or more, as
part of the Approved Means of Compliance (as it is called in EASA-Speak) and it seems to work out
very well indeed.

They also do it in rail electronic kit. Works well there, too, although I don't think as well as in
civil aerospace.

If you don't like those ways of estimating risk, how would you go about estimating risk?
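
To make the mechanics concrete, here is a minimal sketch in Python of the kind of risk-matrix
classification that follows hazard identification. The severity and likelihood bands and the
resulting risk classes are my own illustrative inventions, not taken from IEC 61508 or from any
EASA Acceptable Means of Compliance:

    # Illustrative sketch only: a classical risk-matrix classification of the
    # kind used after hazard identification. The categories and risk classes
    # below are invented for illustration, not taken from any standard.
    SEVERITIES = ["negligible", "marginal", "critical", "catastrophic"]
    LIKELIHOODS = ["improbable", "remote", "occasional", "probable", "frequent"]

    def risk_class(severity: str, likelihood: str) -> str:
        """Combine a hazard's severity and estimated likelihood into a risk class."""
        score = SEVERITIES.index(severity) + LIKELIHOODS.index(likelihood)
        if score >= 6:
            return "intolerable"
        if score >= 4:
            return "undesirable (reduce ALARP)"
        if score >= 2:
            return "tolerable with review"
        return "acceptable"

    # Example: one identified hazard
    print(risk_class("critical", "occasional"))   # -> "undesirable (reduce ALARP)"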

> Yes. As I understand it all of the above fails to consider, for example:
> 
> - specification risks

What is a "specification risk" How can a specification possibly have any risk associated with it (I
am using the term "risk" here as in IEC 61508-4 subclause 3.1.6 or electropedia.org item 351-57-03)

> - component interaction risks

Dealt with. If component interaction engenders a hazard, then hazard identification will spot it if
you do the hazard identification right.

> - hand-off/silo risks

I don't know what these are.

> - cascading failures

Not mentioned by name, indeed. But how do these pose a risk that, according to you, is not dealt
with? Cascading failures result from hazards (the situation which allowed/engendered the first
failure). The severity of that hazard is the entire damage caused at the end of the cascade. So what
is missed?
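
For concreteness, a toy sketch (my own illustration, nothing from the standard) of the point that
the severity assigned to the initiating hazard is assessed over the whole cascade, not just the
first failure:

    # Illustrative sketch: the events and damage figures are invented.
    cascade = [
        ("pump controller failure",      1),   # initiating failure
        ("loss of coolant flow",         5),
        ("overtemperature trip missed", 50),
    ]

    # The hazard that allowed the first failure is assigned the summed damage.
    total_damage = sum(damage for _, damage in cascade)
    print(total_damage)   # 56 -> severity is assessed against this total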

If you are looking for advice on how to anticipate cascading failures, that is out of scope of the
normative part of the standard, although hints are often given in NOTES or the informative parts
when people think it is a big deal.

If you think something should be said specifically about cascading failures, why not write something
up and give it to me? Or to Bertrand or to Dewi.

> - security hazards leading to safety hazards

IEC 61508-1:2010 subclause 7.4.4.2 (I happen to know this one by heart, since I have spent the last
year talking about it).

>> The idea is that the risk-analysis-and-consequence definition of
>> safety function all happens at the
>> system level.
> 
> Fantastic, except that systems folk may or may not have any actual understanding of what's really
> going on in the subsystems. Very few people can adequately reason about safety, security, software,
> electronics and networks all at the same time.

"What's really going on in the subsystems" is not relevant to defining a safety function (notice
that word "definition" in my sentence to which you are responding). It may be relevant to
implementing a safety function.
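
For what it's worth, a toy sketch of the distinction. The names and thresholds below are invented
for illustration; the point is only that the definition speaks of EUC state and a required action,
and says nothing about subsystems:

    # Illustrative sketch of the definition/implementation distinction.
    # Names and thresholds are invented; nothing here is from IEC 61508.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class SafetyFunction:
        name: str
        triggering_condition: Callable[[dict], bool]   # defined at system level
        required_action: str                           # defined at system level

    # Definition: stated purely in terms of EUC state, no subsystem detail.
    overpressure_sf = SafetyFunction(
        name="overpressure protection",
        triggering_condition=lambda euc_state: euc_state["pressure_bar"] > 10.0,
        required_action="open relief valve within 500 ms",
    )

    # The system-level definition can be exercised without any subsystem detail:
    print(overpressure_sf.triggering_condition({"pressure_bar": 12.0}))   # True

    # How sensors, firmware and actuators achieve this is a separate concern,
    # handed to the subsystem/SW developers as requirements.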

>> The SW developers are given a set of requirements by the
>> systems people, and then they
>> develop to those requirements. It is theoretically for the systems
>> people to get the requirements
>> right, not the SW people. In practice, there will be negotiation of
>> course.
> 
> In the case where, say, a modern multicore microprocessor is to be used in "EUC", I'm guesstimating
> between 100kloc and 1Mloc of 'firmware' and/or 'microcode' delivered by the silicon vendor's team
> (including bought-in), before we even get to what folks commonly describe as 'Board Support
> Package'. It's extremely unlikely that the systems people have any influence at all over that
> pre-cooked code.

I agree that multicore devices engender assurance problems. That is actually a big worry for IEC
61508 MT-1/2 and there is a task group addressing it. But what does that have to do with specifying
requirements?

> 
>> The SIL of a safety
>> function constrains techniques which it is "highly recommended" be
>> used. So if you are developing SW
>> with a SIL 3 or SIL 4 systematic capability, then it is "highly
>> recommended" that formal methods be
>> applied in the specified places.
> 
> I understand the idea. It's not working out very well in practice as far as I can tell, because
> people tend to cling to their own comfort blankets and extrapolate into the unknown based on their
> own limited experiences.
> 
> And as Olwen has stated in various colourful ways recently, lots of folk aren't even attempting to
> do the thinking.

I doubt Olwen was primarily talking about coding teams implementing code which is part of safety
functions.

>> Developing SW according to IEC 61508-3:2010 will involve you in almost
>> 60 documentation
>> requirements. You will have to produce those 60 documents. About a
> 
> Only if we believe that the IEC spells and incantations are fit for their purpose. Perhaps following
> them blindly will be enough to help defend in court in the event of accident. Or perhaps not.

Oh, I hope we agree that having a requirements specification and a safety requirements specification
and a careful record of the derivation of your tests and their results are somewhat more than
"spells and incantations". Almost everyone in SW dependability considers them essential.

I suspect that having to produce 60 documents during your SW development is likely to ensure that
you pay more careful attention to your software development than you might have done if, say, only 3
had been required.

> Actually I would **really** like to understand what a 'SW safety requirements specification' is,
> from the perspective of the current expert community.

I can tell you what I think one is, but the IEC doesn't yet agree with me. If you can read German,
https://rvs-bi.de/publications/Theses/Dissertation_Bernd_Sieker.pdf shows how to derive one
rigorously for one particular case, along with a proof that it is complete, and a derived
implementation in SPARK.
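
As a very crude illustration of the flavour (this is not Bernd Sieker's derivation, which is done
rigorously in SPARK with a completeness proof), a single SW safety requirement can be written down
as a checkable condition on the software's inputs and outputs:

    # Crude illustration only: a SW safety requirement expressed as a checkable
    # postcondition, in the spirit of design-by-contract. The valve example and
    # the pressure limit are invented for illustration.
    def command_valve(pressure_bar: float) -> str:
        # Safety requirement (postcondition): if pressure exceeds the limit,
        # the output command SHALL be "OPEN_RELIEF".
        command = "OPEN_RELIEF" if pressure_bar > 10.0 else "HOLD"
        assert not (pressure_bar > 10.0) or command == "OPEN_RELIEF"
        return command

    print(command_valve(12.3))   # -> "OPEN_RELIEF"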

>> I do hope that with the twenty-odd Assurance Points that we are
>> developing in IEC 61508-3-2, much of
>> this will become more orderly. We'll see.
> 
> If the documents were made public, it could be critiqued by people other than the folks who wrote
> them and/or are part of the standards Ponzi scheme.

I wrote it, originally with the help of Peter Bishop, Robin Bloomfield, John Knight, Bev Littlewood,
John Rushby, Lorenzo Strigini and Martyn Thomas. At that time, the group included one-quarter of all
the winners of the IEEE Harlan Mills Award, possibly the highest software engineering award. The
work was enhanced by a one-person-year state-of-the-art review of what is out there by Bernd Sieker,
followed by a dozen meetings over two years of the German Safe Software mirror committee.

A version of the original has been publicly available on-line at
https://rvs-bi.de/publications/Papers/LadkinAdaConnection2011.pdf since mid-2011. Slides for the
accompanying talk are at https://rvs-bi.de/publications/Talks/AdaConn2011TalkLadkin.pdf

A more recent version, slides from October 2017 after the IEC submission was finalised, can be found
at https://www.his-2018.co.uk/session/formal-methods-for-safety-critical-software-assurance-1

Plenty of opportunity there for the public to comment at will. If you call that a Ponzi scheme, I'd
be fascinated to see what you think is an appropriate standards development.

PBL

Prof. Peter Bernard Ladkin, Bielefeld, Germany
MoreInCommon
Je suis Charlie
Tel+msg +49 (0)521 880 7319  www.rvs-bi.de




