[SystemSafety] Collected stopgap measures

Paul Sherwood paul.sherwood at codethink.co.uk
Sat Nov 3 11:13:59 CET 2018


Hi Peter,

Thanks for explaining this in detail.

I'm not a lawyer, etc, but I have some comments...

On 2018-11-02 11:24, Peter Bernard Ladkin wrote:
> Some points concerning safety and SW.
> 
> If you are producing commercial safety-related software, non-medical,
> non-defence,
> non-civil-aerospace, in GB then HSE requires in effect that you are
> able to show compliance with IEC
> 61508 Part 3, known as IEC 61508-3:2010. A couple of us here are on
> the IEC Maintenance Team for
> this standard, and one of us is a Director of HSE. So this thread is
> talking to some direct experience.

The key escape clause in your words is "in effect".  It's not clear to 
me that the applicable laws require compliance with IEC 61508 at all.

From the discussions I've had and the reading I've done, it seems that 
the UK position boils down to demonstrating that risks have been reduced 
SFAIRP ("so far as is reasonably practicable"). That broadly requires 
that duty-holders establish what the risks are, and do what is necessary 
to mitigate them.

From my limited exposure to the IEC documents, they seem to be of little 
use to me as a duty-holder attempting to wrestle with connected 
software-intensive systems at scale. In fact I'd go so far as to say 
that the standards seem to be dangerously misleading.

And as I've said elsewhere, I think the business model around the 
documents is a disgrace.

> In Germany, there is no general one-stop regulator across (many but
> not all) industries as there is
> in the UK. But if something goes wrong you will get assessed and maybe
> prosecuted by the state's
> attorneys and showing them you have complied with applicable standards
> is a must. IEC 61508 and/or
> its sector-specific derivative is one of those applicable standards.

I'm attempting to understand what's properly applicable, in such a way 
that I can communicate (and ideally enforce) it for teams in 
multi-vendor deliveries involving today's hardware. The standards don't 
fit the bill afaict, not least because they were created decades ago, as 
a minority sport, and have not really been subject to adequate scrutiny 
or revision as technology has advanced.

> The way that IEC 61508 requires you address safety is as follows. You
> have Equipment Under Control
> (EUC) and this equipment can theoretically behave in such a way that
> it causes damage (hurts or
> kills people, damages non-related things, mucks up the environment).
> 
> A risk analysis must be performed (hazard identification, hazard
> analysis - basically the
> assignation of a severity to each hazard, and some estimate of
> likelihood, then risk assessment, the

OK, so we're already off down a track that doesn't work out very well in 
practice - humans are awful at estimating likelihoods, for example.

> combination of likelihood with severity). "Society" sets the
> acceptable risk, per hazard. If the EUC
> risk from the risk analysis exceeds this acceptable risk, then a
> *safety function* (technical term,
> henceforth SF) must be introduced, whose operation avoids or mitigates
> the risk ("mitigate" means
> either reduce the likelihood of occurrence, or reduce the severity, or
> both). It is assumed the SF
> may fail. It can't fail all the time, for then the risk is the plain
> EUC risk and that is not
> acceptable by hypothesis. So the SF gets a reliability condition
> imposed, one of four. These
> reliability conditions are called "safety integrity levels", or SILs.
> For random HW failures, the
> SILs are quantitative (probability of failure per demand/per hour).
> SILs for SW (which is not taken
> to be susceptible to "random" failure, but only to "systematic"
> failure, that is, reproducible
> failure due to design) are not quantitative, and are conceptually a
> little more complicated,
> referring to something called "systematic capability".

Yes. As I understand it, all of the above fails to consider, for example:

- specification risks
- component interaction risks
- hand-off/silo risks
- cascading failures
- security hazards leading to safety hazards

etc.
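
To be clear about the part I *do* follow: the quantitative arithmetic for 
random hardware failures is simple enough to sketch. Everything in the 
sketch below (scenario numbers, function names, category strings) is my 
own invention, except the PFDavg target bands, which are the low-demand 
figures as I understand them from IEC 61508-1. And note that none of this 
machinery applies to software, which only gets the qualitative 
"systematic capability" treatment.

# A toy sketch (mine, not the standard's text) of the low-demand-mode
# arithmetic: an unprotected hazard frequency and a tolerable frequency
# give a required risk reduction, which maps onto a SIL band via the
# PFDavg targets I understand IEC 61508-1 to specify. Scenario numbers
# are invented.

PFD_BANDS = [  # average probability of dangerous failure on demand
    (1e-5, 1e-4, "SIL 4"),
    (1e-4, 1e-3, "SIL 3"),
    (1e-3, 1e-2, "SIL 2"),
    (1e-2, 1e-1, "SIL 1"),
]

def required_pfd(unprotected_freq_per_year, tolerable_freq_per_year):
    """PFDavg the safety function must achieve for residual risk to be tolerable."""
    return tolerable_freq_per_year / unprotected_freq_per_year

def sil_for_pfd(pfd):
    """Map a required PFDavg onto a SIL band (or flag that none fits)."""
    for lower, upper, sil in PFD_BANDS:
        if lower <= pfd < upper:
            return sil
    if pfd >= 1e-1:
        return "risk reduction below 10: no SIL required"
    return "beyond SIL 4: a single safety function is not enough"

# Invented scenario: hazard estimated at 1 dangerous event/year
# unprotected, tolerable frequency set at 1e-4/year, so the safety
# function needs PFDavg <= 1e-4, which lands in the SIL 3 band.
pfd = required_pfd(1.0, 1e-4)
print(pfd, sil_for_pfd(pfd))

My point stands, though: the inputs to that arithmetic are exactly the 
likelihood estimates I complained about above, and the output is only as 
good as they are.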

> The idea is that the risk-analysis-and-consequence definition of
> safety function all happens at the
> system level.

Fantastic, except that systems folk may or may not have any actual 
understanding of what's really going on in the subsystems. Very few 
people can adequately reason about safety, security, software, 
electronics and networks all at the same time.

> The SW developers are given a set of requirements by the
> systems people, and then they
> develop to those requirements. It is theoretically for the systems
> people to get the requirements
> right, not the SW people. In practice, there will be negotiation of
> course.

In the case where, say, a modern multicore microprocessor is to be used 
in "EUC", I'm guesstimating between 100kloc and 1Mloc of 'firmware' 
and/or 'microcode' delivered by the silicon vendor's team (including 
bought-in), before we even get to what folks commonly describe as 'Board 
Support Package'. It's extremely unlikely that the systems people have 
any influence at all over that pre-cooked code.

> The SIL of a safety
> function constrains techniques which it is "highly recommended" be
> used. So if you are developing SW
> with a SIL 3 or SIL 4 systematic capability, then it is "highly
> recommended" that formal methods be
> applied in the specified places.

I understand the idea. It's not working out very well in practice as far 
as I can tell, because people tend to cling to their own comfort 
blankets and extrapolate into the unknown based on their own limited 
experiences.

And as Olwen has stated in various colourful ways recently, lots of folk 
aren't even attempting to do the thinking.

> Developing SW according to IEC 61508-3:2010 will involve you in almost
> 60 documentation
> requirements. You will have to produce those 60 documents. About a

Only if we believe that the IEC spells and incantations are fit for 
their purpose. Perhaps following them blindly will be enough to help 
mount a defence in court in the event of an accident. Or perhaps not.

In the security world, Geer's Law is often cited: "Any security 
technology whose effectiveness can't be empirically determined is 
indistinguishable from blind luck."

I fear that in the safety world, there's not enough transparency to 
establish what has been luck and what is effective.

> third of them concern your
> testing (protocols, execution, results). I think people can well
> imagine that, unless you start your
> development process knowing you are going to have to produce those
> almost-60 documents, you will
> very likely be unable to show compliance to an assessor (or a
> prosecutor if your client has had some
> bad luck). Not only that, but there are a lot of tables in Annexes A
> and B saying in quite specific
> terms what methods are "recommended" or "highly recommended" and
> where. So assessors are likely to
> be checking on that, also. Things such as "formal proof" or "formal
> verification", "static analysis"
> and forward/backward traceability between SW safety requirements
> specification and SW safety
> validation plan. (There is a question what a SW safety requirements
> specification is, but I won't
> get into that.) You can go cross-eyed looking through it all (maybe
> you need to be cross-eyed
> looking through it?).

Actually I would **really** like to understand what a 'SW safety 
requirements specification' is, from the perspective of the current 
expert community.
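
Whatever that specification turns out to be, the forward/backward 
traceability requirement mentioned above is at least mechanical to 
check. A minimal sketch, assuming a trivially simple data layout of my 
own invention (requirement IDs and a test-to-requirement map), might 
look like this:

# A minimal sketch of forward/backward traceability between SW safety
# requirements and a validation plan. The IDs and dict layout are
# invented purely for illustration.

requirements = {"SR-001", "SR-002", "SR-003"}

# Validation plan: test id -> requirement ids the test claims to cover.
validation_plan = {
    "VT-01": {"SR-001"},
    "VT-02": {"SR-001", "SR-003"},
}

def forward_gaps(reqs, plan):
    """Requirements that no validation test traces to."""
    covered = set().union(*plan.values()) if plan else set()
    return reqs - covered

def backward_gaps(reqs, plan):
    """Tests that trace to no known requirement (orphaned evidence)."""
    return {test for test, refs in plan.items() if not refs & reqs}

print("uncovered requirements:", forward_gaps(requirements, validation_plan))  # {'SR-002'}
print("orphaned tests:", backward_gaps(requirements, validation_plan))         # set()

Of course, the hard part is not the bookkeeping; it's knowing whether 
the requirements themselves are the right ones, which is the question 
I'm actually asking.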

> I do hope that with the twenty-odd Assurance Points that we are
> developing in IEC 61508-3-2, much of
> this will become more orderly. We'll see.

If the documents were made public, they could be critiqued by people 
other than the folks who wrote them and/or are part of the standards 
Ponzi scheme.

> So that is the way it is done. You can't teach it in universities from
> the source, because the IEC
> wants each user to buy a copy and if you want the full set it will
> cost you thousands of €/$/£.

That's the way the minority sport is claimed to be done. I still have 
some doubts that it's how things are *actually* done most of the time, 
let alone how they *should be* done.

br
Paul

