[SystemSafety] The mindset for safety-critical systems design

Olwen Morgan olwen.morgan at btinternet.com
Tue Sep 18 17:11:34 CEST 2018


The two cardinal principles of critical systems design are:

1.    Whatever is not there cannot go wrong (so do not include any 
functions that you do not need).

2.    Whatever is there is less likely to go wrong the simpler it is.

Every Tom, Dick and Harry will quote you the second principle but it is 
much rarer to find people recognising the first principle explicitly.

The outstanding example of the first principle is the Soyuz spacecraft, 
where the Russians discovered by simple mathematics that the mass of a 
spacecraft is dominated by the part that has to be returned to Earth 
safely: every kilogram that comes home needs heat shielding, parachutes 
and landing braking, plus the propellant to lift all of that in the 
first place. Hence, much of the living accommodation in the Soyuz is in 
the orbital module, which is jettisoned along with the service module 
before re-entry, so that only the descent module has to return to Earth 
intact, and that module contains all and only the systems required to 
bring the cosmonauts back safely. As a result, the Soyuz vehicle for 
the circumlunar mission (what Apollo 8 first did) provided as much 
living accommodation as the Apollo craft, while the entire Soyuz stack 
of service, descent and orbital modules weighed about the same as the 
Apollo command module on its own.

Much the same thinking is seen in the design of the Vostok craft, in 
which, to minimise launch weight, the cosmonaut descended by parachute 
after ejecting from the capsule, which crash-landed on its own, thereby 
removing the need for the soft-landing retro-rockets used on the Soyuz. 
Also, as I saw in a Soyuz capsule on exhibition in Washington DC, there 
are bungee pockets around the walls to keep the hand-held devices 
needed during the voyage from floating about in zero-g. Low-tech, yet 
arguably the simplest possible piece of design for that purpose.

Now the poser: *Why do people readily recall the second principle (even 
if only to pay lip-service) yet often struggle, even when prompted, to 
recall the first?*

It is failure to recognise the first principle that ends up producing 
neonatal ventilators that run Windows 7 Embedded (see a previous 
posting). Ventilation is a time-triggered cyclic process that does not 
need an operating system to support it. If you need to provide data 
logging to non-volatile media, as the said ventilator did, you still do 
not need Windows. Nor do you need Windows for offline software update 
functions or for driving a display. And that is before you consider 
the greater cost of the chips needed to run Windows compared with, say, 
dual-core lockstep microcontrollers that would have supported a much 
safer software design. In the firm where I saw this, only two other 
engineers gave me any impression that they understood the importance of 
the first principle. Ironically, one of them decided he could take it 
into account by configuring into W7 Embedded only those functions that 
would be needed if Windows had to be there at all (one out of ten for 
effort). The other, who thought using Windows was wrong at all, was in 
fact a hardware engineer.
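
By way of illustration, the supervisory structure such a device needs 
fits in a few dozen lines of bare-metal C. The sketch below is my own, 
not the firm's code: it assumes a microcontroller with a periodic timer 
interrupt, and control_breath(), update_display() and log_append() are 
placeholder names for the real sense/actuate, display and logging 
routines. It shows the whole of the "operating system" that a 
time-triggered cyclic design actually requires:

/* Minimal time-triggered cyclic executive - a sketch only.
 * Assumes a bare-metal microcontroller whose hardware timer raises
 * an interrupt every 10 ms; the three routines below are stubs
 * standing in for the real application functions.
 */
#include <stdint.h>
#include <stdbool.h>

static volatile bool tick_flag;              /* set by the timer ISR  */

void timer_isr(void) { tick_flag = true; }   /* hooked to the timer   */

static void control_breath(void) { /* sample pressure, drive valves */ }
static void update_display(void) { /* refresh the local display     */ }
static void log_append(void)     { /* write a record to NV memory   */ }

int main(void)
{
    uint32_t frame = 0u;

    for (;;) {
        while (!tick_flag) { }      /* idle until the next 10 ms tick */
        tick_flag = false;

        control_breath();                        /* every 10 ms       */
        if (frame % 10u == 0u)  update_display(); /* every 100 ms     */
        if (frame % 100u == 0u) log_append();     /* every second     */

        frame++;
    }
}

Everything above the three stub routines is the entire scheduler; 
there is nothing for Windows to add here except failure modes.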


Anyone got any ideas why software engineers in particular get the second 
principle but miss the first? It beats me. If you miss the first 
principle, you'll never retrieve the situation even if you stick 
rigorously to the second.


Olwen

PS: Fear not. My postings will tail off when I've stopped dumping my 
more egregious examples of insanity for your perusal.







