[SystemSafety] Stupid Software Errors [was: Overflow......]

Jensen, Martin Faurschou martin-faurschou.jensen at siemens.com
Tue May 5 09:43:26 CEST 2015


I remember an old professor teaching me (or at least trying to teach me) embedded software design many years ago. He claimed that any value we expected to count upwards at regular intervals should be initialized not to zero, but to a value slightly below its maximum. He believed that this would save us much grief later on, as our bad design would then show up rather early...
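A minimal sketch of that advice in C, under assumptions that are mine and not the professor's: a 32-bit unsigned tick counter, a 10 ms tick, and the hypothetical names `TICK_INITIAL`, `tick_isr`, `timeout_expired` and `timeout_expired_naive`. With the counter started 1000 ticks below its maximum, the wrap occurs about ten seconds after startup, so a comparison that cannot cope with it fails on the bench rather than months into operation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical start value: 1000 ticks below the 32-bit maximum, so the
   counter wraps about 10 s after startup at an assumed 10 ms tick. */
#define TICK_INITIAL (UINT32_MAX - 1000u)

static uint32_t tick_count = TICK_INITIAL;

/* Would be called from a periodic timer interrupt (hypothetical name). */
static void tick_isr(void)
{
    tick_count++;  /* unsigned overflow wraps to 0; well defined in C */
}

/* Wrap-safe: unsigned subtraction is modulo 2^32, so the elapsed count is
   correct as long as fewer than 2^32 ticks separate the two readings. */
static bool timeout_expired(uint32_t start, uint32_t timeout_ticks)
{
    return (uint32_t)(tick_count - start) >= timeout_ticks;
}

/* Fragile: compares against an absolute deadline, which is wrong as soon
   as start + timeout_ticks wraps past the maximum. */
static bool timeout_expired_naive(uint32_t start, uint32_t timeout_ticks)
{
    return tick_count >= (uint32_t)(start + timeout_ticks);
}
```

With a start reading taken right after startup and a 2000-tick timeout, the naive version reports the timeout as expired immediately, while the wrap-safe version reports it only after 2000 ticks have elapsed.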


With best regards,
Martin Faurschou Jensen

Siemens A/S
PD PA PI FL R&D STC
Nordborgvej 81
6430 Nordborg, Denmark 
Tel.: +45 7488-2685
mailto:martin-faurschou.jensen at siemens.com

Siemens A/S. Headquarters: Borupvang 9, 2750 Ballerup, Denmark. Tel: +45 4477 4477 CVR-no. 16 99 30 85
-----Original Message-----
From: systemsafety-bounces at lists.techfak.uni-bielefeld.de [mailto:systemsafety-bounces at lists.techfak.uni-bielefeld.de] On Behalf Of David Haworth
Sent: 5. maj 2015 09:29
To: Peter Bernard Ladkin
Cc: systemsafety at lists.techfak.uni-bielefeld.de
Subject: Re: [SystemSafety] Stupid Software Errors [was: Overflow......]

Peter,

On 2015-05-05 09:03:31 +0200, Peter Bernard Ladkin wrote:
> After fifty years of programs not handling date and time correctly, 
> even according to their own requirements, including the single most 
> expensive anomaly in history coming in at well over $300bn, it surely 
> requires heroic nonchalance to suggest
> 
> > .... this is a non-issue.

I've also seen my fair share of time/date problems, including the impending Year 2038 bug (which no-one seems to be worrying much about).

I also notice that 248 days is pretty well exactly the range of a 32-bit signed integer incremented at 10 ms intervals. So this is quite emphatically *not* a time/date issue, but an uptime issue.
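The arithmetic behind that figure can be checked with a short sketch. The function name `days_until_int32_overflow` is hypothetical, and it assumes the counter starts at zero and overflows on reaching `INT32_MAX`; nothing here is taken from the affected systems themselves.

```c
#include <stdint.h>

/* Days until a signed 32-bit tick counter, started at zero, reaches
   INT32_MAX, for a given tick period in seconds. */
static double days_until_int32_overflow(double tick_period_s)
{
    const double seconds_per_day = 86400.0;
    return ((double)INT32_MAX * tick_period_s) / seconds_per_day;
}
```

For a 10 ms tick, `days_until_int32_overflow(0.01)` evaluates to roughly 248.55 days, since 2^31 ticks of 0.01 s is about 21,474,836 s.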

Many control systems have specified maximum uptimes. They often rely on checks and tests performed at startup or shutdown to detect latent hardware failures. Destructive RAM tests are a prime example of things that cannot be done properly during operation.

I also notice that the press reports state that the systems "shut themselves down", not "crash", which implies (to me at least) that there is at least some error detection and handling going on.

So, in the absence of any detailed technical information about the requirements, design, implementation, service documentation, operational manuals etc. for these systems, I stand by my opinion until facts emerge that prove me wrong.

There are worse things to worry about.

Dave


-- 
David Haworth B.Sc.(Hons.), OS Kernel Developer    david.haworth at elektrobit.com
Tel: +49 9131 7701-6154     Fax: -6333                  Keys: keyserver.pgp.com
Elektrobit Automotive GmbH           Am Wolfsmantel 46, 91058 Erlangen, Germany
Geschäftsführer: Alexander Kocher, Gregor Zink       Amtsgericht Fürth HRB 4886


