[SystemSafety] Stupid Software Errors [was: Overflow......]

David Haworth david.haworth at elektrobit.com
Tue May 5 08:39:44 CEST 2015


It seems to me that this thing is being hyped up (in the
media and here) as yet another example of stupid programmers
making stupid mistakes that could have been/should have been/
definitely would have been detected by some magical analysis tool.
And really, as far as I can tell, no-one has any detailed information.

What if, in the final analysis, we find something like
the following code called from an interrupt service routine that
handles the 100 Hz timer interrupt:


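#include <stdint.h>

/* Assumed for illustration: the shutdown reason and the handler
 * are provided elsewhere in the system. */
enum ShutdownReason { AbsoluteMaximumUptimeExceeded };
extern void ShutdownSystem(enum ShutdownReason reason);

/* 248 whole days of 100 Hz ticks, just below INT32_MAX. */
#define ABSOLUTEMAXUPTIME (248L * 24L * 60L * 60L * 100L)

int32_t jiffies = 0;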


void RecordTime(void)
{
    if ( jiffies >= ABSOLUTEMAXUPTIME )
    {
        /* TraceLink: DesignSpec.MaxUptime
         *
         * The specified maximum uptime of this system is 72 hours.
         * ABSOLUTEMAXUPTIME is the maximum time we can possibly
         * support (rounded down to a whole number of days)
         * and turns out to be 248 days.
         * If we ever reach this limit there must be something seriously
         * wrong. Bad things will happen.
         */
        ShutdownSystem(AbsoluteMaximumUptimeExceeded);
    }
    else
    {
        jiffies++;
    }
}

Questions for the list inhabitants:

1. Would the behaviour of the system containing this code
   match the behaviour reported in the press?
2. Would the static analysis tools find this "error"?
3. Would the requirements/design tracing (which I presume takes place)
   lead us to the exact requirement for maximum operating time
   before a restart?
4. Would we find the place in the operation manual, service manual
   or some other document where it already states when these systems
   should be shut down or restarted?
5. Have these testers read all of the documentation about this
   aeroplane? Remember, by the McDonnell-Douglas Law of Aircraft Design,
   the aircraft shall not fly until the weight of the documentation
   exceeds the weight of the aircraft.  :-)

My personal opinion: this is a non-issue.

Dave


On 2015-05-04 15:05:56 +0100, Martyn Thomas wrote:
> Was this 8 months of simulation, to find an overflow error that static
> analysis could find in seconds?
> 
> It may even be true that the developers assumed correctly that no-one
> would fly for 8 months without powering off the generators - in which
> case their fault may have just been not documenting that assumption as a
> requirement.
> 
> Martyn
> 
> On 04/05/2015 13:31, Matthew Squair wrote:
> > On the other hand I don't think we should lose sight of the fact that
> > the Boeing 'bug' was found by running a long duration simulation, not
> > by an airliner falling out of the sky. So perhaps thanks is due to the
> > Boeing safety or software engineer(s) who insisted on a long run
> > endurance test and who might have actually learned something from history?
> >  
> >
> 
> _______________________________________________
> The System Safety Mailing List
> systemsafety at TechFak.Uni-Bielefeld.DE

-- 
David Haworth B.Sc.(Hons.), OS Kernel Developer    david.haworth at elektrobit.com
Tel: +49 9131 7701-6154     Fax: -6333                  Keys: keyserver.pgp.com
Elektrobit Automotive GmbH           Am Wolfsmantel 46, 91058 Erlangen, Germany
Geschäftsführer: Alexander Kocher, Gregor Zink       Amtsgericht Fürth HRB 4886

