[SystemSafety] Bursting the formal methods bubble

Steve Tockey Steve.Tockey at construx.com
Fri Oct 27 22:40:09 CEST 2017


Derek,

Thanks for the reference to the NASA study. It was interesting. In
particular, the statement near the bottom of page 18 struck me:

“It is interesting to note that almost half (48%) of the questions
revealed ambiguities and errors [me: in the requirements document]. Since
these problems were found by different teams and across different
universities, the N-version environment contributed to the debugging of
the specifications. It is not possible to quantify from this task the
potential benefits of N-version programming in specification debugging.”

The same degree of “specification debugging”, or better, might have been
achieved at significantly lower cost by using Formal Inspection
(sometimes called Fagan Inspection). More on this below.



You asked: “Is the problem size an issue?  The Brilliant et al study
required a mean
effort of 47 hours to complete, the NASA study hired everybody for 10
weeks.”

I suspect it is not, although I don’t have any solid data to back it up.



You also asked: “Do you have any data on this?” in reference to peer
reviews.

Yes, there’s a ton of data, particularly on use of Formal Inspections. One
of the best sources is the following:

David Wheeler, Bill Brykczynski, Reginald Meeson, "Software Inspection -
An Industry Best Practice", IEEE Computer Society Press, 1996



In particular, two papers in the Wheeler et al. book stand out. One is
the Foreword by Michael Fagan and the other is Glen Russell's “Experience
with Inspection in Ultralarge-Scale Developments”.


Other good data sources include:

Capers Jones, “Software Quality in 1999: What Works and What Doesn’t”,
Software Quality Research Inc., 1999

Don O’Neill, “Setting Up a Software Inspection Program”, Crosstalk,
February 1997

Stefan Wagner, “A literature survey of the quality economics of
defect-detection techniques”, Proceedings of the 2006 ACM/IEEE
International Symposium on Empirical Software Engineering (ISESE '06),
2006, pp. 194-203



Their data is consistent with my own experience with Formal Inspections on
projects at several different organizations:

*) Formal inspections are between 60% and 90% effective at finding defects
(my experience on projects was as high as 95%)

*) In terms of efficiency, between 0.8 and 1.2 labor hours are invested
per defect found (my experience is 1.1 hours per defect)

*) Glen Russell’s paper (in Wheeler et al., above) reported their return
on investment from Formal Inspections to be 40:1. In other words, for each
labor hour invested in Formal Inspection, 40 hours of downstream rework
were avoided. Similarly, E. P. Doolan (“Experience With Fagan’s Inspection
Method”, Software Practice and Experience, Vol 22, No 2, John Wiley &
Sons, February 1992; also in Wheeler et al.) reported a return on
investment of 44:1. A not widely distributed report out of Verizon stated
54:1. So it may seem entirely counter-intuitive, but Formal Inspections
don’t “cost” a project anything. They save the project many times more
than they cost.
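
Combining the efficiency and ROI figures gives a rough sense of the
per-defect economics. This is just my arithmetic on the numbers assumed
above (1.1 hours invested per defect found, roughly 40 rework hours
avoided per invested hour); it is not a figure taken from any of the
cited studies:

# Rough per-defect economics implied by the figures quoted above (assumptions).
hours_invested_per_defect = 1.1                 # assumed inspection efficiency
rework_hours_avoided_per_invested_hour = 40.0   # reported ROI (Russell)

rework_hours_avoided_per_defect = (hours_invested_per_defect *
                                   rework_hours_avoided_per_invested_hour)
net_hours_saved_per_defect = (rework_hours_avoided_per_defect -
                              hours_invested_per_defect)

print("Rework hours avoided per defect found: %.1f" % rework_hours_avoided_per_defect)
print("Net hours saved per defect found:      %.1f" % net_hours_saved_per_defect)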


The level of investment in Formal Inspections varies from project to
project, but in my personal experience it tends to hover at around 10% of
total technical labor (that is, total effort spent on requirements,
design, construction, developer testing, and inspections combined).


So I would summarize my earlier argument this way:

1) Assume a typical project using mainstream development practices
(specifically, not using Formal Inspections) would cost C$. To develop the
software more than once in an N-version manner would cost N*C$.

2) In typical organizations, about 60% of total project effort is spent on
unplanned rework to fix defects injected earlier (e.g., in requirements,
in design). Using Formal Inspections can reduce that unplanned rework to
around 10%. The non-rework work is unchanged at roughly 0.4*C$; adding
roughly 10% of the new total for inspections and allowing roughly 10% for
residual rework, the total cost of the project drops from C$ to about
0.5*C$ (a small arithmetic sketch follows after point 4).

3) Per the quote from the NASA paper above, “It is not possible to
quantify from this task the potential benefits of N-version programming
in specification debugging”, so we can’t reliably compare the
defect-finding effectiveness of N-version development with that of Formal
Inspections. I think it’s safe to say that N-version development is
unlikely to be better than Formal Inspections. For the sake of argument,
let’s assume it’s the same, even though in my opinion it’s unlikely to be
even nearly as good.

4) Given that in N-version development N > 1, it is clear that N*C$ >>
0.5*C$ for the same (or, likely, better) quality out of the project that
used Formal Inspections.
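
To make the arithmetic in points 1, 2, and 4 concrete, here is a minimal
sketch in Python. It simply encodes the assumed figures above (60% rework
without inspections, roughly 10% inspection effort and 10% residual rework
with them); the numbers are illustrative assumptions, not measurements
from any particular project:

# Illustrative cost model for the argument above; the fractions are the
# assumed figures from this email, not data.

def cost_without_inspections(c=1.0):
    """Total cost C$ of a mainstream project (0.4*C$ non-rework + 0.6*C$ rework)."""
    return c

def cost_with_inspections(c=1.0, rework_fraction=0.60,
                          inspection_share=0.10, residual_rework_share=0.10):
    """Non-rework work is unchanged at (1 - rework_fraction)*C$; inspections
    and residual rework are each roughly 10% of the new total T, so
    T = 0.4*C$ + 0.2*T, i.e. T is about 0.5*C$."""
    non_rework = c * (1.0 - rework_fraction)
    return non_rework / (1.0 - inspection_share - residual_rework_share)

def cost_n_version(n, c=1.0):
    """N-version development builds the software N times over."""
    return n * cost_without_inspections(c)

if __name__ == "__main__":
    print("Mainstream project:      %.2f C$" % cost_without_inspections())
    print("With Formal Inspections: %.2f C$" % cost_with_inspections())
    for n in (2, 3):
        print("%d-version development:   %.2f C$" % (n, cost_n_version(n)))

Running it prints 1.00, 0.50, 2.00, and 3.00 times C$, which is the
comparison in point 4.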



Cheers,

― steve





-----Original Message-----
From: Derek M Jones <derek at knosof.co.uk>
Organization: Knowledge Software, Ltd
Date: Thursday, October 26, 2017 at 3:02 PM
To: Steve Tockey <Steve.Tockey at construx.com>,
"systemsafety at lists.techfak.uni-bielefeld.de"
<systemsafety at lists.techfak.uni-bielefeld.de>
Subject: Re: [SystemSafety] Bursting the formal methods bubble

Steve,

> “Using different sets of people increases the probability that problems
> will be found in the specification/design, through people having
> different points of view and asking different questions (I cannot find a
> DOD N-version study that documented this effect).”

It was a NASA study, see table 4-1:
https://ntrs.nasa.gov/search.jsp?R=19870020663

> 1) According to the Brilliant, Knight, Leveson paper there won’t be very
> much additional defect identification. If I recall correctly, the
>subjects
> in their study were all working to the same requirements specification
>and
> around 60% of the defects they injected were in common to most
>developers.

Is the problem size an issue?  The Brilliant et al study required a mean
effort of 47 hours to complete, the NASA study hired everybody for 10
weeks.

> 2) The cost of building non-trivial software even only once can be pretty
> high, building it more than once simply multiplies the cost by the number
> of versions built. It can be really, really expensive to build parallel
> versions.

Yes, this is one reason why formal methods (which essentially create
a whole new implementation) are only used for critical cases.

> 3) The same (or higher) defect FINDING effectiveness could easily be
> achieved by using an appropriate form of independent peer review, and the
> cost of running independent peer reviews on one version being built will
> be significantly less than the cost of building N versions.

Do you have any data on this?

> So I am agreeing that multiple people lead to better work, however all of
> the data I have point to the most effective use of most of those people
> will be in a peer review capacity not in a build-it-more-than-once
> capacity.

This is one argument. The whole issue is being discussed in an
ultra-low-data environment (there are small, specific bits and pieces
floating around).

> 
> 
> ― steve
> 
> 
> 
> -----Original Message-----
> From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de>
> on behalf of Derek M Jones <derek at knosof.co.uk>
> Organization: Knowledge Software, Ltd
> Date: Wednesday, October 25, 2017 at 10:57 AM
> To: "systemsafety at lists.techfak.uni-bielefeld.de"
> <systemsafety at lists.techfak.uni-bielefeld.de>
> Subject: Re: [SystemSafety] Bursting the formal methods bubble
> 
> Steve,
> 
>> “If you do multiple implementations, e.g., Ada+formal or
>> Ada+Visual basic, you will detect more problems and the result will
>> be more reliable software.”
>>
>> Do you mean “N-Version development”? If so, this would seem to
>> contradict results published by Susan Brilliant, John Knight, and
>> Nancy Leveson in
> ...
>> The point I got out of that paper was that developers don’t make
>> randomly different mistakes, they usually make the same systemic
>> mistakes (notably
> 
> People do make the same mistakes during the specification, design,
> requirements and testing phases.
> 
> Using different sets of people increases the probability that problems
> will be found in the specification/design, through people having
> different points of view and asking different questions (I cannot find a
> DOD N-version study that documented this effect).
> 
> Having multiple implementations provides a means for comparing output,
> looking for differences.
> 
> There has been some work looking at how using different languages and
> computing platforms leads to more 'difference' faults being found.
> 
> Would more faults be uncovered by reimplementing an Ada program in
> Visual basic or formal methods?  They are both very different.
> 
>> -----Original Message-----
>> From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de>
>> on behalf of Derek M Jones <derek at knosof.co.uk>
>> Organization: Knowledge Software, Ltd
>> Date: Wednesday, October 25, 2017 at 6:36 AM
>> To: jean-louis Boulanger <jean.louis.boulanger at gmail.com>
>> Cc: "systemsafety at lists.techfak.uni-bielefeld.de"
>> <systemsafety at lists.techfak.uni-bielefeld.de>
>> Subject: Re: [SystemSafety] Bursting the formal methods bubble
>>
>> jean-louis,
>>
>>> I think that we have more examples of mistakes in classical
>>> development than in formal development
>>
>> This is because there is so little formal development, we
>> don't have enough data to reliably say one way or the other.
>>
>>> yes, all methods are applied by humans and errors appear, but with
>>> formal methods you have a tool to detect them.
>>
>> There are tools to detect mistakes in code (not that these are widely
>> used :-(
>>
>>> with Ada, C ... C++ or Java and C# it's more difficult to develop
>>> software with a very low level of bugs ...
>>
>> If you do multiple implementations, e.g., Ada+formal or
>> Ada+Visual basic, you will detect more problems and the result will
>> be more reliable software.
>>
>> The important question is bugs (or lack of) per buck.
>> Which technique is the most cost effective?
>>
> 

-- 
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com
