[SystemSafety] 7% of mistakes in Coq proofs go undetected

Derek M Jones derek at knosof.co.uk
Mon Dec 9 13:03:00 CET 2019



David,

>> In 7% of cases (6.82% to be exact), verification succeeded (averaged
>> over all mutations).
> 
> In fact, I find it a pretty good score, I would have expected it to be
> much higher. Formal proofs are relative to a formal specification. If

When I first started reading the paper, I was expecting the value
to be around 1%, maybe 2%.  I think 7% is really high.

The implication is that the person writing the proofs can make mistakes,
as we all do, and those mistakes won't be caught 7% of the time.

The closest work to this, for other languages, is this paper:
https://www2.dmst.aueb.gr/dds/pubs/conf/2012-PLATEAU-Fuzzer/pub/html/fuzzer.html
The numbers in Figure 1 are very different, and I'm not sure how to
compare them against the Coq mutation work.

> this specification is incorrect, the proof will prove the incorrect
> thing. We all know that this is the weak point of formal software.

Many formal methods people promote the technique as proving correctness.
They really need to add a footnote to proofs of correctness, saying:
Proof is subject to 7% uncertainty due to undetected mistakes.
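
The point about the specification being the weak link can be made
concrete.  Here is a minimal Coq sketch of my own (not taken from the
paper): a "sort" that has been mutated into the identity function still
verifies, because the specification only asks for length preservation.

  Require Import List.

  (* A "sort" that a mutation has turned into the identity function. *)
  Definition sort (l : list nat) : list nat := l.

  (* Under-specified contract: only the length of the output is
     constrained, so the mutant still verifies. *)
  Theorem sort_preserves_length :
    forall l : list nat, length (sort l) = length l.
  Proof.
    intros l. reflexivity.
  Qed.

A stronger specification (output is sorted and is a permutation of the
input) would kill this particular mutant.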

> Various approaches are used to have correct software like adequate
> testing, thorough review of specifications, etc.

I agree; formal methods is one technique in the tool bag.

> The 7% (good) score is unfortunately related to the fragility of the
> formal proofs. A slight change can break the proofs. Proof engineering
> has much more to learn from software engineering in that regard.

I wish software engineering had something to teach about structuring
code for robustness.  There is little evidence that anything works,
mainly because of a lack of experiments rather than because experiments
have shown negative results.

-- 
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com

