[SystemSafety] Claims for formal methods

Peter Bernard Ladkin ladkin at rvs.uni-bielefeld.de
Thu Feb 20 12:11:32 CET 2014


David,

I think you are underselling somewhat.

On 2014-02-20 10:16 , David MENTRE wrote:
>> 3.    The support tools are assumed to be correct; primarily the Coq
>> proof assistant, which is written in OCaml.
> 
> Yes. But Coq is designed in such a way (a very small, manually verified trusted core and then
> libraries built upon that core) that the possibility to have an issue in Coq is very very low (but
> not zero, I agree).
> 
> Regarding OCaml, yes some bugs in the OCaml compiler or run-time could theoretically impact Coq. But
> having such bugs is very remote. 

It is important here not just to talk about "correctness" and "bugs" indiscriminately, but to explain
their nature. Of course, some understanding of logic is needed to make the necessary distinctions,
and as we have been discussing, that understanding may not always be present.

A general-purpose theorem prover such as Coq has been well exercised. That is, its functionality as
a usable computer program has been well exercised. But all of that functionality as a usable program
is peripheral to its function as a prover. If you start the thing up and it barfs, you restart it;
if it barfs continually, you go away, have a cup of coffee, admit defeat and do something else. If
it runs and runs and runs and runs, you do similarly. If it gives you nonsense output, you do
similarly.

A prover is asked to say whether A follows logically from B. It does so by finding a proof,
automatically, quasi-automatically, or with considerable human help. When it says "yes", it has
found what its internal rules (not just the explicitly programmed ones but the emergent ones) say is
a proof.

That proof can be submitted for inspection, either by people or by a separate piece of SW that
checks it.

The important correctness property of a prover and its associated activity is that (a) when it says
"yes", (b) it produces an object, and (c) that object can be subsequently checked to see that it is
indeed a proof. That is the property of soundness. This property is best applied not to the prover
alone, but to the entire three-step a-b-c process. The first two steps are transparent. The third
step involves the soundness of the checker, which, if it is SW, may or may not be part of the prover
package.
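The a-b-c pattern can be made concrete with a toy sketch. This is hypothetical Python, invented for
illustration and not any real prover's code: the proof object produced at step (b) is a derivation
tree over an assumption rule and modus ponens, and check() plays the role of the small, separate
checker at step (c) that re-validates the object against the inference rules.

```python
# Hypothetical sketch of the (a)-(b)-(c) soundness pattern: a "yes" answer
# comes with a proof object that an independent checker can re-validate.
# Formulas: a string for an atom, ("->", p, q) for an implication p -> q.

def check(proof, assumptions):
    """Return the formula this proof tree establishes, or raise ValueError.

    Proof objects:
      ("assume", f)          -- f must be among the given assumptions
      ("mp", pr_imp, pr_ant) -- modus ponens: from p -> q and p, conclude q
    """
    kind = proof[0]
    if kind == "assume":
        f = proof[1]
        if f not in assumptions:
            raise ValueError("not an assumption: %r" % (f,))
        return f
    if kind == "mp":
        imp = check(proof[1], assumptions)   # must establish some p -> q
        ant = check(proof[2], assumptions)   # must establish p
        if imp[0] != "->" or imp[1] != ant:
            raise ValueError("modus ponens does not apply")
        return imp[2]
    raise ValueError("unknown rule: %r" % (kind,))

# A toy derivation: from {a, a->b, b->c}, conclude c.
assumptions = ["a", ("->", "a", "b"), ("->", "b", "c")]
proof = ("mp",
         ("assume", ("->", "b", "c")),
         ("mp",
          ("assume", ("->", "a", "b")),
          ("assume", "a")))
print(check(proof, assumptions))  # prints: c
```

Note that check() is a few lines long and knows nothing about how the proof object was found; that
asymmetry, a large searcher validated by a tiny checker, is the point of the a-b-c process.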

If the prover is a general-purpose prover, then the third step is straightforward and also
transparent. But many provers nowadays rely on so-called decision procedures, sort-of oracles which
take a specific mathematical structure and perform operations peculiar to that structure to come up
with a yes/no decision on whether a property holds of that structure. They vary considerably in how
hard it may be to verify whether a "yes" answer is correct (for a "no" answer, mostly they produce a
counterexample, so again this may be checked by hand or by machine and is usually not problematic).

For example, answers from a decision procedure for Presburger arithmetic are easy to check by hand.
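A hedged illustration of why certificate-bearing answers are cheap to check (hypothetical Python;
claim(), witness, universal_claim() and counterexample are all invented for this sketch): confirming
a witness for an existential claim, or a counterexample to a universal one, takes a single
evaluation, regardless of how much search the decision procedure did to find it.

```python
# Hypothetical illustration: decision-procedure answers are easy to check
# when they come with a certificate.  A "yes" to an existential claim in
# Presburger-style linear arithmetic can come with a witness; a "no" to a
# universal claim can come with a counterexample.

def claim(x):
    # The existential claim: 3*x + 7 = 22 has an integer solution.
    return 3 * x + 7 == 22

witness = 5                    # certificate from the (imagined) procedure
print(claim(witness))          # prints: True -- the "yes" answer is confirmed

def universal_claim(y):
    # A false universal claim: every natural number is even.
    return y % 2 == 0

counterexample = 3             # certificate for the "no" answer
print(universal_claim(counterexample))  # prints: False -- claim refuted
```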

However, the extensive combinatorics involved in the decision procedures for such proofs as that of
the four-color theorem or Hales's theorem are obscure. The decision procedure has searched through
practically-innumerable examples, using combinatorial-decomposition techniques, and has told you
what its findings are about all those examples. It just has to be wrong about one, somehow, and its
result is out the window. And when it's checking more than what any mathematician or group of
mathematicians could check in a lifetime, then there is a real question about how such a proof can
ever be accepted (human mathematicians are also not infallible). It is, though, questionable how
relevant such a phenomenon is to algorithm or program verification. I have never seen such a
questionable example arise in any formal algorithm/program verification I have ever worked with or
around.

Also, it is not easy to build a subtly-unsound prover. It is of course very easy to build an unsound
prover, and when such a prover is exercised its unsoundness is usually discovered moderately
quickly. But once you have something which most of the time carefully follows the inference
mechanisms you have built in and checked, it is not going to condone random inferences; and if it
does do odd things sometimes, then you can discover that by inspection and hedge it.
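One reason gross unsoundness surfaces quickly under exercise is that feeding a prover statements
known not to be theorems exposes it at once. A toy sketch (hypothetical Python; prove() is a
deliberately broken stand-in, not any real system's entry point):

```python
# Hypothetical sketch: gross unsoundness is caught by exercising the prover
# on statements known to be non-theorems.  Here prove() stands in for any
# yes/no prover entry point, and is deliberately (grossly) unsound.

def prove(goal):
    # A broken "prover" that affirms everything it is asked.
    return True

known_non_theorems = ["p and not p", "0 = 1", "false"]

# Any "yes" on this list is conclusive evidence of unsoundness.
leaks = [g for g in known_non_theorems if prove(g)]
print(leaks)  # a sound prover would print []; this one affirms all three
```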

The point is that routine verification is open to secondary checking. The key to understanding why
things amiss with a prover are usually benign is to understand what "correctness" means in this
case, and that such correctness is more or less transparent. That is what makes such SW different
from Excel.

PBL

Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de





