[SystemSafety] Scalability of Formal Methods [was: Logic]

David MENTRE dmentre at linux-france.org
Wed Feb 26 15:13:54 CET 2014


Hello Peter,

On 26/02/2014 12:30, Peter Bernard Ladkin wrote:
> For example, David tells us that a couple hundred thousand lines of Ada source have been formally
> verified in 2006 using Atelier B. Now, if that was all SPARK source I would be prepared to believe
> it, but I'd want to know how many proof obligations were left undischarged.

Zero (0). Of course, a significant part (3.3% to 8.1% of proof 
obligations, i.e. 1,400 to 2,200) is done manually (i.e. interactive 
proofs) by a team of skilled engineers (in-house or subcontracted to 
companies specialized in such proofs).

The B Method is based on refinement: you start from a formal 
specification, then add details (algorithms, data structures) until you 
reach the B0 level, which corresponds to a basic programming language 
(IF, WHILE, Booleans, integers, arrays, ...). At this level, you can 
translate it directly into Ada or C. At each refinement step, proof 
obligations are automatically generated to ensure the refined code 
corresponds to the more abstract one.
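To make the idea concrete, here is a minimal Python sketch (invented 
names, not actual B or Atelier B tooling) of what a discharged proof 
obligation establishes: that a refined, loop-based operation agrees with 
its abstract specification. A real prover shows this symbolically; the 
sketch merely checks it by enumeration over a small domain.

```python
def abstract_max(s):
    """Abstract specification: the result is the greatest element of s."""
    return max(s)

def refined_max(s):
    """Refinement at a B0-like level: only iteration, comparison,
    assignment -- the kind of construct directly translatable to Ada or C."""
    it = iter(s)
    best = next(it)
    for x in it:
        if x > best:
            best = x
    return best

def discharge_obligation(domain):
    """Stand-in for a generated proof obligation: the refined operation
    agrees with the abstract one on every considered input."""
    return all(refined_max(s) == abstract_max(s) for s in domain)
```

For example, `discharge_obligation([[3, 1, 2], [5], [2, 2]])` holds, 
which is the (exhaustively checked) analogue of the obligation being 
proved.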

For the translation to Ada or C, a double translator is usually used: 
two translators, developed by different teams with different 
technologies, plus a check that both produce the same result[1].
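The double-translation idea can be sketched as follows (a hypothetical 
Python illustration, with invented names and a toy B0-like input; the 
real translators target Ada or C from full B0): the build is accepted 
only when two independently written translators agree, up to irrelevant 
formatting.

```python
def translator_a(assignments):
    """Translator A: emits one C-like statement per (variable, expression) pair."""
    return "\n".join(f"{var} = {expr};" for var, expr in assignments)

def translator_b(assignments):
    """Translator B: independently written, same semantics, different spacing."""
    lines = []
    for var, expr in assignments:
        lines.append(var + "  =  " + expr + " ;")
    return "\n".join(lines)

def normalize(code):
    """Erase whitespace so only the token stream is compared."""
    return "".join(code.split())

def accept(assignments):
    """Accept generated code only if both translators agree."""
    a, b = translator_a(assignments), translator_b(assignments)
    if normalize(a) != normalize(b):
        raise RuntimeError("translators disagree: reject build")
    return a
```

A disagreement between the two translators rejects the build, so a bug 
must appear identically in both independently developed tools to go 
unnoticed.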

> If it isn't all SPARK
> source, then I become more wary. Also, my experience tells me that, of any industrial system with
> reliability requirements, 90%-95% of the source code is boilerplate. Boilerplate is functionally
> simple and easy formally to verify; furthermore, it is repeated all over. So 95% of your functional
> code is going to be easy formally to verify, if you are organised and put in the effort. But that 5%
> - now that could really do you in! It's theorem-proving and the math might do you in, even with the
> best prover technology.

I agree. That's why, when applying formal methods:

  * You divide your system into formal and non-formal parts, in such a 
way that the formal part ensures your important system properties 
(safety, security, ...). The rule of the game is to keep the formal 
part as small as possible. You can put a lot of complicated code in the 
non-formal part (e.g. a radio transmission system) as long as your 
formal part checks it (e.g. no message is lost, messages are received 
in order, ...);

  * You design your software with proof in mind, e.g. using simpler 
algorithms and data structures. The resulting software might not be 
"optimal" but it does the job (e.g. functional and non-functional 
objectives are fulfilled);

  * The proof environment is tailored to the project at hand to reach 
90-97% automatic proof. For the B Method, the prover is based on proof 
rules (about 1,400). The user can add new rules. For example, Siemens 
added about 3,100 rules to the Atelier B prover for its projects. Of 
course, those rules might be incorrect, so they are themselves proved 
or carefully reviewed[2]. Old SPARK (before SPARK 2014) was based on 
the same approach, if I am correct;

  * You organize your development in such a way that you prove code 
that is already mostly debugged, works, has been tested, has not changed 
for some time, etc. Proof is done last.
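The first point above (complex code in the non-formal part, a small 
formal part checking the system property) can be sketched in Python 
(invented names, purely illustrative): the radio layer stays unverified, 
while the small checker -- the piece you would develop formally -- 
enforces "no message lost, messages received in order" via sequence 
numbers.

```python
class OrderChecker:
    """The small, formally developed part: validates the stream
    delivered by the complex, unverified transmission layer."""

    def __init__(self):
        self.expected = 0  # next sequence number we must see

    def receive(self, seq, payload):
        # Any gap or reordering in sequence numbers is detected here,
        # whatever the unverified radio layer did internally.
        if seq != self.expected:
            raise RuntimeError(
                f"lost or reordered message: got {seq}, expected {self.expected}")
        self.expected += 1
        return payload
```

The safety argument then rests only on this checker (and on the 
sequence numbering), not on the much larger transmission code.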

Sincerely yours,
david

[1] Somebody like Nancy Leveson could wonder whether such a 
double-translator approach is really safe. I would say that such 
translators are rather basic (no complex algorithms) and the 
specifications are rather clear, so we have good arguments to say it is 
safe.

[2] Even after careful review by experts, some errors might slip in. 
Later studies with formal methods (e.g. the Coq proof assistant or the 
Zenon automated prover) have indeed found errors (e.g. 20 out of 274 
examined rules). But those errors were minor (a correct rule with an 
incorrect manual proof, or issues with non-freeness of rule variables) 
and never led to a proof obligation that was wrongly proved correct.
