[SystemSafety] systemsafety Digest, Vol 34, Issue 16

Roderick Chapman roderick.chapman at googlemail.com
Fri May 8 16:08:37 CEST 2015



On 08/05/2015 11:00, systemsafety-request at lists.techfak.uni-bielefeld.de wrote:
> By my calculations, your results imply there were 1896 conditions that
> had to be assessed or proved manually.
>
> We have done this type of manual review of tool output - though not for
> SPARK - and reviewing, assessing and documenting the results takes quite
> a while (effort measured in man-days).
>
> How long did this assessment/proof stage take in your case?
The actual numbers (in the ITP conference paper) are:

Total VCs: 152927

Proved automatically (no user-supplied lemmas): 151026

Proved automatically (with user-supplied lemmas): 1701

Proved by "review" (i.e. human): 200

It's hard to break down the effort required. A user-defined lemma
(essentially an inference rule added to the theorem prover) is
developed, reviewed for soundness (or proved sound using a
general-purpose interactive prover such as our own Checker, Coq, or
Isabelle), and then added to the prover. A single lemma might then be
used to prove several VCs, so the effort to develop a "good" lemma can
pay off by automating the proof of hundreds of VCs.
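
For readers who haven't seen one, a user-defined lemma of the kind just
described is written in the SPADE rule language and supplied to the
tools in a rule file. Here is a minimal sketch; the family name and the
rules are invented for illustration, and the syntax is reconstructed
from the SPADE Proof Checker rules manual, so treat the details as
indicative rather than definitive:

    rule_family square_rules:
         X * Y requires [X: i, Y: i].

    square_rules(1): X * X >= 0 may_be_deduced.
    square_rules(2): X * Y >= 0 may_be_deduced_from [X >= 0, Y >= 0].

Both rules are sound over the integers, and once such a rule has been
reviewed (or proved sound in an interactive prover), the tools can
apply it to every VC that matches its pattern.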

Once you've got a good set of additional lemmas, those 1701
VCs are re-proved automatically, so there's no repeated effort.

The remaining 200 were reviewed by people and marked as "OK" - this
gets picked up by the tool as "proved by review."
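
Purely for illustration (the format below is invented, not the
project's actual review record), such a review pairs each remaining VC
with a short written argument for why it holds:

    vc main_loop_47          status: proved by review
    justification: I is bounded by the loop guard I < Buf'Last, so
      the range check on I + 1 cannot fail
    reviewed by: <initials>, <date>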

The effort to do this was not recorded: it was done by the developers
as part of their normal development activity rather than as a separate
task, so the time spent on the lemmas and reviews was never measured.

The proof lives on - it has been maintained and is reproduced daily
for all subsequent changes and builds of the system.

  - Rod


