[SystemSafety] Scalability of Formal Methods [was: Logic]

Peter Bernard Ladkin ladkin at rvs.uni-bielefeld.de
Wed Feb 26 12:30:20 CET 2014


On 2014-02-18 18:14 , Michael Jackson wrote:
> Perhaps the discussion about scalability needs to be tightened up a little.

I think the issue of what kinds of, and how large, systems are amenable to what sort of
validation/verification is very important.

First up are the meanings of validation and verification. Again. I think I first noticed a problem a
quarter century ago when I started working with telecommunications people, who used the words in the
opposite sense from the computer scientists I'd been working with.

When I talked about "program verification" or "software verification" I meant proving formally and
rigorously that SW fulfilled its specification. (I did of course presume that SW would have a
specification. It was 1992 and I was used to SE research, not industry practice. I have since learnt
to relax that assumption.)

When I talked about "software validation" I meant checking whether the SW in operation fulfilled its
expectations. Those expectations related to a requirements specification but they were also
informal. You don't want the wheels to fall off your car, but it's not necessarily in the
requirements specification that they should stay on. Many things are often not said (see
"completeness"). You validate through tests, through exercising the kit.

However, the people I was talking to at a company then called Verilog (Verilog is now the name of an
HDL, standardised as IEEE 1364) used the terms with exactly the switched
meanings. Since then, I have encountered this regularly. Alvery just sent me a set of slides from an
Airbus presentation in which, in terms of the V-model, "validation" is what you do up and down the
left part of the "V", and "verification" is what you do horizontally across. That is switched from
the way I, and most computer scientists, use the terms, where "verification" goes up and down on the
left and "validation" goes across.

So, let's have a look in the Electropedia (www.electropedia.org, aka IEC 60050). There are just five
definitions which use the term "verification":

191-14-13  (191 is Dependability and quality of service)
> maintainability verification	
> 		 a procedure applied for the purpose of determining whether the requirements for maintainability performance measures for an item have been achieved or not
> Note – The procedure may range from analysis of appropriate data to a maintainability demonstration.
> 

394-33-17 (394 is Nuclear instrumentation)
> 	software verification	
> 		 confirmation by examination and provision of objective evidence that the results of an activity meet the objectives and requirements defined for this activity

394-22-18
> 
> 	design verification	
> 		 process of determining wether [sic] or not the implemented system corresponds to the design

394-40-41
> 	verification	
> 		 process of ensuring that an equipment fulfils the specified conditions

311-01-13 (311 Electrical and electronic measurement)
> verification (of calibration)	
> 		 set of operations which is used to check whether the indications, under specified conditions, correspond with a given set of known measurands within the limits of a predetermined calibration diagram
> Note 1 – This term is used in the "uncertainty" approach.
> 
> Note 2 – The known uncertainty of the measurand used for verification will generally be negligible with respect to the uncertainty assigned to the instrument in the calibration diagram.

You can forget its use in 191. BTW, "dependability" to the IEC does not mean the same as
"dependability" to IFIP. In the IEC it does not include safety or security; to IFIP it most
certainly does.

(A side comment about this: the Chairman of IEC TC 56, responsible for Part 191, once told me I had
"nothing to contribute" to certain work of his TC because I am "a safety expert, and TC 56 does not
deal with safety". I pushed this a bit higher, which earned me an official complaint, and TC 56 then
"clarified" that they do consider safety aspects of their norms. Which may be a good thing since
they are responsible for FMEA, HAZOP, FT, ET and RCA standards. But it concerns me that few to no
safety experts sit on these committees formulating international standards which are directly
relevant to system safety practices. And I think it should concern others also.)

It is interesting that the specifically software-related meanings come in the nuclear sector. Why is
this? I have my guesses but would rather hear from someone who knows.

I think the instrumentation meaning of "verification" is also not directly applicable to SW systems.

Note that the three 394 meanings of "verification" allow either interpretation: "down the left V" or
"across the V".

That's what the IEC says. Room for improvement, as with many IEC definitions.

To me, the activity (verb) "verify" has two arguments. You verify A against B. You verify a design
specification against a requirements specification. You verify a design against a design
specification. You verify an implementation (SW source code) against a design. You verify object
code against (the meaning of) source code (if this source code is unambiguous). And so on. A
verification succeeds if and only if A is shown to be a refinement of B. "Refinement" is a technical
term accessible to those who have studied formal description languages :-) You may verify anything
in the left part of the "V" against anything higher up in that part, and the meaning is always the
same: a refinement. Under certain caveats; in a certain environment; within a certain
input/parameter-value domain.
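
To give the flavour in one line (a minimal sketch in my own notation, not that of any particular
formal description language): model the more abstract artefact B and the more concrete artefact A as
relations between inputs and observable outcomes. Then, over the agreed input/parameter-value domain
D,

    A refines B  $\iff\ \forall x \in D.\ \forall y.\ A(x,y) \Rightarrow B(x,y)$

that is, every behaviour A can exhibit on D is one that B allows. Each verification step down the
left-hand side of the "V" is a claim of this shape, with different A and B plugged in.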

To me, the activity "validation" almost always means some sort of running the kit to see what it
does. But it could also include formal modelling, and sampling the state space (often far too large
for exhaustive inspection) to check that properties "mostly" hold.
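
A toy sketch of that kind of sampling, in Python and with an invented example system, purely to fix
ideas:

    import random

    CAPACITY = 16  # invented bounded-buffer example

    def push(buf, item):
        # Intended behaviour: silently drop the item if the buffer is full.
        if len(buf) < CAPACITY:
            buf.append(item)
        return buf

    def invariant(buf):
        # Claimed property: the buffer never exceeds its capacity.
        return len(buf) <= CAPACITY

    # The full state space is far too large to inspect exhaustively, so
    # sample random states, exercise the kit, and check the property.
    random.seed(0)
    for _ in range(100000):
        buf = [random.randint(0, 255) for _ in range(random.randint(0, CAPACITY))]
        push(buf, random.randint(0, 255))
        assert invariant(buf), buf

    print("property held on all sampled states")

Passing such a check shows only that the property held on the sampled states; it is evidence, not a
refinement proof.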

I am sure you have your own, well-rehearsed answers to your five questions and I hope mine are not
incompatible :-)

> 1.  Questioning 'scalability' assumes, I think that work 'on a larger scale' is in some respects
> different from work 'on a smaller scale'. Are there such differences, and, if so, what are they?

One is the super-linear increase, with scale, in the effort required by most targets of that effort
(effort = engineers per unit of time). Another is the effective upper bound on the effort that can
be applied at any one moment in time.

The upper bound increases by jumps, as technology becomes available which was not available before
(e.g., the symbolic model checking of Ed Clarke and colleagues' "10^20 states and beyond" paper).
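
For scale: $10^{20} \approx 2^{66.4}$, roughly the state space of a system with just 67 boolean
state variables. Enumerating such a space state by state is hopeless; representing sets of states
symbolically (with BDDs, in that paper) is what moved the effective bound by a jump.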

The "does not scale" assertion is really a claim that the current effective upper bound is lower
than that required for the majority of target applications, whatever those may be. Here arises a
clear need to be more precise.

For example, David tells us that a couple of hundred thousand lines of Ada source were formally
verified in 2006 using Atelier B. Now, if that was all SPARK source I would be prepared to believe
it, but I'd want to know how many proof obligations were left undischarged. If it isn't all SPARK
source, then I become more wary. Also, my experience tells me that, of any industrial system with
reliability requirements, 90%-95% of the source code is boilerplate. Boilerplate is functionally
simple and easy formally to verify; furthermore, it is repeated all over. So 95% of your functional
code is going to be easy formally to verify, if you are organised and put in the effort. But that 5%
- now that could really do you in! It's theorem-proving and the math might do you in, even with the
best prover technology. Rushby and von Henke's verification (well, partial verification because some
of the algorithms were incomplete) of clock sync algorithms in PVS is a very-well-cited major paper
in IEEE TSE.
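
To give a flavour of the difference (both obligations invented for illustration, not taken from any
real project): a boilerplate proof obligation typically looks like

    $x \ge 0 \,\wedge\, x < 2^{31}-1 \;\vdash\; x + 1 \le 2^{31}-1$

and any current prover discharges it automatically, whereas the awkward 5% throws up obligations of
the shape

    $\vdash\ \forall n.\ |C_p(n) - C_q(n)| \le \delta$

(a clock-skew bound over all rounds n and all pairs of clocks p, q), whose proof needs induction,
arithmetic over the reals, and a human at the keyboard.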

My experience also tells me that provers (the people, not the tools) are a mixture of nature and
nurture. Virtuosi/virtuose. In 1995 I spent weeks on some hand proofs that Mark Saaltink rattled off
in half a day using the EVES prover. There was one he couldn't do; it then took me half a day to
discover that the assertion was wrong and to fix it, and him ten minutes to verify the fix. A decade
earlier I had briefly convinced Mark Stickel that the usual axioms for S4.3 and S5 must obviously be
part of a terminating Knuth-Bendix reduction rule set. He spent a whole afternoon patiently
manipulating his K-B prover to show me that his intuition that this was unlikely was more accurate
than mine. BTW, not all provers are called Mark :-)

> 2.  Are we discussing the applicability of particular formal methods such as X, Y and Z, as
> described in their respective books and papers? Or the applicability of disciplined formal logic and
> mathematics generally?

Both, I think. I agree that the considerations are different.

> 3.  Is the question whether the putatively applicable method or discipline is locally and partially
> applicable (as, for example, arithmetic is locally and partially applicable to architecture), or is
> globally applicable (as, for example, COBOL was at one time claimed to be globally applicable to the
> whole of a data processing system)?

I think almost any successful application of FM is local-and-partial rather than global.

> 4.  Self-evidently we expect logic to save us from logic errors in development. Are there other
> kinds of error? If so, what saves us from them, and how is it related to the use of logic?

There are run-time errors. SPARK saves us from those. :-)

I also wouldn't think that all misfits between specifications and their refinements are inevitably
logic errors. I am not sure, for example, that I would categorise input which is out of bounds, and
thereby mishandled, as a *logic* error.
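
A hypothetical illustration (invented Python, no connection to any real system):

    # Lookup table for commanded thrust, indexed by lever position 0..100 percent.
    THRUST_TABLE = [t * 0.01 for t in range(101)]

    def commanded_thrust(lever_percent):
        # The lookup logic is "correct" for the domain the designer had in
        # mind, namely 0 <= lever_percent <= 100; that assumption is implicit.
        return THRUST_TABLE[lever_percent]

    # commanded_thrust(250) fails at run time with an IndexError, and
    # commanded_thrust(-1) silently returns full thrust (Python's negative
    # indexing): out-of-bounds input, mishandled, yet no flaw in the
    # reasoning behind the table lookup itself.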

> 5.  Is it true that formal expression of any requirement, design or other development artifact is
> superior to informal expression? If not, how are informal and formal expression related in a
> development?

I think yes-and-no. Formal expression is superior in various ways: lack of ambiguity, efficacy of
consistency-checking, suitability for formal manipulation of most sorts. And it is inferior in
others, for example intuitive understandability, which is one reason why natural-language
specifications are legally required for technical systems in certain areas (German railways, for
example).
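
To recycle the car-wheels example from above (an invented formalisation, purely to show the
trade-off):

    Informal:      the wheels must not come off while the car is moving
    Formal (LTL):  $\mathbf{G}\,(moving \Rightarrow wheels\_attached)$

The formal line cannot be read in two ways and can be checked mechanically against a model, but it
means something only to a reader who already knows what the propositions $moving$ and
$wheels\_attached$ abstract; that surrounding knowledge is exactly what a natural-language
specification carries with it.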

PBL

Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de





