[SystemSafety] Scalability of Formal Methods [was: Logic]

David Crocker dcrocker at eschertech.com
Thu Feb 27 16:39:40 CET 2014


I've always considered that the role of testing is fundamentally
different when software is developed using a correct-by-construction
approach that formally proves that the code satisfies the software
requirements.

This approach uses a chain of tools (formal verification tools,
compiler, linker, etc.) and hardware, each of which we believe produces
output that has a defined relationship to the input. For example, the
formal tool provides proof that the source code meets the requirements;
the compiler produces object code that faithfully implements the
semantics of the source code in the target instruction set; and the
hardware faithfully implements that instruction set. We further believe
that we have combined these tools and hardware in a process such that if
they all function correctly, the target system must meet the requirements.
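
To make this concrete, here is a minimal hypothetical sketch (not from
any particular project) of a C function carrying a contract of the kind
a proof tool would discharge. I've used ACSL notation, as read by
Frama-C, for illustration; other verifiers have similar contract
languages. Note that arithmetic inside ACSL annotations is over
mathematical integers, so a + b in the contract cannot overflow:

    #include <limits.h>

    /* Saturating addition. A proof tool establishes the contract
       below for all inputs, not just the ones we happen to test. */
    /*@ ensures \result == (a + b > INT_MAX ? INT_MAX :
      @                     a + b < INT_MIN ? INT_MIN : a + b);
      @*/
    int sat_add(int a, int b)
    {
        if (a > 0 && b > INT_MAX - a) return INT_MAX; /* clamp high */
        if (a < 0 && b < INT_MIN - a) return INT_MIN; /* clamp low  */
        return a + b;
    }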

The purpose of testing is to challenge those beliefs and ultimately
establish confidence in them. If even one test fails, then there is a
problem in one of the tools, or the hardware, or the process. So when a
test fails, we don't just treat it as a bug that needs fixing; we must
establish which tool or hardware or part of the process is at fault, and
correct it so that it can't happen again.
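
As an equally hypothetical sketch of such a test: running the proven
function above on the target exercises the compiler, linker and
hardware rather than the source code, so a failing case points at the
tool chain or the process, not at a coding defect:

    #include <assert.h>
    #include <limits.h>

    int sat_add(int a, int b);  /* the proven function above */

    int main(void)
    {
        /* The source is already proven against its contract, so any
           failure here implicates a tool, the hardware, or the
           process by which they were combined. */
        assert(sat_add(1, 2) == 3);
        assert(sat_add(INT_MAX, 1) == INT_MAX);  /* saturates high */
        assert(sat_add(INT_MIN, -1) == INT_MIN); /* saturates low  */
        return 0;
    }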

Of course, in practice some sorts of test may also expose deficiencies
in the original specification.

David Crocker, Escher Technologies Ltd.
http://www.eschertech.com
Tel. +44 (0)20 8144 3265 or +44 (0)7977 211486

On 27/02/2014 11:27, Patrick Graydon wrote:
> It has occurred to me that the ideal testing to complement proof that
> source code refines a specification might /not/ be the same mix of
> testing that we recommend where formal methods are not used. If, in
> these cases, we mainly want testing so that we can find
> compiler-introduced defects and ways in which the real chips don’t
> behave as the instruction set manual says they should, perhaps testing
> to achieve coverage of the instruction set would find more bugs or
> more bugs per testing dollar than, say, achieving MC/DC. I don’t have
> the answer here; I don’t even know the complete list of types of
> defects I’d want the testing to find in these cases. It’s a bit of a
> cliché, but it might be that more research is needed here. Is that
> clearer?  


