Question

I'm trying to piece together a general picture of the state of formal software verification, and I'm having a good bit of trouble. For context, I come mostly from a math background. I'm quite familiar with the state of automated theorem provers and proof assistants with respect to their use in proving well-formed mathematical statements (e.g. in Coq, Isabelle, Lean, etc.). What I'm having trouble understanding is what is going on with "formal methods" in practical CS applications.

I've found that companies such as Microsoft and AWS have used TLA+ as a "formal method" in their software development to a substantial degree. But when I started to read Wayne's Practical TLA+ book, I found that he considered a program to be formally checked if we just check, say, a sorting algorithm on lists of length $<n$ with entries between 1 and $n$ for some fixed $n$, i.e. we are just checking finitely many cases and saying that therefore the algorithm should work in general. This doesn't seem particularly interesting; it is just an example of particularly rigorous unit testing. Notably, it is not a formal proof of correctness.

On the other hand, I've seen murmurs about Isabelle and Coq being able to prove things about software just like they can prove mathematical theorems. Though when I look into the books which seem to promise this, e.g. Chlipala's Certified Programming with Dependent Types, I see a lot of abstract stuff that seems to vaguely relate to formally verifying programs, but no examples of taking a real program written in a language that has widespread use (e.g. C++, Python, Java, etc.) or even just pseudocode and "verifying it", whatever that means.

Can someone try to clear up my confusion?


Solution

A formally proven program is a formally proven program regardless of which language it's in. Just because a program is written in Coq and perhaps extracted to OCaml or Haskell, rather than written in a more “enterprisy” language such as C++ or Java, doesn't make it any less of a program.
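
To make that concrete, here is a minimal sketch of what a formally proven program looks like in Coq (the function and theorem are my own toy example, not from any particular development): the function is ordinary executable code, and its specification is proved once, for all inputs, and checked by the proof assistant.

```coq
Require Import Arith.

(* An ordinary executable program: the maximum of two natural numbers. *)
Definition maxnat (a b : nat) : nat :=
  if Nat.leb a b then b else a.

(* Its specification, proved once and for all inputs rather than
   tested on finitely many cases. *)
Theorem maxnat_correct :
  forall a b, a <= maxnat a b /\ b <= maxnat a b.
Proof.
  intros a b. unfold maxnat.
  destruct (Nat.leb a b) eqn:H; simpl.
  - (* case a <= b: the result is b *)
    apply Nat.leb_le in H. split. assumption. apply Nat.le_refl.
  - (* case b < a: the result is a *)
    apply Nat.leb_gt in H. split. apply Nat.le_refl.
    apply Nat.lt_le_incl. assumption.
Qed.
```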

Proving programs written in general-purpose programming languages, even “tame” ones like Haskell, is hard because these languages typically include a lot of convenience features, dark corners that exist for performance or for interfacing with the operating system, and rich and complex libraries. To prove a property of a program, you first need to state that property, and the statement embeds the semantics of the language the program is written in. When you try to formalize a language that was initially designed without a formal semantics (which is almost all of them), you very quickly hit dark corners which the English description leaves unspecified, or where it's ambiguous, or where it's outright self-contradictory, or where the reference implementation doesn't do what the description says and that's considered a bug in the English rather than in the implementation. The state of the art in proving properties of programs written in a preexisting language is to restrict the programs to a subset of the language.
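
To illustrate what “the statement embeds the semantics of the language” means, here is a toy sketch in Coq (the little language and all the names are invented for illustration): a tiny expression language is given a formal semantics, and a property of a program transformation is then stated and proved against that semantics. Doing the same for C++ or Python would require writing an eval for the entire language, which is exactly where the dark corners bite.

```coq
(* A toy language given a formal semantics inside Coq.  Once the
   semantics (eval) is an ordinary mathematical object, statements
   about programs in that language become ordinary theorems. *)
Inductive expr : Type :=
  | Const : nat -> expr
  | Plus  : expr -> expr -> expr.

Fixpoint eval (e : expr) : nat :=
  match e with
  | Const n  => n
  | Plus a b => eval a + eval b
  end.

(* A tiny "compiler pass": constant folding. *)
Fixpoint fold_consts (e : expr) : expr :=
  match e with
  | Const n  => Const n
  | Plus a b =>
      match fold_consts a, fold_consts b with
      | Const x, Const y => Const (x + y)
      | a', b'           => Plus a' b'
      end
  end.

(* The property we care about, stated against the semantics:
   the transformation never changes the meaning of a program. *)
Theorem fold_consts_sound :
  forall e, eval (fold_consts e) = eval e.
Proof.
  induction e as [n | a IHa b IHb]; simpl.
  - reflexivity.
  - rewrite <- IHa, <- IHb.
    destruct (fold_consts a), (fold_consts b); simpl; reflexivity.
Qed.
```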

What goes into that subset is highly variable. Syntactic sugar is not too difficult: the semantics just needs to translate it into simpler constructs. Reflection facilities aren't particularly difficult to model, but they can make the model a lot more difficult to reason about (for example, they invalidate properties such as “this snippet of code doesn't have any way to reference this variable, therefore it cannot change its value”), so many frameworks rule them out. Interactions with the operating system (typically via libraries) are problematic because they require modeling the operating system, which can be extremely complex. Floating-point operations are hard because keeping track of the approximations across successive operations causes a huge blow-up.

For C, one of the main large subsets with a formal model is CompCert C, the language of CompCert. CompCert is a formally verified compiler (written in Coq), so if you prove a property of the C program, you additionally get a corresponding guarantee about the generated machine code. CompCert C is a very large subset of C99, but the formalization excludes most of the standard library and some quirks of the language.

To prove a program that is written in a (suitable subset of a) real-world programming language, the program needs to be structured in such a way as to make the proof tractable. In practice, this means first writing and proving the program in a higher-level language (one which has no implementation on real hardware), and then using this higher-level version as a specification of the final program. Often there are several levels of successive refinement between the specification and the executable program.
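
Here is a toy illustration of that refinement idea (the names are mine, not from any real development): a clear but naive “specification-level” function, a tail-recursive “implementation-level” function closer to what you would actually run, and a proof that the implementation computes the same result as the specification.

```coq
Require Import List Arith Lia.
Import ListNotations.

(* Specification level: the obvious recursive sum, written for clarity. *)
Fixpoint sum_spec (l : list nat) : nat :=
  match l with
  | []      => 0
  | x :: xs => x + sum_spec xs
  end.

(* Implementation level: an accumulator-passing (tail-recursive)
   version, closer to what you would actually want to run. *)
Fixpoint sum_impl_acc (l : list nat) (acc : nat) : nat :=
  match l with
  | []      => acc
  | x :: xs => sum_impl_acc xs (acc + x)
  end.

Definition sum_impl (l : list nat) : nat := sum_impl_acc l 0.

(* The invariant relating the two levels, generalized over the
   accumulator so that the induction goes through. *)
Lemma sum_impl_acc_spec :
  forall l acc, sum_impl_acc l acc = acc + sum_spec l.
Proof.
  induction l as [| x xs IH]; intros acc; simpl.
  - lia.
  - rewrite IH. lia.
Qed.

(* The refinement theorem: the implementation agrees with the
   specification on every input. *)
Theorem sum_impl_refines_spec :
  forall l, sum_impl l = sum_spec l.
Proof.
  intros l. unfold sum_impl. rewrite sum_impl_acc_spec. reflexivity.
Qed.
```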

It's fairly common that the final program is not written manually, but extracted mechanically from a higher-level language. For example, writing Coq which is extracted to OCaml, or writing F* which is extracted to C. But the opposite approach is possible as well, for example writing (“tame”) C, annotating it with properties of functions and other code subdivisions, and using Frama-C to prove those properties (and the implied property that the C program has no undefined behavior).
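
On the extraction side, here is a minimal sketch of the mechanics in Coq (again a made-up example): Coq prints OCaml (or Haskell) code corresponding to a definition, and that code is then compiled by the ordinary OCaml toolchain; in a verified development the definition would come with theorems like the ones above.

```coq
Require Import List.
Import ListNotations.
Require Extraction.

(* A small program written in Coq's functional language; in a real
   development it would come with proved properties, as above. *)
Fixpoint sum_list (l : list nat) : nat :=
  match l with
  | []      => 0
  | x :: xs => x + sum_list xs
  end.

(* Print the corresponding OCaml code.  Writing
   Extraction "sum_list.ml" sum_list. instead sends it to a file,
   and Extraction Language Haskell. targets Haskell. *)
Extraction Language OCaml.
Recursive Extraction sum_list.
```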

When you have a formal semantics of a programming language and a way to express properties of programs, proving such a property is proving a mathematical theorem. Typically these theorems don't involve any complex mathematics such as calculus (unless brought in by the application domain, such as tracking the movement of a physical object), but they're difficult to prove because they involve very large formulae and contain arithmetic statements ($x^n+y^n=z^n$ is not a complex equation, but solving it is not elementary!). It's theoretically impossible to write a program that can prove a non-trivial semantic property of all programs (Rice's theorem), and practically impossible to write a program that can prove many interesting properties of typical programs. Formal verification therefore involves a combination of breaking the problem down into sufficiently small steps (writing small functions and stating enough precise properties of those functions), having a tool automatically prove some of those properties (such as a SAT solver for propositional logic), and having humans write proofs where the computer can't do it on its own (but the computer will check the human's proof). Proof assistants such as Coq and Isabelle come in for this last step.
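
As a toy illustration of that division of labour (the function and lemma here are mine, invented for illustration): the human supplies the structure of the argument, an induction on $n$, and an automated tactic (lia, a linear-arithmetic solver that ships with Coq) discharges the arithmetic subgoals that remain at the leaves.

```coq
Require Import Lia.

(* A small recursive program. *)
Fixpoint double (n : nat) : nat :=
  match n with
  | 0 => 0
  | S k => S (S (double k))
  end.

(* The human supplies the overall proof structure (induction on n);
   the automated tactic lia closes the arithmetic leaves. *)
Lemma double_add : forall n, double n = n + n.
Proof.
  induction n as [| k IH]; simpl.
  - reflexivity.
  - lia.
Qed.
```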

Formally proving a real-world program is a huge effort, requiring a much larger amount of time and expertise than the typical software project. It's rarely done outside of high-assurance environments: mostly environments with high safety requirements such as transportation (planes and trains, not so much cars), sometimes environments with high costs of failure such as space, and (very rarely) environments with high security requirements such as smart cards.

if we just check, say, a sorting algorithm on lists of length $<n$ with entries between 1 and $n$ for some fixed $n$, i.e. we are just checking finitely many cases and saying that therefore the algorithm should work in general.

That would not be a formal proof unless the program had a built-in limit of $n$ items on its input. I don't know this book, but I suspect you misread something. Formally checking a sorting program would involve proving its correctness for all $n$.
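
To make the contrast concrete, here is a sketch of what a formal proof “for all $n$” looks like, using my own toy Coq development rather than anything from Wayne's book: insertion sort over lists of naturals, with a machine-checked proof that its output is sorted for every input list of every length. (A complete correctness statement would also require the output to be a permutation of the input; I omit that half to keep the example short.)

```coq
Require Import List Arith Lia.
Import ListNotations.

(* Insertion sort over natural numbers. *)
Fixpoint insert (x : nat) (l : list nat) : list nat :=
  match l with
  | []      => [x]
  | y :: ys => if Nat.leb x y then x :: y :: ys else y :: insert x ys
  end.

Fixpoint isort (l : list nat) : list nat :=
  match l with
  | []      => []
  | x :: xs => insert x (isort xs)
  end.

(* What "sorted" means, as an inductive predicate. *)
Inductive Sorted : list nat -> Prop :=
  | Sorted_nil  : Sorted []
  | Sorted_one  : forall x, Sorted [x]
  | Sorted_cons : forall x y l,
      x <= y -> Sorted (y :: l) -> Sorted (x :: y :: l).

(* Inserting into a sorted list yields a sorted list. *)
Lemma insert_sorted :
  forall x l, Sorted l -> Sorted (insert x l).
Proof.
  intros x l H.
  induction H as [| y | y z l' Hyz Hsorted IH].
  - simpl. apply Sorted_one.
  - simpl. destruct (Nat.leb x y) eqn:E; simpl.
    + apply Nat.leb_le in E. apply Sorted_cons. assumption. apply Sorted_one.
    + apply Nat.leb_gt in E. apply Sorted_cons. lia. apply Sorted_one.
  - simpl in *. destruct (Nat.leb x y) eqn:E1; simpl.
    + apply Nat.leb_le in E1.
      apply Sorted_cons. assumption.
      apply Sorted_cons; assumption.
    + apply Nat.leb_gt in E1.
      destruct (Nat.leb x z) eqn:E2; rewrite E2 in IH; simpl.
      * apply Sorted_cons. lia. exact IH.
      * apply Sorted_cons. assumption. exact IH.
Qed.

(* The main theorem: for every input list, of any length,
   the output of isort is sorted. *)
Theorem isort_sorted : forall l, Sorted (isort l).
Proof.
  induction l as [| x xs IH]; simpl.
  - apply Sorted_nil.
  - apply insert_sorted. exact IH.
Qed.
```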

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange