Question

I am not very well-versed in the world of theorem proving, much less automated theorem proving, so please correct me if anything I say or assume in my question is wrong.

Basically my question is: are automated theorem provers themselves ever formally verified by another theorem prover, or is there just an underlying assumption that any given theorem prover was implemented really, really well, extensively tested & reviewed, etc., and so it "must work"?

If so, does some underlying doubt always remain in any proof checked by a formally verified automated theorem prover, since the formal verification of that theorem prover still rests on the assumption that the non-formally-verified theorem prover that checked it was itself correct - even though, not being formally verified, it might technically be wrong? (That is a mouthful of a question, apologies.)

I am thinking of this question in much the same vein as bootstrapping compilers.

Solution

I recommend reading Pollack's How to believe a machine-checked proof. It explains how proof assistants are designed to minimize the amount of critical code.

There are many levels of formal verification (that's the phrase you're looking for in place of "proven") of a proof assistant:

  1. Verify that the algorithms used by the proof assistant are correct.
  2. Verify that the implementation of (the critical core of) the proof assistant is correct.
  3. Verify that the compiler for the language in which the proof assistant is implemented is correctly designed and implemented.
  4. Verify that the hardware on which the proof assistant runs is correctly designed and built.
  5. Compute the probability that a cosmic ray passes through the CPU and tricks your proof assistant every time you run it.
  6. Estimate the likelihood that you are insane.

People put serious effort into these (well, at least the first four). For example, steps 1 and 2 are addressed in Coq Coq Correct!, and steps 3 and 4 in the amazing award-winning CompCert project.

OTHER TIPS

What you need is the idea of "the trusted core". Quoting "A verified runtime for a verified theorem prover":

In many theorem provers, the trusted core—the code that must be right to ensure faithfulness—is quite small. As examples, HOL Light is an LCF-style system whose trusted core is 400 lines of Objective Caml, and Milawa is a Boyer-Moore style prover whose trusted core is 2,000 lines of Common Lisp. These cores are so simple we may be able to prove their faithfulness socially, or perhaps even mechanically as Harrison did for HOL Light.

Once you assure yourself the core is correct, it can be used to verify the correctness of some extensions, and then those extensions can be used...
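
To make the LCF idea concrete, here is a toy sketch in OCaml (the language of HOL Light's core). All names here are invented for illustration - this is not HOL Light's actual API - but it shows the mechanism: the theorem type is abstract, and its only constructors are the primitive inference rules, so any value of type `thm` must have come out of a valid derivation.

```ocaml
(* A toy LCF-style trusted core. Illustrative only; not any real prover's API. *)
module type KERNEL = sig
  type form                            (* formulas of a toy logic *)
  val atom : string -> form
  val imp  : form -> form -> form

  type thm                             (* ABSTRACT: no constructor is exported *)
  val concl : thm -> form              (* a theorem's conclusion can be read... *)

  (* ...but the only ways to *build* a [thm] are the primitive rules: *)
  val axiom_k : form -> form -> thm    (* |- a -> (b -> a) *)
  val mp      : thm -> thm -> thm      (* from |- a -> b and |- a, infer |- b *)
end

module Kernel : KERNEL = struct
  type form = Atom of string | Imp of form * form
  let atom s = Atom s
  let imp a b = Imp (a, b)

  type thm = form                      (* internally, just the conclusion *)
  let concl th = th
  let axiom_k a b = Imp (a, Imp (b, a))
  let mp ab a =
    match ab with
    | Imp (x, y) when x = a -> y
    | _ -> failwith "mp: rule does not apply"
end
```

Everything outside `Kernel` - parsers, tactics, search procedures - can be arbitrarily buggy without compromising soundness, because a `thm` can only be manufactured by the rules above. That module is the part you have to get right (or verify).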

While this may tread close to self-advertisement, this is essentially the topic of my recent paper Metamath Zero: The Cartesian Theorem Prover (video), and the analogy with bootstrapping compilers is spot on. The introduction of the paper lays out what is needed to make this happen; at this point it is only a problem of engineering.

As Andrej says, there are several components that go into a "full bootstrap", and while many of the parts have been done separately, the theorem provers that the community actually uses are only correct in the sense that Linux is correct: we've used it for a while and have found no bugs so far - except the bugs that we did find and fix.

The issue, as ever, is that the cost of producing verified software is high, so verified programs tend to be simplistic or simplified versions of the "real thing", and there remains a gap between what people use and what people prove theorems about. A "small trusted kernel" setup is necessary but not sufficient: unless you have a full formal specification (with proofs) of the programming language, the untrusted part can still interfere with the trusted part, and even if the barrier is air-tight, you have communication problems whenever the untrusted part is in control - for example, the kernel may flag an error that the untrusted part ignores, or the kernel may never be shown some purported assertion at the source level.
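
As a caricature of that failure mode, here is a hypothetical snippet in the same toy-OCaml style used elsewhere on this page: a perfectly sound kernel whose verdict is simply swallowed by the untrusted code driving it.

```ocaml
(* Untrusted frontend in control: the kernel correctly flags the error,
   but the frontend catches it and reports success anyway. *)
let check_and_report (kernel_check : 'proof -> unit) (p : 'proof) =
  try kernel_check p; print_endline "QED"
  with Failure _ -> print_endline "QED"   (* kernel's complaint is ignored *)
```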

The good news is that projects of this scale have become feasible in the past few years, so I am very hopeful that we can get production quality theorem provers with verified backends soon-ish.

  • The CakeML project should get a mention here: they have an ML compiler verified in HOL4 that is capable of running a HOL verifier written in ML. (Unfortunately, HOL4 is more than just its logical system, so there is some work to be done to make this realistic.)
  • The Milawa kernel is written in a subset of ACL2, with a number of bootstrapping stages before closing the loop (being able to prove theorems about this same ACL2 subset). This is the only actual theorem-prover bootstrap I know of, but it doesn't go down to machine code; it stays at the level of Lisp, and from what I understand it's not performant enough for production work. It has since been verified down to machine code, but that part of the proof was done in HOL4, so it's not actually bootstrapping at the machine-code level.
  • Coq recently made some strides towards this with Coq Coq Correct!, but it doesn't cover the full Coq kernel (it omits recent additions such as SProp, as well as the module system). (Aside, if there are any Coq experts reading this: if you know of any place where the complete formal system implemented by Coq is written down, I would really like to see it. Formalized is nice, but informal might even be better, as long as it is complete and precise.) You also can't connect it to CompCert as Andrej suggested, because the typechecking algorithm is only described abstractly, and certainly not in C, which is what CompCert expects as input.
  • Metamath Zero is still under construction but the goal is to prove correctness down to the machine code level, inside its own logic. (I also can't help but mention it's currently about 8 orders of magnitude faster than "the other guys" but we'll see if that holds up until the end of the project.)

In case you are not aware, there is a possibility that a theorem prover is implemented 100% correctly, runs on a faultless machine, and still proves itself arithmetically inconsistent, even though it is not. The other existing answers focus on the issue of verifying that the theorem prover runs exactly as it was designed to run, but do not address this aspect of "correctness".

More precisely, every theorem prover is based on some underlying foundational system $S$. For $S$ to be considered a reasonable foundational system, $S$ must interpret arithmetic, meaning that there is a computable translation $ι$ of sentences over PA to sentences in the language of $S$ such that for every arithmetical sentence $Q$, if PA proves $Q$ then $S$ proves $ι(Q)$. Gödel's incompleteness theorem can easily be extended to any such (computable) foundational system $S$, and states the following: $ \def\con{\text{Con}} $

If $S$ does not prove $ι(0=1)$, then $S$ does not prove $ι(\con(S))$.

Here $\con(S)$ is the arithmetical sentence that encodes the (arithmetical) consistency of $S$. Namely, $S$ does not prove $ι(0=1)$ iff $\mathbb{N} ⊨ \con(S)$.
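
For concreteness, one standard way to spell this sentence out (the predicate name $\mathrm{Prf}_S$ is just notation here) is

$$\con(S) \;:=\; ¬\,\exists p.\ \mathrm{Prf}_S\bigl(p,\ \ulcorner ι(0=1) \urcorner\bigr),$$

where $\mathrm{Prf}_S(p,n)$ is the primitive recursive predicate "$p$ codes an $S$-proof of the sentence with code $n$".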

So first of all, this implies that $S$ cannot possibly prove its own consistency unless it is inconsistent. In this sense it is impossible for any theorem prover to prove that it itself is correct. It may be able to prove that its execution on a particular machine satisfies certain specifications (as mentioned in other answers), but that is far from proving correctness of the theorems it proves. In other words, anyone who wants confidence in a theorem prover's outputs must still examine those specifications and be somehow convinced that they ensure correctness of the outputs. That part can never be covered by the theorem prover itself.

Secondly, suppose $S$ is deductively closed for arithmetical sentences, meaning that if $S$ proves $ι(Q)$ and $ι(Q⇒R)$ then $S$ also proves $ι(R)$. Let $A(S)$ be the arithmetical theorems of $S$, namely all arithmetical sentences $Q$ such that $S$ proves $ι(Q)$, and let $S'$ be a formal system that proves the same theorems as $S$, plus $ι(R)$ for every arithmetical sentence $R$ such that $A(S)+¬\con(S)$ proves $R$. Then $S'$ is still deductively closed for arithmetical sentences, but $S'$ proves $ι(¬\con(S))$ and hence $ι(¬\con(S'))$.

Note that $S'$ proves everything that $S$ does, and if $S$ is consistent then $S'$ is also consistent but proves that it itself is inconsistent. Now, how can we know that the foundational system we started with was not such an $S'$? We cannot know, without assumptions beyond $S$. Simply put, it is possible that a theorem prover proves that it proves nonsense, including $0=1$, even though it does not!
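
Spelled out as a chain of implications (glossing over the details of the arithmetization), the construction runs:

$$\begin{aligned}
S \text{ consistent} &\implies S ⊬ ι(\con(S)) && \text{(Gödel, as stated above)}\\
&\implies A(S)+¬\con(S) \text{ consistent} && \text{(otherwise } A(S) ⊢ \con(S)\text{)}\\
&\implies S' \text{ consistent,}\\
S' ⊢ ι(¬\con(S)) &\implies S' ⊢ ι(¬\con(S')) && \text{(}¬\con(S) ⇒ ¬\con(S') \text{ is provable, as } S' \text{ extends } S\text{)}
\end{aligned}$$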

To summarize:

  1. No reasonable theorem prover can prove that it itself produces only consistent outputs.

  2. One cannot determine, without assumptions beyond those coded into a theorem prover, that it does not produce false theorems.

I also wish to say a bit about verification according to specifications. This is necessarily conditional; all you can prove is that, if the theorem prover is run on a machine that satisfies certain properties, then the theorem prover produces outputs that satisfy certain specifications. Can you know for sure whether the machine on which you run the theorem prover actually satisfies those properties? No. At some point, you are just going to have to trust or believe something that you cannot prove.

This depends on the underlying axioms of the theorem prover. If a prover is based on primitive set theory, it will have few axioms. It can also have optional axioms, such as the axiom of choice, which may or may not be useful for higher-level proofs. From these primitive axioms, and theories such as Peano arithmetic, we can build up the familiar algebra of ordinary, complex, and floating-point arithmetic. A further feature is the use of abstract types to represent theorems: if you have not proved a theorem, you cannot make use of it, because no value of that type will exist yet. A further level of complexity is to write the core of the prover inside the prover itself. This may be used to give confidence that the prover written in itself produces the same proof traces as the prover running on a general-purpose computer.
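
As a toy illustration of the abstract-type idea, reusing the hypothetical `Kernel` module sketched in an earlier answer (again, not any real prover's API): the type system itself enforces that no theorem value exists until it has been derived.

```ocaml
(* There is no constructor in scope to forge a theorem, so something like
     let fake : Kernel.thm = ...
   cannot even be written. The only route to a [thm] is a derivation: *)
let a  = Kernel.atom "a"
let b  = Kernel.atom "b"
let th = Kernel.axiom_k a b        (* |- a -> (b -> a), via a primitive rule *)

(* Misapplying a rule fails rather than yielding a bogus theorem: *)
(* let bogus = Kernel.mp th th *)  (* raises Failure "mp: rule does not apply" *)
```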
