Isabelle: Why do I get completely different results when running try versus sledgehammer

StackOverflow https://stackoverflow.com/questions/21694248

  •  09-10-2022
  •  | 
  •  

Question

When I look only (this is not stated in the title) at the Sledgehammer results, the output of try often gives various results from sledgehammer while running sledgehammer directly gives zero(!) results.

Now I know, the design goal of Sledgehammer was not determinism, as stated by Lawrence Paulson. However, this seemingly random results without understanding why are a bit discouraging for beginners, not only for me but also for students helping in projects.

A possibility would be to always run Sledgehammer and try in parallel.

I don't need an exact answer, but does anybody have a clue why the results could be different? Are the two sledgehammer calls featuring different parameters?

Was it helpful?

Solution

First, I would recommend subscribing to the Isabelle user's mailing list to watch for information about these things. In particular, in the last six months, there were threads about Sledgehammer

  • how sledgehammer is related to try,
  • how the Auto methods in Isabelle / Options is related to try,
  • how multi-threading affects try and sledgehammer,
  • how options and provers are set for the Sledgehammer panel which can be used to run `sledgehammer,
  • how the number of CPU cores affect how the options are automatically set for sledgehammer,
  • and generally how provers are chosen for Sledgehammer in various ways, some automatic.

That's all related to your question, where one short answer to your questions is that two big things that affect sledgehammer output are the options and multi-threading, where you can only completely control the sledgehammer options by giving it options when you use it, or by setting the options using sledgehammer-params.

Below, I give some links to the mailing list, where M.Wenzel gave a short explanation about Sledgehammer's automatic settings yesterday.

Another answer to your second question is this: The try command is a multi-threaded combination of try0 and sledgehammer. There are the default sledgehammer options, which can be viewed with sledgehammer_param. If you haven't changed these options, then the command sledgehammer, when used in your theory, will use these options. Whether try is using these options, I don't know, but I'll show you how to determine whether try is giving sledgehammer a prover that's not listed in the default options.

I assume that try works for you like it works for me.

First, insert sledgehammer_params and look at the output panel. It will show you the provers. My default is e spass vampire z3, because I have 4 cores. These were chosen on startup by the PIDE. These are the same provers that will be shown in the Sledgehammer panel. A second important options will be timeout = 30.

When I insert try for a theorem which try0 methods can't prove, and which sledgehammer can, along with each by proof that's listed in the output panel, it will tell me the ATP (automatic theorem prover) which found the proof. For example, mine shows me "e", "z3", and "spass", each followed by a proof.

Answer: If try shows an ATP that's not listed when you use sledgehammer_params, then try is giving sledgehammer a different provers option than when you run sledgehammer with the default options.

Running a prover that try listed: To answer your question about whether you can get sledgehammer to give you the same sledgehammer proof that try gives you, run the individual provers that try listed, for example like this:

sledgehammer[provers = "z3", timeout = 60]

I show timeout just to show that you can change it, but you generally wouldn't need a longer timeout because try doesn't try things forever.

If your explicit use of sledgehammer gives you a z3 proof, when try has given you a z3 proof, then this tells you that it's the options and multi-threading that affect the different outcomes. I definitely know that the output of sledgehammer and nitpick vary greatly.

I don't use try. I use try0 and sledgehammer, but what you say shows that the automatically set options for the auto methods are better. Auto nitpick finds counterexamples that nitpick doesn't find because I've changed the options for nitpick.

Links

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top