Question

on "twill" documentation page it is written:


By default, twill will run pages through tidy before processing them. This is on by default because the Python libraries that parse HTML are very bad at dealing with incorrect HTML, and will often return incorrect results on "real world" Web pages. To disable this feature, set config do_run_tidy 0


But where is this tidy program located inside twill? I have downloaded "twill 0.9" and looked into "twill" folder contents - I just can't find there such a file (or a module) that would be named "tidy"

Was it helpful?

Solution

twill uses the commandline version of tidy if installed on your system. the method that calls tidy to clean your code is located in the utils.py and named 'run_tidy'. its called by the command 'tidy_ok' which is defined in commands.py

if use_tidy is set to true (which it is by default) the _cleanup_html method in ConfigurableParsingFactory calls the run_tidy method

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top