Question

I am using WWW::Mechanize and currently handling HTTP responses with the 'Content-Encoding: gzip' header in my code by first checking the response headers and then using IO::Uncompress::Gunzip to get the uncompressed content.

However, I would like to do this transparently so that WWW::Mechanize methods like forms(), links(), etc. work on and parse the uncompressed content. Since WWW::Mechanize is a subclass of LWP::UserAgent, I would prefer to use the LWP::UserAgent handlers to do this.

While I have been partly successful (I can print the uncompressed content for example), I am unable to do this transparently in a way that I can call

$mech->forms();

In summary: How do I "replace" the content inside the $mech object so that from that point onwards, all WWW::Mechanize methods work as if the Content-Encoding never happened?

I would appreciate your attention and help. Thanks

OTHER TIPS

It looks to me like you can replace it by calling the $res->content( $bytes ) method on the response object.
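A minimal sketch of that idea follows. The helper name inflate_in_place and the in-memory test response are made up for illustration; the sketch assumes HTTP::Response (part of libwww-perl's HTTP::Message distribution) and the core IO::Compress::Gzip module are installed.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: overwrite a gzip-encoded body with its inflated bytes.
sub inflate_in_place {
    my ($res) = @_;
    my $encoding = $res->header('Content-Encoding') // '';
    return $res unless $encoding =~ /\bgzip\b/i;

    # charset => 'none' undoes the Content-Encoding only; the result stays raw bytes.
    my $bytes = $res->decoded_content(charset => 'none');
    return $res unless defined $bytes;    # decoding failed, leave the response untouched

    $res->content($bytes);                       # replace the stored body
    $res->remove_header('Content-Encoding');     # the body is no longer compressed
    $res->header('Content-Length' => length $bytes);
    return $res;
}

# Build a gzip-encoded response in memory so the sketch runs offline.
my $page = '<html><body><form action="/go"></form></body></html>';
my $gzipped;
gzip(\$page, \$gzipped) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $gzipped,
);

inflate_in_place($res);
print $res->content, "\n";
```

With a live $mech, the same helper could presumably be registered as an LWP::UserAgent handler, e.g. $mech->add_handler(response_done => sub { inflate_in_place($_[0]); return });, though I have not tested that against a real site.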

By the way, I found this stuff by looking at the source of LWP::UserAgent, then HTTP::Response, then HTTP::Message.

Gzip decoding is built into LWP::UserAgent, and thus into WWW::Mechanize. One MAJOR caveat that may save you some hair-pulling:

To debug, make sure you check the error variable $@ after the call to decoded_content.

my $html = $r->decoded_content;   # returns undef and sets $@ if decoding fails
die $@ if $@;
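As an alternative sketch, decoded_content also accepts a raise_error option that makes it die instead of returning undef, which can be trapped with eval. The response below is a made-up one with a deliberately broken body so that the failure path can be seen offline.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical response: claims to be gzip, but the body is plain text.
my $r = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    'this is not gzip data',
);

# raise_error => 1 turns a decoding failure into an exception.
my $html = eval { $r->decoded_content(raise_error => 1) };
my $err  = $@;

if (defined $html) {
    print "decoded ", length($html), " characters\n";
}
else {
    warn "decoded_content failed: $err";
}
```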

Better yet, look through the source of HTTP::Message and make sure all the support packages it loads are installed.

In my case, decoded_content returned undef while content still held the raw compressed bytes, and I went on a wild goose chase. UserAgent sets the error flag when decoding fails, but Mechanize just ignores it (it doesn't check for the failure or log it as an error or warning of its own).

In my case, $@ said "Can't find IO/HTML.pm ..". The failure happened inside an eval, so nothing was reported.
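A quick way to check for missing support packages up front is sketched below. The helper missing_modules is made up for this example, and the module list is my assumption about what the decoding path may load; the source of your installed HTTP::Message is the authority.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the given modules that fail to load.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Assumed list; check HTTP::Message's source for what your version needs.
my @wanted  = qw(IO::HTML Encode IO::Uncompress::Gunzip IO::Uncompress::Inflate);
my @missing = missing_modules(@wanted);

print @missing
    ? "Missing: @missing\n"
    : "All listed decoding modules were found\n";
```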

After diving into the source, I found out that the built-in decoding process is long, meticulous, and arduous, covering just about every scenario and making tons of guesses (Thank you Gisle!).

If you are paranoid, explicitly set the default header to be sent with every request when calling new():

    my $browser = WWW::Mechanize->new(
        default_headers => HTTP::Headers->new(
            'Accept-Encoding' => scalar HTTP::Message::decodable()));

Licensed under: CC-BY-SA with attribution