Question

I am trying to fetch an HTTPS page through a proxy with WWW::Mechanize, but the request fails:

use strict;
use warnings;

use IO::Socket::SSL;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;
$mech->proxy(['https','http'], 'http://proxy:8080/');
$mech->get('https://www.google.com');

print $mech->content;

Error:

Error GETing https://www.google.com: Bad Request at perl4.pl line 9.

With LWP::UserAgent, I can fetch the same HTTPS page without any error:

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
$ua->proxy(['https','http'], 'http://proxy:8080/');
$ua->get('https://www.google.com');

Can anyone help with this?

I am currently using WWW::Mechanize 1.72.


Solution 3

I installed LWP-Protocol-connect-6.03 and now connect to the proxy with:

$https_proxy = 'connect://proxy:8080/';

It is working fine now :D
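A complete version of the question's script using this approach might look like the sketch below. It assumes LWP::Protocol::connect is installed (LWP loads the handler for the connect:// scheme on demand) and reuses the question's placeholder proxy:8080. autocheck is switched off here only so that a failure is printed instead of thrown.

```perl
use strict;
use warnings;

use WWW::Mechanize;

# Placeholder proxy from the question; replace with your own host:port.
my $proxy = 'proxy:8080';

# autocheck => 0 so a failed request is reported below instead of dying.
my $mech = WWW::Mechanize->new( autocheck => 0, timeout => 10 );

# Plain HTTP keeps using the proxy as an ordinary forward proxy.
$mech->proxy( 'http', "http://$proxy/" );

# HTTPS uses the connect:// scheme, which LWP hands to
# LWP::Protocol::connect; that module opens a CONNECT tunnel first.
$mech->proxy( 'https', "connect://$proxy/" );

$mech->get('https://www.google.com');

if ( $mech->success ) {
    print $mech->content;
}
else {
    warn 'Request failed: ', $mech->res->status_line, "\n";
}
```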

OTHER TIPS

WWW::Mechanize is based on LWP::UserAgent, which for years has handled HTTPS proxy requests oddly: instead of sending a CONNECT request to build a tunnel and then upgrading it to SSL, it sends a GET request with an https URL. See https://rt.cpan.org/Ticket/Display.html?id=1894

A fix has finally been merged into the libwww-perl GitHub repository, but I don't know when a new version of LWP will be released. In the meantime you can use Net::SSLGlue::LWP, which monkey-patches LWP to provide proper support for HTTPS proxies (I'm the author of Net::SSLGlue::LWP and of the fixes to LWP).
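A minimal sketch of the Net::SSLGlue::LWP route, assuming the module is installed and an LWP release from before the fix: the patch is loaded before any HTTPS request is made, and the question's ordinary http:// proxy URL (placeholder proxy:8080) is kept for both schemes. Newer LWP releases may already contain the fix, in which case the extra module should be unnecessary.

```perl
use strict;
use warnings;

# Load the patch first so it is in place before LWP makes any
# HTTPS request (assumes Net::SSLGlue::LWP is installed).
use Net::SSLGlue::LWP;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 0, timeout => 10 );

# Same proxy setup as in the question; with the patch loaded, the
# https request is expected to go through a CONNECT tunnel instead
# of being sent to the proxy as a GET with an https URL.
$mech->proxy( [ 'http', 'https' ], 'http://proxy:8080/' );

$mech->get('https://www.google.com');

if ( $mech->success ) {
    print $mech->content;
}
else {
    warn 'Request failed: ', $mech->res->status_line, "\n";
}
```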

Based on the error you provided, I would guess that your proxy is blocking a certain User-Agent. The User-Agent string sent by LWP::UserAgent differs from the one sent by WWW::Mechanize.

I suggest trying this line:

my $mech = WWW::Mechanize->new( agent => 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36' );

This makes the proxy and the receiving server believe you are a Chrome browser rather than some sort of crawler, malware, virus, etc.
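To test the User-Agent theory before faking a browser, you could print the default string each module sends and then make Mechanize send LWP's string. This sketch assumes both modules are installed; agent_alias is WWW::Mechanize's built-in way to pick a canned browser string, and 'Windows Mozilla' is one of its documented aliases.

```perl
use strict;
use warnings;

use LWP::UserAgent;
use WWW::Mechanize;

# Defaults: LWP sends "libwww-perl/<version>",
# Mechanize sends "WWW-Mechanize/<version>".
my $lwp  = LWP::UserAgent->new;
my $mech = WWW::Mechanize->new;
print 'LWP::UserAgent sends: ', $lwp->agent,  "\n";
print 'WWW::Mechanize sends: ', $mech->agent, "\n";

# Option 1: send exactly what LWP sends, to check whether the proxy
# is filtering on the User-Agent header.
$mech->agent( $lwp->agent );

# Option 2: use a built-in browser alias instead of a pasted string.
my $browserish = WWW::Mechanize->new;
$browserish->agent_alias('Windows Mozilla');
print 'With alias:           ', $browserish->agent, "\n";
```

If the Mechanize request succeeds once it sends LWP's string, the proxy is indeed filtering on the header.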

Another suggestion is to dump the $mech object with Data::Dumper and confirm what is "inside":

use Data::Dumper;
print Dumper($mech);

You can use the same method to dump $mech again after the get() call.
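Dumping the whole object prints a lot of internals. A more targeted check is to turn autocheck off and print the response the proxy actually returned. Note also that LWP::UserAgent's get() does not die on an error status, unlike WWW::Mechanize with autocheck enabled (which is what produced the "Error GETing" message), so the question's LWP snippet would stay silent even if it received the same 400. Checking is_success on its response is worth doing. A sketch using the question's placeholder proxy:

```perl
use strict;
use warnings;

use WWW::Mechanize;

# autocheck => 0 makes a failed get() return the response instead of dying.
my $mech = WWW::Mechanize->new( autocheck => 0, timeout => 10 );
$mech->proxy( [ 'https', 'http' ], 'http://proxy:8080/' );

my $res = $mech->get('https://www.google.com');

# Status line (e.g. "400 Bad Request") and headers; proxies often
# identify themselves in headers such as Server or Via.
print 'Status: ', $res->status_line, "\n";
print $res->headers->as_string, "\n";

# On failure, the error page body often says why the request was rejected.
unless ( $res->is_success ) {
    my $body = $res->decoded_content() // '';
    print $body, "\n";
}
```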

Not sure it is relevant, but note that not all proxies support HTTPS/SSL: only those that allow inline or CONNECT proxying will let you proxy HTTPS/SSL traffic.
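To find out whether a given proxy supports CONNECT at all, you could send it a CONNECT request by hand and look at the status line. The sketch below uses only core IO::Socket::INET. The proxy host, port and target are the question's placeholders, and connect_request/probe_connect are hypothetical helper names. It stops at the status line: a real client would then start TLS on the same socket, for example with IO::Socket::SSL->start_SSL.

```perl
use strict;
use warnings;

use IO::Socket::INET;

# Build the request a client sends to ask a proxy for a raw TCP tunnel.
sub connect_request {
    my ($target) = @_;    # "host:port", e.g. "www.google.com:443"
    return "CONNECT $target HTTP/1.1\r\nHost: $target\r\n\r\n";
}

# Send CONNECT to the proxy and return its status line, or an error text.
sub probe_connect {
    my ( $proxy_host, $proxy_port, $target ) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $proxy_host,
        PeerPort => $proxy_port,
        Proto    => 'tcp',
        Timeout  => 10,    # applies to connecting only
    ) or return "cannot reach proxy: $@";

    print {$sock} connect_request($target);

    # No read timeout here: a proxy that never answers will block.
    my $status = <$sock>;
    close $sock;
    return defined $status ? $status =~ s/\r?\n\z//r : 'no reply from proxy';
}

# A "200" status means the proxy will tunnel HTTPS;
# 403, 405 or similar means CONNECT is refused.
print probe_connect( 'proxy', 8080, 'www.google.com:443' ), "\n";
```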

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow