Question

I'm trying to scrape a page and extract a single link to a file I want to download. The problem is that the download link only works if the server recognizes the Referer, i.e. the page that links to the file.

I tried downloading it with cURL from a PHP script with the referer set, but it didn't work.

So I tried PhantomJS, which behaves like a browser, but I can't find the link I need to click. I also tried manually setting the download link as the target URL and the originating page as the referer, but I still get an error.

I log in to the site with this code:

var pageLogin = require('webpage').create(),
    server = 'http://domain.com/login.php',
    data = 'redirect=index.php&login_username=username&login_password=password&';

pageLogin.open(server, 'post', data, function (status) {
    if (status !== 'success') {
        console.log('Unable to post!');
    } else {
        console.log(pageLogin.content);
    }
});
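The POST body above is concatenated by hand, so a password containing & or = would corrupt it. A minimal sketch of building the same body with escaping follows; buildFormBody is a hypothetical helper, the field names come from the data string above, and the password value is a placeholder. It is written in ES5 on the assumption that it may need to run under PhantomJS's older JavaScript engine.

```javascript
// Hypothetical helper (not from the question): builds a URL-encoded
// POST body from a plain object of form fields. ES5 only, so it also
// runs under PhantomJS.
function buildFormBody(fields) {
  var pairs = [];
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      // Escape both name and value so characters like & or = cannot
      // break the key=value&key=value structure.
      pairs.push(encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]));
    }
  }
  return pairs.join('&');
}

var data = buildFormBody({
  redirect: 'index.php',
  login_username: 'username',
  login_password: 'p&ss=word' // placeholder value containing reserved characters
});
console.log(data);
// prints: redirect=index.php&login_username=username&login_password=p%26ss%3Dword
```

The resulting string can be passed to pageLogin.open in place of the hand-written data.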

I saved the cookie, and the login works.
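For comparison with the PHP/cURL attempt, here is a minimal Node.js sketch of a direct download that sends the Referer header. It is not the asker's code and rests on an assumption: that the server checks the Referer and also expects the session cookie obtained at login (one plausible, unconfirmed reason a Referer-only request could fail). buildDownloadHeaders and downloadWithReferer are hypothetical names; the URLs and cookie value are placeholders.

```javascript
// Node.js sketch, not PhantomJS code. Uses the built-in http module, so it
// only handles http:// URLs (https:// would need the https module).
const http = require('http');

// Assumption: the server wants both the Referer and the login session cookie.
function buildDownloadHeaders(refererUrl, cookieHeader) {
  const headers = { Referer: refererUrl };
  if (cookieHeader) {
    // Reuse the cookie from the login step; a correct Referer alone may
    // not be enough if the download also requires an authenticated session.
    headers.Cookie = cookieHeader;
  }
  return headers;
}

// Calls callback(err) on failure or callback(null, Buffer) with the file body.
function downloadWithReferer(fileUrl, refererUrl, cookieHeader, callback) {
  const options = { headers: buildDownloadHeaders(refererUrl, cookieHeader) };
  http.get(fileUrl, options, function (res) {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      callback(new Error('HTTP ' + res.statusCode));
      return;
    }
    const chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () { callback(null, Buffer.concat(chunks)); });
  }).on('error', callback);
}

// Usage (placeholders, not run here):
// downloadWithReferer('http://domain.com/dl.php?t=112',
//   'http://domain.com/page.php?t=112', '<session cookie from login>',
//   function (err, body) { /* write body to disk */ });
```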

Now, I need to load another page, for example:

var pageRelease = require('webpage').create(),
    serverRelease = 'http://domain.com/page.php?t=112';

and on that page I need to extract this link:

<tr class="row1">
    <td width="15%">aaa:</td>
    <td width="70%">aaa &nbsp;<span title="aaa">[ 13-may-14 15:15 ]</span></td>
    <td width="15%" rowspan="4" class="tCenter pad_6">
        <p><a href="dl.php?t=112" class="dl-stub"><img src="http://domain.com/templates/default/images/attach_big.gif" /></a></p>
        <p><a href="dl.php?t=112" class="dl-stub dl-link">drink.txt</a></p>
        <p class="small">5KB</p>
        <p style="padding-top: 6px;"><input id="gir-filelist-btn" type="button" class="lite" style="width: 120px;" value="download" /></p>
    </td>
</tr>
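If only the raw page source is available as a string (for example, pageRelease.content once the page has loaded), a regex-based sketch like the one below can pull out the href of the anchor whose class list contains dl-link. This is a hypothetical fallback, not part of the question or answer; regexes are brittle against other markup shapes, so querying the live DOM, as the answer below does, is the sturdier route.

```javascript
// Hypothetical fallback: returns the raw href of the first <a> whose class
// list contains "dl-link", or null if none is found. Assumes double-quoted
// attributes and does not decode HTML entities such as &amp;.
function findDownloadHref(html) {
  // Match each opening <a ...> tag, then inspect its attributes separately
  // so the order of href and class inside the tag does not matter.
  var tagRe = /<a\b[^>]*>/gi;
  var tag;
  while ((tag = tagRe.exec(html)) !== null) {
    var classMatch = /\bclass\s*=\s*"([^"]*)"/i.exec(tag[0]);
    var hrefMatch = /\bhref\s*=\s*"([^"]*)"/i.exec(tag[0]);
    if (classMatch && hrefMatch &&
        classMatch[1].split(/\s+/).indexOf('dl-link') !== -1) {
      return hrefMatch[1];
    }
  }
  return null;
}

var snippet =
  '<p><a href="dl.php?t=112" class="dl-stub"><img src="attach_big.gif" /></a></p>' +
  '<p><a href="dl.php?t=112" class="dl-stub dl-link">drink.txt</a></p>';
console.log(findDownloadHref(snippet)); // prints: dl.php?t=112
```

The returned value is relative, so it still has to be resolved against the page URL (http://domain.com/) before requesting it.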

My problem is that I can't get the href of this link:

    <a href="dl.php?t=112" class="dl-stub dl-link">drink.txt</a>

I tried this function, but it didn't work:

var results = page.evaluate(function() {
    var allParas = document.getElementsByClassName("dl-stub");

    var num = allParas.length;
    var title = new Array();

    for(var i=0; i < num; i++) {
        title[i] = allParas[i].childNodes[1].childNodes[0].InnerHTML;
    }

    return title;
});

for(var i=0; i < results.length; i++) {
    console.log(results[i]) + "\n";
}

What can I do?

Any suggestions?

Thanks


Solution

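You can read the href property directly from the elements you've selected. You weren't far off: your loop fails because each a.dl-stub element has only one child node, so childNodes[1] is undefined and reading .childNodes on it throws a TypeError (the property is also spelled innerHTML, not InnerHTML).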

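var results = page.evaluate(function() {
    // Runs inside the page: collect every element with class "dl-stub".
    var allParas = document.getElementsByClassName("dl-stub");

    var num = allParas.length;
    var title = [];

    for (var i = 0; i < num; i++) {
        // .href returns the link's URL resolved against the page address.
        title[i] = allParas[i].href;
    }

    // evaluate() can only return JSON-serializable values, such as this array of strings.
    return title;
});

for (var i = 0; i < results.length; i++) {
    console.log(results[i]);
}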
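Both a.dl-stub elements in the question's markup point at dl.php?t=112, so the loop above prints the same URL twice. A minimal sketch of dropping the duplicates before downloading follows; uniqueUrls is a hypothetical helper, and the sample URLs assume the page's relative href resolves to http://domain.com/dl.php?t=112. It avoids Set so it also runs in PhantomJS's outer script context.

```javascript
// Hypothetical helper: keeps the first occurrence of each URL, in order.
// Plain ES5 (an object as a lookup table instead of Set) for PhantomJS.
function uniqueUrls(urls) {
  var seen = {};
  var out = [];
  for (var i = 0; i < urls.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(seen, urls[i])) {
      seen[urls[i]] = true;
      out.push(urls[i]);
    }
  }
  return out;
}

var links = uniqueUrls([
  'http://domain.com/dl.php?t=112',
  'http://domain.com/dl.php?t=112'
]);
console.log(links.length); // prints: 1
```

In the script above this would be called as uniqueUrls(results). Alternatively, selecting by the dl-link class instead of dl-stub matches only the text link and avoids the duplicate in the first place.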
Licensed under: CC-BY-SA with attribution