Question

I have some simple code that collects video URLs so that I can run another scraping function on them afterward. My problem is that I can't seem to return the array filled with URLs. I know it's a scope problem, but I'm not very familiar with JavaScript and this is as far as my knowledge gets me.

Here is the code:

var request = require('request');
var cheerio = require('cheerio');

var startUrl = 'http://www.somewebsite.com/mostviewed';

var getVideoIds = function(url) {

    var urls = [];

    request(url, function(err, resp, body){
        if (err)
            throw err;
        $ = cheerio.load(body);


        var videoUrls = [];
        $('.videoTitle a').each(function() {
            videoUrls.push($(this).attr('href'));
        });
    });

   return urls;
}


var urlsToScrap = getVideoIds(startUrl);
console.log(urlsToScrap);

PS: the current code returns an empty array.

Solution

You have two issues. First, you return urls, but nothing is ever added to it: the values are pushed onto videoUrls, so the urls array you return is still empty. Second, request is asynchronous: it returns immediately and runs its callback only later, once the response has arrived, so getVideoIds has already returned by the time the links are collected. You need to use the scraped URLs from inside a callback that runs once the data comes back.

So:

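var request = require('request');
var cheerio = require('cheerio');

// Same address as startUrl in the question.
var url = 'http://www.somewebsite.com/mostviewed';
var urls = [];

request(url, function(err, resp, body){
    if (err)
        throw err;
    // Declare $ locally instead of creating an implicit global.
    var $ = cheerio.load(body);

    // Collect the href of every matching link into urls.
    $('.videoTitle a').each(function() {
        urls.push($(this).attr('href'));
    });

    // urls is only filled at this point, so use it from here.
    onVideosScraped();
});

function onVideosScraped() {
    console.log(urls);
}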

This should work, and is a rudimentary way to do it. You can of course wrap any of this you want in functions to make it more reusable, but I hope this answers your question.
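
As a rough sketch of that wrapping, the function below takes a Node-style callback and hands the URLs to it once they are available. Note that fakeRequest and extractLinks are hypothetical stand-ins, not part of request or cheerio: they exist only so the sketch runs without network access or extra packages. In real code you would call request and run the cheerio loop shown above in their place. The name getVideoUrls is also just illustrative.

```javascript
// Hypothetical stand-in for `request`: answers asynchronously with a canned body.
function fakeRequest(url, callback) {
    setTimeout(function () {
        callback(null, { statusCode: 200 }, '/video/1\n/video/2');
    }, 10);
}

// Hypothetical stand-in for the cheerio loop: here the body holds one link per line.
function extractLinks(body) {
    return body.split('\n');
}

// Node-style callback: callback(err) on failure, callback(null, urls) on success.
function getVideoUrls(url, callback) {
    fakeRequest(url, function (err, resp, body) {
        if (err) {
            // Hand the error to the caller instead of throwing inside the callback.
            return callback(err);
        }
        callback(null, extractLinks(body));
    });
}

getVideoUrls('http://www.somewebsite.com/mostviewed', function (err, urls) {
    if (err) {
        return console.error(err);
    }
    // This runs only after the (simulated) response has arrived.
    console.log(urls);
});
```

The caller never relies on a return value: everything that needs the URLs goes inside the callback, which is also where a second scraping step could be started.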

Licensed under: CC-BY-SA with attribution