Question

So I've been puzzling over this and just can't figure out how to fix it.

I have a nested function that loops through an array of objects and scrapes the social links from each object's URL.

After that, I'd like to update each object to include { social: [array of social URLs] }. However, I get { social: [] } instead.

I've tried placing contacts[i].social = results in other parts of the code, but then I get an "object not found" error.

Help would be much appreciated...

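var request = require('request');
var cheerio = require('cheerio');
// socialURLS is defined elsewhere (not shown): an object of RegExp patterns

var contacts = [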
  { title: 'Healthy breakfast: Quick, flexible options to grab at home - Mayo Clinic',
    url: 'http://www.mayoclinic.org/healthy-living/nutrition-and-healthy-eating/in-depth/food-and-nutrition/art-20048294' },
  { title: 'Healthy Breakfast Ideas from Dr. Weil\'s Facebook Readers',
    url: 'http://www.drweil.com/drw/u/ART03113/Healthy-Breakfast-Ideas-from-Facebook-Readers.html' },
  { title: '8 Healthy Breakfast Ideas - Prevention.com',
    url: 'http://www.prevention.com/food/healthy-eating-tips/8-healthy-breakfast-ideas' },
  { title: 'Quick & Easy Healthy Breakfast Ideas! - YouTube',
    url: 'http://www.youtube.com/watch?v=mD4YSD8LDiQ' },
  { title: 'The Best Foods to Eat for Breakfast - Health.com',
    url: 'http://www.health.com/health/gallery/0,,20676415,00.html' },
  { title: 'Healthy Breakfast Recipes - Secrets To Cooking Healthier - YouTube',
    url: 'http://www.youtube.com/watch?v=7jH0xe1XKxI' },
  { title: 'Healthy Breakfast Ideas You Can Make the Night Before - FitSugar',
    url: 'http://www.fitsugar.com/Healthy-Breakfast-Ideas-You-Can-Make-Night-Before-20048633' },
  { title: '10 Easy, 5-Minute Breakfast Ideas - Diet and ... - Everyday Health',
    url: 'http://www.everydayhealth.com/diet-and-nutrition-pictures/easy-5-minute-breakfast-ideas.aspx' },
  { title: 'Healthy Breakfast Ideas for Kids | Parenting - Parenting.com',
    url: 'http://www.parenting.com/gallery/healthy-breakfast-ideas-kids' },
  { title: 'Fruits & Veggies More Matters : Healthy Breakfast Ideas : Health ...',
    url: 'http://www.fruitsandveggiesmorematters.org/healthy-breakfast-ideas' } 
]; 

scraper(contacts); 

// loops through contacts database and scrapes requested information 
function scraper(contacts) {

    // Adds the domain of each contact 
    for(var i=0;i<contacts.length;i++){  
        contacts[i].domain = contacts[i].url.split(/\//, 3).join().replace(/,/g, '/');
    }

    // Scrapes each contact's homepage for social links
    for(var i=0;i<contacts.length;i++){ 
        var homepage = contacts[i].domain;

        var results = [];

        function socialScrape(homepage) { 
            request(homepage, function(err, resp, html) {
                var $ = cheerio.load(html);

                if(!err && resp.statusCode == 200) {    
                    $('a').each(function(i, el){
                        var a = $(el).attr('href'); 
                        for(var key in socialURLS){
                            if(socialURLS[key].test(a) && results.indexOf(a) < 0){
                                results.push(a); 
                            }
                        }

                    });
                } else { console.log(err); } 
            })
        }
        contacts[i].social = results; 
        socialScrape(homepage); 
    }

    console.log(contacts);

} 

Solution

Your first issue is that your request call is asynchronous and hasn't completed by the time contacts[i].social = results; is executed, so contacts[i].social is assigned an empty array, []. For the same reason, the console.log(contacts) at the end of scraper runs before any response has arrived. (Variations of this issue are posted on SO multiple times every day; a good explanation of the problem can be found here: How do I return the response from an asynchronous call?) The solution is not as simple as moving contacts[i].social = results; inside the request call's success handler, because the value of i will have changed before the handler is called: the loop has already finished, so i equals contacts.length and contacts[i] is undefined.
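The following minimal sketch reproduces both symptoms without any network access. fakeRequest is a hypothetical stand-in for request() that, like the real one, calls back asynchronously (here via Node's setImmediate); the URLs are placeholders.

```javascript
// Hypothetical stand-in for request(): it calls back asynchronously,
// just as the real request() does once the HTTP response arrives.
function fakeRequest(url, callback) {
  setImmediate(function () {
    callback(null, { statusCode: 200 }, '<html>' + url + '</html>');
  });
}

var urls = ['http://a.example', 'http://b.example', 'http://c.example'];
var events = [];

for (var i = 0; i < urls.length; i++) {
  // urls[i] is read now, while the loop is running, so each call gets the right URL...
  fakeRequest(urls[i], function (err, resp, html) {
    // ...but i is read later: the loop has finished and i === urls.length,
    // so urls[i] (or contacts[i] in the question) is undefined in every callback.
    events.push('callback: i = ' + i + ', html = ' + html);
  });
}

// This runs before any callback, which is why logging here shows empty arrays.
events.push('loop done');
```

Running it with Node records 'loop done' first, and every callback sees i = 3.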

Your second issue is that results is declared outside the socialScrape function definition, so instead of getting one array per request call, all the callbacks push into the same array (the one created on the last iteration of the loop). Both scoping issues can be fixed with a closure, which we can create by removing the separate socialScrape(homepage); call and turning socialScrape into a self-invoking function (an immediately invoked function expression, or IIFE):

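(function socialScrape(homepage) {
    var results = [];  // a fresh array for each request
    var index = i;     // capture the current value of i
    request(homepage, function(err, resp, html) {
        if (!err && resp.statusCode == 200) {
            var $ = cheerio.load(html);
            $('a').each(function(j, el) {
                var a = $(el).attr('href');
                for (var key in socialURLS) {
                    if (socialURLS[key].test(a) && results.indexOf(a) < 0) { results.push(a); }
                }
            });
        } else { console.log(err); }
        contacts[index].social = results;
    });
}(homepage));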

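Notice how we capture the current value of i within the closure and assign it to index. Because the function runs immediately, during the same loop iteration, index keeps that value after the loop moves on, so we can update the correct contact when its result is delivered.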
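The closure puts each result on the right contact, but the console.log(contacts) at the end of scraper still runs before any response arrives. One way to handle that, sketched here under stated assumptions, is to count finished requests and call a done callback when the count reaches zero. To keep the sketch runnable without network access or npm packages, fetchHtml is a hypothetical parameter with request()'s (url, callback) shape, extractHrefs is a simplified regex-based stand-in for cheerio, the socialURLS patterns are placeholders (the question does not show its own), and each contact is assumed to already have the domain property set by the question's first loop.

```javascript
// Placeholder patterns: the question's socialURLS object is not shown.
var socialURLS = {
  facebook: /facebook\.com/i,
  twitter: /twitter\.com/i,
  youtube: /youtube\.com/i
};

// Simplified stand-in for cheerio: pulls double-quoted href values out of
// <a> tags with a regex. Real pages are better parsed with cheerio.
function extractHrefs(html) {
  var hrefs = [];
  var re = /<a\s[^>]*href="([^"]*)"/gi;
  var match;
  while ((match = re.exec(html)) !== null) {
    hrefs.push(match[1]);
  }
  return hrefs;
}

// fetchHtml(url, callback) is hypothetical and calls back with (err, resp, html),
// like request(). done(contacts) is called once every request has finished.
function scraper(contacts, fetchHtml, done) {
  var pending = contacts.length;
  if (pending === 0) { return done(contacts); }

  for (var i = 0; i < contacts.length; i++) {
    (function socialScrape(homepage) {
      var results = [];  // one array per request
      var index = i;     // capture the current value of i
      fetchHtml(homepage, function (err, resp, html) {
        if (!err && resp.statusCode === 200) {
          extractHrefs(html).forEach(function (a) {
            for (var key in socialURLS) {
              if (socialURLS[key].test(a) && results.indexOf(a) < 0) {
                results.push(a);
              }
            }
          });
        } else {
          console.log(err || resp.statusCode);
        }
        contacts[index].social = results;
        pending -= 1;
        if (pending === 0) { done(contacts); }  // every contact is updated now
      });
    }(contacts[i].domain));
  }
}
```

With the real library, passing request itself as fetchHtml should work, since it accepts (url, callback) and calls back with (err, resp, body); the final console.log(contacts) then moves into done, e.g. scraper(contacts, request, function (contacts) { console.log(contacts); }). In ES2015+ code, declaring the loop counter with let gives each iteration its own binding of i, which makes the separate index copy unnecessary.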

Licensed under: CC-BY-SA with attribution