Question

I am relatively new to Node.js and I am trying to get more familiar with it by writing a simple module. The module's purpose is to take an id, scrape a website and return an array of dictionaries with the data.

The data on the website is spread across several pages, and each page is reached through a different index number in the URI. I've defined a function that takes the id and a page_number, scrapes that page via http.request(), and, on the response's end event, passes the data to another function that applies some RegEx to extract it in a structured form.
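A minimal sketch of such a per-page function, assuming a hypothetical URL scheme (`?id=...&page=...`) and hypothetical `<li data-name="...">` markup for the RegEx; the real site's URI and patterns aren't given in the question. `buildUrl`, `parsePage` and `scrapePage` are illustrative names. The function reports through an error-first callback instead of returning a value, because the data only exists once the `end` event has fired.

```javascript
const http = require('http');

// Hypothetical URI scheme: the real site's layout is not given in the question.
function buildUrl(baseUrl, id, pageNumber) {
  return `${baseUrl}/?id=${encodeURIComponent(id)}&page=${pageNumber}`;
}

// Hypothetical parser: pulls <li data-name="...">value</li> pairs out with a RegEx.
// The real expressions depend entirely on the site's markup.
function parsePage(html) {
  const items = [];
  const re = /<li data-name="([^"]*)">([^<]*)<\/li>/g;
  let match;
  while ((match = re.exec(html)) !== null) {
    items.push({ name: match[1], value: match[2] });
  }
  return items;
}

// Fetch one page and hand the parsed array to an error-first callback.
function scrapePage(baseUrl, id, pageNumber, callback) {
  const req = http.request(buildUrl(baseUrl, id, pageNumber), (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    // Only when the whole response has arrived is it safe to parse it.
    res.on('end', () => callback(null, parsePage(body)));
  });
  req.on('error', callback);
  req.end();
}
```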

For the module to be complete, every available page_number of the website has to be scraped.

Is it OK, by Node.js style/philosophy, to use a standard for() loop to call the scraping function for every page, aggregate the results of each call and then return them all at once from the exported function?
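The for() loop itself is not the problem; the synchronous `return` after it is. A small sketch with a hypothetical `fakeScrapePage` (it answers on a later turn of the event loop, standing in for the network request) shows that the loop starts every page, but the function returns before any callback has run:

```javascript
// Hypothetical stand-in for an asynchronous page scraper, so the example
// runs without network access. It calls back later with a one-element array.
function fakeScrapePage(id, pageNumber, callback) {
  setImmediate(() => callback(null, [{ id, page: pageNumber }]));
}

// The naive approach: a plain for loop followed by a synchronous return.
function scrapeAllNaive(id, pageCount) {
  const results = [];
  for (let page = 1; page <= pageCount; page++) {
    fakeScrapePage(id, page, (err, items) => {
      if (!err) results.push(...items);
    });
  }
  // None of the callbacks above has run yet, so this is still an empty array.
  return results;
}
```

The results do arrive eventually, but only after the caller has already received the (empty) array, which is why the answers below pass results on through callbacks instead.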

EDIT

I figured out a solution with help from #node.js on Freenode. You can find the working code at http://github.com/attheodo/katina_node

Thank you all for the comments.


Solution 3

With helpful comments from #node.js on Freenode, I managed to find a solution: calling the scraping function sequentially and attaching callbacks, as the Node.js philosophy requires.
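The sequential approach can be sketched roughly as follows. This is not the code from the linked katina.js; it assumes the number of pages isn't known up front and that the first empty page marks the end of the data. `fakeScrapePage` and `scrapeSequentially` are hypothetical names, and the scraper is passed in so any function with an `(id, pageNumber, callback)` signature can be plugged in.

```javascript
// Hypothetical stand-in for the real scraper: pages 1..3 hold one record
// each, page 4 onward is empty. It answers asynchronously like a real request.
function fakeScrapePage(id, pageNumber, callback) {
  setImmediate(() => {
    const items = pageNumber <= 3 ? [{ id, page: pageNumber }] : [];
    callback(null, items);
  });
}

// Scrape page 1, then page 2, and so on, one request at a time.
// Assumption: the first empty page means there is no more data.
function scrapeSequentially(id, scrapePage, done) {
  const results = [];
  function next(pageNumber) {
    scrapePage(id, pageNumber, (err, items) => {
      if (err) return done(err);
      if (items.length === 0) return done(null, results); // end of data
      results.push(...items);
      next(pageNumber + 1); // request the next page only after this one finished
    });
  }
  next(1);
}
```

Because each request waits for the previous one, this is slower than firing all pages at once, but results come back in page order and only one request is open against the site at any time.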

You can find the code here: https://github.com/attheodo/katina_node/blob/master/lib/katina.js

The code block of interest lies between lines 87 and 114.

Thank you all

OTHER TIPS

The common method, if you don't want to bother with one of the libraries mentioned by @ControlAltDel, is to set a counter equal to the number of pages. As each page is processed (asynchronously, so you don't know in what order, nor do you care), you decrement the counter. When the counter reaches zero, you know all pages have been processed and can move on to the next part of the process.
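A sketch of the counter method, assuming the page count is known in advance. `fakeScrapePage` (which answers after a random delay so pages finish out of order) and `scrapeAllWithCounter` are hypothetical names:

```javascript
// Hypothetical stand-in that answers after a random delay, so pages
// complete in an unpredictable order.
function fakeScrapePage(id, pageNumber, callback) {
  setTimeout(() => callback(null, [{ id, page: pageNumber }]), Math.random() * 20);
}

// Start every page at once and count down as each one reports back.
function scrapeAllWithCounter(id, pageCount, scrapePage, done) {
  const results = [];
  let remaining = pageCount;
  let failed = false;
  if (remaining === 0) return process.nextTick(done, null, results);
  for (let page = 1; page <= pageCount; page++) {
    scrapePage(id, page, (err, items) => {
      if (failed) return; // an earlier page already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      results.push(...items); // arrival order, not page order
      remaining -= 1;
      if (remaining === 0) done(null, results); // every page has reported back
    });
  }
}
```

Note that `results` ends up in completion order rather than page order.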

The problem you will probably encounter is recombining all of the aggregated results. Several libraries can help, including Async and Step. Or you can use a promises library like Fibers.Promise, but the latter is not really in line with the Node philosophy and requires direct code changes or additions to the node executable.
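Much of the recombination problem is about order: pages finish in arbitrary order, so pushing results as they arrive scrambles them. One way around it is to store each page's result at that page's index and flatten at the end. The helper below is a hand-rolled sketch of the idea behind Async's `map`, not that library's actual code; `mapAsync` and `scrapeAllInOrder` are hypothetical names.

```javascript
// Apply an async, error-first worker to every item in parallel and
// collect the results in input order, regardless of completion order.
function mapAsync(items, worker, done) {
  const results = new Array(items.length);
  let remaining = items.length;
  let finished = false;
  if (remaining === 0) return process.nextTick(done, null, results);
  items.forEach((item, index) => {
    worker(item, (err, value) => {
      if (finished) return; // ignore late callbacks after an error
      if (err) {
        finished = true;
        return done(err);
      }
      results[index] = value; // slot by position, not by arrival
      remaining -= 1;
      if (remaining === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

// Scrape pages 1..pageCount in parallel and return one flat array in page order.
function scrapeAllInOrder(id, pageCount, scrapePage, done) {
  const pages = Array.from({ length: pageCount }, (_, i) => i + 1);
  mapAsync(pages, (page, cb) => scrapePage(id, page, cb), (err, perPage) => {
    if (err) return done(err);
    done(null, [].concat(...perPage)); // flatten [[...], [...]] into one array
  });
}
```

Sorting afterwards would also work, but only if every record carries its page number; slotting by index avoids that requirement.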

Licensed under: CC-BY-SA with attribution