Question

Is it possible to scrape the products from an e-commerce site using the anemone and nokogiri libraries in Ruby?

I understand how to pull the data I need from each product page using nokogiri, but I can't figure out how to make anemone/nokogiri crawl the site and grab all the product pages.

A push in the right direction would be much appreciated.


Solution

I figured out my issues. First, anemone didn't seem to be crawling all the pages. This was because the pages I wanted were under a subdomain, which I had to tell anemone to crawl separately from the main domain. Second, I needed a way to determine which pages were actually product pages (and thus needed to be parsed). I did this by parsing one of the fields I wanted (the SKU number) and then testing whether it was a SKU with a regex.
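A minimal sketch of both fixes, under stated assumptions: the CSS selector `span.sku`, the SKU format in `SKU_PATTERN`, the helper names `sku?`, `product_sku` and `crawl_products`, and the example URLs are all hypothetical and need to be adapted to the real site. It uses Anemone's `Anemone.crawl` with `on_every_page` and the Nokogiri document exposed by `page.doc`. Consistent with the answer above, Anemone does not follow links onto another host, so each start URL (main domain and subdomain) gets its own crawl.

```ruby
# Hypothetical values: adjust to the real site's markup and SKU format.
SKU_SELECTOR = 'span.sku'
SKU_PATTERN  = /\A[A-Z]{2,4}-\d{4,6}\z/

# True when the text looks like a SKU, which marks the page as a product page.
def sku?(text)
  !text.nil? && SKU_PATTERN.match?(text.strip)
end

# Pulls the SKU field out of a parsed page (a Nokogiri document, as returned
# by Anemone's Page#doc). Returns nil when the page is not a product page or
# is not HTML (page.doc is nil in that case).
def product_sku(doc)
  text = doc&.at_css(SKU_SELECTOR)&.text&.strip
  sku?(text) ? text : nil
end

# Crawls each start URL in its own Anemone run, because links to another
# host (e.g. a subdomain) are not followed. Yields (sku, page) for every
# page that passes the SKU check.
def crawl_products(start_urls, **options)
  require 'anemone' # loaded lazily so the helpers above work without the gem

  start_urls.each do |url|
    Anemone.crawl(url, options) do |anemone|
      anemone.on_every_page do |page|
        sku = product_sku(page.doc)
        yield sku, page if sku
      end
    end
  end
end
```

For example, to crawl both the main domain and the (hypothetical) shop subdomain with a one-second delay between requests:

```ruby
crawl_products(['https://www.example.com/', 'https://shop.example.com/'], delay: 1) do |sku, page|
  puts "#{sku}\t#{page.url}"
end
```

Checking the SKU field rather than the URL keeps the test tied to the page content, which helps when product URLs have no recognizable pattern.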

Licensed under: CC-BY-SA with attribution