Question

I am looking for recommendations for a screen scraper. I need to extract "Contact Us" information from certain websites.

Any ideas where I can get a good (preferably free) screen scraper?

Solution

Write your own -- it isn't hard. If you are new to programming, or are free to choose your programming language, use Python: its library support for scraping is great.

As for how to attack the problem, there are two popular techniques. Regular expressions work best for ad-hoc screen scraping. If your target websites are well structured -- read: not ad-hoc -- then use a framework that allows you to work with the DOM.
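To make the two techniques concrete, here is a minimal sketch using only the Python standard library. The sample HTML, the function names and the e-mail pattern are all made up for illustration, and the pattern is deliberately loose rather than a full address validator. Note that `html.parser` is an event-driven parser, not a real DOM; a library such as BeautifulSoup or lxml would give you an actual tree to query, but the idea of keying off the markup structure is the same.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment, used only for this illustration.
SAMPLE_HTML = """
<html><body>
  <div id="contact">
    <p>Email: <a href="mailto:sales@example.com">Sales</a></p>
    <p>Phone: +1 555 0100</p>
    <p>Or write to support@example.com</p>
  </div>
</body></html>
"""

# Technique 1: regular expressions. Quick and ad-hoc; ignores page structure.
# The pattern is an assumption: loose on purpose, not a strict validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_by_regex(html):
    """Return unique e-mail addresses in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(html):
        if match not in seen:
            seen.append(match)
    return seen


# Technique 2: use the markup structure. Here we only trust addresses that
# appear in <a href="mailto:..."> links.
class MailtoCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any "?subject=..." query part.
            self.addresses.append(href[len("mailto:"):].split("?")[0])


def emails_by_markup(html):
    """Return addresses found in mailto: links, in document order."""
    parser = MailtoCollector()
    parser.feed(html)
    return parser.addresses
```

On the sample page the regular expression picks up both addresses, including the one that is just plain text, while the markup-driven version only returns the one that is an explicit mailto link. That is roughly the trade-off: regular expressions cast a wide net, structure-based extraction is more precise when the site is consistent.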

Navigation and Extraction

These are the two phases of writing a spider. Your spider needs to navigate a website to visit different pages, and it needs to extract the information of interest. Both phases can be driven by either the DOM or regular expressions.
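A rough sketch of how the two phases might fit together, again with only the standard library. Everything here is an assumption for the sake of the example: the names `scrape_contacts`, `LinkCollector` and `fetch`, the rule of following only same-host links that mention "contact", and the loose e-mail pattern. Navigation is driven by the markup (the `<a>` tags) and extraction by a regular expression, but either phase could use the other technique.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Loose illustrative pattern, not a strict validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects [href, link text] pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._open = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = [href, ""]
                self.links.append(self._open)

    def handle_data(self, data):
        if self._open is not None:
            self._open[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._open = None


def fetch(url):
    """Download a page as text. No retries, throttling or robots.txt checks."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_contacts(start_url, fetch_page=fetch, max_pages=10):
    """Visit start_url plus same-host 'contact' links; return e-mails found."""
    host = urlparse(start_url).netloc
    queue, visited, emails = [start_url], set(), []
    while queue and len(visited) < max_pages:
        url = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)
        html = fetch_page(url)

        # Extraction phase: pull the information of interest from this page.
        for email in EMAIL_RE.findall(html):
            if email not in emails:
                emails.append(email)

        # Navigation phase: decide which links are worth visiting next.
        collector = LinkCollector()
        collector.feed(html)
        for href, text in collector.links:
            target = urljoin(url, href)
            if urlparse(target).netloc != host:
                continue  # stay on the same site (also skips mailto: links)
            if "contact" in (href + " " + text).lower():
                queue.append(target)
    return emails
```

Passing the page fetcher in as `fetch_page` means you can try the spider against a dictionary of canned pages before pointing it at a real site, for example `scrape_contacts("http://example.test/", fetch_page=pages.__getitem__)`.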

P.S. Since your name indicates .NET, I should mention that I have written scrapers in C# -- it's a doddle.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow