The right set of tools and libraries for web scraping depends on multiple factors: your purpose, the complexity of the page(s) you want to crawl, speed requirements, rate limits, etc.
Here's a list of tools that are popular in the Python web-scraping world nowadays. There are also standalone HTML parsers; the most popular are BeautifulSoup and lxml.
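For static pages, an HTML parser is often all you need. A minimal sketch using BeautifulSoup on an in-memory document (the HTML markup and class names here are made up for illustration):

```python
from bs4 import BeautifulSoup

# A toy static page; in practice you would fetch this with an HTTP client.
html = """
<html><body>
  <div class="product"><span class="name">Widget</span><span class="price">$9.99</span></div>
  <div class="product"><span class="name">Gadget</span><span class="price">$19.99</span></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # stdlib parser; "lxml" is a faster alternative
products = [
    (div.select_one(".name").get_text(), div.select_one(".price").get_text())
    for div in soup.select("div.product")
]
# products == [('Widget', '$9.99'), ('Gadget', '$19.99')]
```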
Scrapy is probably the best thing created for web scraping in Python. It is a full web-scraping framework that makes crawling easy and straightforward; Scrapy provides practically everything you could imagine needing for web crawling.
Note: if a lot of AJAX and JavaScript is involved in loading and forming the page, you need a real browser to deal with it. This is where selenium helps: it drives a real browser and lets you interact with the page through a WebDriver.
Also see:
- Web scraping with Python
- Headless Selenium Testing with Python and PhantomJS
- HTML Scraping
- Python web scraping resource
- Parsing HTML using Python
Hope that helps.