Question

I want to scrape some pages of this site: Marketbook.ca, so I used Mechanize. However, it does not load the pages properly: it returns a page with an empty body, as in the following code:

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Linux Firefox'
# The request succeeds, but the returned page has an empty body
page = agent.get('http://www.marketbook.ca/list/list.aspx?ETID=1&catid=1001&LP=MAT&units=imperial')
puts page.body

What could be the issue here?


Solution

This page requires a JavaScript engine to display its content:

<noscript>Please enable JavaScript to view the page content.</noscript>
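One way to check for this from the Mechanize side is to look for a `<noscript>` notice in the raw response. The helper below is only a rough sketch: `noscript_messages` is a hypothetical name, it uses a plain regular expression rather than a real HTML parser, and the presence of a `<noscript>` tag is a hint, not proof, that the content is rendered by JavaScript.

```ruby
# Hypothetical helper: returns the text of every <noscript> element found
# in an HTML string. Regex-based, so it is a heuristic, not a full parser.
def noscript_messages(html)
  html.scan(%r{<noscript\b[^>]*>(.*?)</noscript>}im)
      .flatten
      .map(&:strip)
      .reject(&:empty?)
end

# Usage with the Mechanize page from the question (assumes `page` holds
# the result of agent.get):
#   puts noscript_messages(page.body)
```

If this returns a message like the one above while the rest of the body is empty, the content is most likely produced client-side.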

Mechanize does not execute JavaScript, so you would be better off with another option such as Selenium or Watir. Both drive a real web browser.
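As a minimal sketch of the Selenium route, assuming the `selenium-webdriver` gem and a local Chrome installation are available: the method names `build_headless_chrome` and `fetch_rendered_html` are hypothetical, and the CSS selector to wait for is a placeholder that you would replace with an element that only appears once the listings have rendered.

```ruby
# Builds a headless Chrome driver. Assumes the selenium-webdriver gem
# and Chrome are installed; the gem is only loaded when this is called.
def build_headless_chrome
  require 'selenium-webdriver'
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument('--headless')
  Selenium::WebDriver.for(:chrome, options: options)
end

# Loads the URL in a browser, polls until an element matching
# wait_for_css exists, then returns the rendered HTML.
# The driver is always closed afterwards, even on timeout.
def fetch_rendered_html(url, wait_for_css: 'body', timeout: 15, driver: nil)
  driver ||= build_headless_chrome
  driver.get(url)

  deadline = Time.now + timeout
  until driver.find_elements(css: wait_for_css).any?
    raise "Timed out waiting for #{wait_for_css}" if Time.now >= deadline
    sleep 0.25
  end

  driver.page_source
ensure
  driver&.quit
end

# Usage (the selector '#listings' is a placeholder, not taken from the site):
#   html = fetch_rendered_html(
#     'http://www.marketbook.ca/list/list.aspx?ETID=1&catid=1001&LP=MAT&units=imperial',
#     wait_for_css: '#listings'
#   )
```

The returned HTML can then be parsed with Nokogiri in the same way as a Mechanize page body.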

Another option is to look through the included JS scripts, figure out where the data comes from, and query that web resource directly if possible.
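A starting point for that approach is to list the external scripts the page references, so you know which files to read. The sketch below uses only the Ruby standard library; `script_sources` is a hypothetical helper name, and the regular expression is a heuristic that only handles quoted `src` attributes.

```ruby
require 'uri'

# Hypothetical helper: returns the absolute URLs of all external scripts
# referenced in an HTML string, resolved against the page URL.
def script_sources(html, page_url)
  html.scan(/<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/i)
      .flatten
      .map { |src| URI.join(page_url, src).to_s }
      .uniq
end

# Usage with the Mechanize page from the question (assumes `page` holds
# the result of agent.get):
#   puts script_sources(page.body, page.uri.to_s)
```

Your browser's developer tools (Network tab) are often a quicker way to spot the request that actually returns the data; once you have found it, you can try fetching that URL with Mechanize or `Net::HTTP`.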

Licensed under: CC-BY-SA with attribution