Question

I am using the Mechanize Ruby gem to scrape some content from epinions.com, but some links are not being interpreted correctly. Mechanize replaces the tilde (~) in their href with an overline character (‾), and as a result it cannot click the link.

Example of an unsuccessful scrape followed by a successful one:

# script

require 'mechanize'

agent = Mechanize.new

page_1 = agent.get("http://www.epinions.com/webs-Web_Services-All-Merchants-AtomicPark_com/display_~reviews")
puts page_1.links_with(:href => /full_specs/, :text => /^View Information$/).last.inspect

page_2 = agent.get("http://www.epinions.com/webs-Web_Services-All-Merchants-Vanns_com/display_~reviews")
puts page_2.links_with(:href => /full_specs/, :text => /^View Information$/).last.inspect

# result

#<Mechanize::Page::Link
 "View Information"
 "/webs-Web_Services-All-Merchants-AtomicPark_com/display_‾full_specs">
#<Mechanize::Page::Link
 "View Information"
 "/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs">

Any idea why this happens?
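A small diagnostic can confirm which character actually ended up in the href, and the same idea gives a possible stopgap. This is a sketch, not a fix inside Mechanize: restore_tilde and non_ascii_codepoints are hypothetical helpers, and the two hrefs are copied from the output above. One possible cause (an assumption, not confirmed in this thread) is that the page bytes get decoded with a Japanese encoding table: in JIS X 0201, byte 0x7E is the overline (U+203E) rather than the tilde.

```ruby
# Hypothetical helpers for illustration; they are not part of Mechanize.
OVERLINE = "\u203E" # U+203E OVERLINE, the look-alike that replaced "~"

# Map every overline back to an ASCII tilde so the href matches
# what the server presumably expects.
def restore_tilde(href)
  href.tr(OVERLINE, "~")
end

# List any non-ASCII characters in a string as Unicode code points,
# which makes a look-alike character easy to spot.
def non_ascii_codepoints(str)
  str.each_char.reject(&:ascii_only?).map { |c| format("U+%04X", c.ord) }
end

bad  = "/webs-Web_Services-All-Merchants-AtomicPark_com/display_\u203Efull_specs"
good = "/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs"

p non_ascii_codepoints(bad)  # => ["U+203E"]
p non_ascii_codepoints(good) # => []
puts restore_tilde(bad)      # prints the href with "~" restored
```

Instead of link.click, something like agent.get(page_1.uri + restore_tilde(link.href)) should then request the intended URL. This is untested against the live site, and it only masks the symptom; the underlying decoding problem remains.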


Solution

This works fine for me:

[14:29] arkham ~/Desktop [2.1.0]
↳ $ ruby mechanize.rb
#<Mechanize::Page::Link
 "View Information"
 "/webs-Web_Services-All-Merchants-AtomicPark_com/display_~full_specs">
#<Mechanize::Page::Link
 "View Information"
 "/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs">

Which version of Ruby are you using?
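To compare environments, a short script can print the Ruby version along with the installed versions of the relevant gems. Including Nokogiri is an assumption on my part: Mechanize relies on it for HTML parsing, so its version may matter as much as Ruby's. installed_version is a hypothetical helper built on RubyGems' Gem::Specification.find_by_name.

```ruby
require "rubygems"

# Return the installed version of a gem as a string, or nil if the
# gem is not installed (find_by_name raises a Gem::LoadError subclass).
def installed_version(name)
  Gem::Specification.find_by_name(name).version.to_s
rescue Gem::LoadError
  nil
end

puts "ruby:      #{RUBY_VERSION}"
%w[mechanize nokogiri].each do |gem_name|
  puts format("%-10s %s", "#{gem_name}:", installed_version(gem_name) || "not installed")
end
```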

Licensed under: CC-BY-SA with attribution