Question

I wrote a script in Ruby that uses Mechanize. It goes to google.com, logs you in, and then does an image search for cats. Next I want to select one of the result links from the page and then save the image.

My problem is that the text of every result link is an empty string, so I'm not sure how to specify and click them.

Here is part of the output of pp page so you can see the links I'm talking about. The first link is one of the suggested links; I can click those because they have text such as "Past 24 hours". The second link is an actual result from the search, which I cannot click.

#<Mechanize::Page::Link
  "Past 24 hours"
  "/search?q=cats&hl=en&gbv=1&ie=UTF8&tbm=isch&source=lnt&tbs=qdr:d&sa=X&ei=T8kDUu7aB4f8iwKZx4HoBg&ved=0CCQQpwUoAQ">

#<Mechanize::Page::Link
""
"http://www.google.com/imgres?imgurl=http://jasonlefkowitz.net/wp-content/uploads/2013/07/Cute-Cats-cats-33440930-1280-800.jpg&imgrefurl=http://jasonlefkowitz.net/2013/07/slideshow-20-cats-that-suck-at-reducing-tensions-in-the-israeli-palestinian-conflict/&usg=__1YEuvKE4A9r6IIRkcz9Pu6ahN8Q=&h=800&w=1280&sz=433&hl=en&start=1&sig2=ekqjELPNQsK-QQ2r-4TeeQ&zoom=1&tbnid=Xz9P1WD4o4TSlM:&tbnh=94&tbnw=150&ei=b8sDUq36Ge3figLCzoBY&itbs=1&sa=X&ved=0CCwQrQMwAA">

Now here is a snippet of the output of:

page.links.each do |link|
  puts link.text
end

which displays the text of each link on the page:

More
Large
Face
Photo
Clip art
Line drawing
Animated
Past 24 hours
Past week
Reset tools



















funny cats
cats and kittens
cats musical
cute cats
lots of cats
cats with guns
2
3
4
5
6
7
8
9
10
Next

Notice all the whitespace on the screen? That is where the links with the empty text "" from the pp page output are. Does anyone have any ideas on how I can click one?

Here is the code for the script.

require 'mechanize'
agent = Mechanize.new
page = agent.get('https://google.com')
page = agent.page.link_with(:text => 'Sign in').click
# pp page
sign_in = page.form()       ##leave empty = nil
sign_in.Email = '10halec'
sign_in.Passwd = 'password'
page = agent.submit(sign_in)

page = agent.page.link_with(:text => 'Images').click
search = page.form('f')
search.q = 'cats'
page = agent.submit(search)

# pp page

# agent.page.image_with(:src => /imgres?/).fetch.save
page = agent.page.link_with(:text => '').click
# pp page

# page.links.each do |link|
#   puts link.text
# end
pp page

def save filename = nil
  filename = find_free_name filename
  save! filename
end

Solution

Notice all the whitespace on the screen? That is where the links with the empty text "" from the pp page output are. Does anyone have any ideas on how I can click one?

page = agent.page.link_with(:text => '').click

That line works for me. I put both of the following HTML pages in my local Apache server's htdocs directory (a publicly accessible directory):

page1.html:

<!DOCTYPE html>
<html>
  <head><title>Test</title></head>
  <body>
    <div><a href="/somesite.com/cat1.jpg">cat1</a></div>
    <div><a href="/page2.html"></a></div>
    <div><a href="/somesite.com/cat3.jpg"></a></div>
  </body>
</html>

page2.html:

<!DOCTYPE html>
<html>
  <head><title>Page2</title></head>
  <body>
    <div>hello</div>
  </body>
</html>

Then I started up my server, which meant that page1.html was accessible in my browser using the url:

http://localhost:8080/page1.html

Then I ran the ruby program:

require 'mechanize'

agent = Mechanize.new
agent.get('http://localhost:8080/page1.html')
pp agent.page

page = agent.page.link_with(:text => '').click
puts page.title 

...and the output was:

#<Mechanize::Page
 {url #<URI::HTTP:0x00000100c8dc18 URL:http://localhost:8080/page1.html>}
 {meta_refresh}
 {title "Test"}
 {iframes}
 {frames}
 {links
  #<Mechanize::Page::Link "cat1" "/somesite.com/cat1.jpg">
  #<Mechanize::Page::Link "" "/page2.html">
  #<Mechanize::Page::Link "" "/somesite.com/cat3.jpg">}
 {forms}>

Page2

The pp page output looks the same as your output, and I was successfully able to click on a link that has no text--as evidenced by the output Page2.

The only problem with that code is that link_with() returns only the first match. If I use links_with(), I get all the matching links:

require 'mechanize'

agent = Mechanize.new
agent.get('http://localhost:8080/page1.html')

links = agent.page.links_with(:text => '')
p links

--output:--
[#<Mechanize::Page::Link "" "/page2.html">
, #<Mechanize::Page::Link "" "/somesite.com/cat3.jpg">
]

I would like to see the actual HTML of the links you are having problems with.
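
In the meantime, the imgres href in your pp output appears to carry the address of the original image in its imgurl query parameter, so it may not be necessary to click the link at all. The following is only a sketch using Ruby's standard library: image_url_from_imgres is a hypothetical helper name, and the example.com href is a made-up, shortened stand-in for the real one.

```ruby
require 'uri'

# Hypothetical helper: returns the value of the imgurl query parameter
# from an "imgres"-style href, or nil if the parameter is not present.
def image_url_from_imgres(href)
  query = href.split('?', 2)[1]   # everything after the first "?"
  return nil if query.nil?

  # decode_www_form yields [key, value] pairs with percent-escapes decoded.
  pair = URI.decode_www_form(query).assoc('imgurl')
  pair && pair[1]
end

# Made-up href in the same shape as the one in the question.
href = 'http://www.google.com/imgres?imgurl=http://example.com/cat.jpg&h=800&w=1280'
puts image_url_from_imgres(href)
```

With the href of a real result link (for example link.href from links_with()), the returned URL could then be passed to agent.get(...) and the resulting file saved with .save, though I have not tested that against Google's current pages.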

Licensed under: CC-BY-SA with attribution