Question

Hi, I am trying to scrape a web page, take its links, follow each of those links, and scrape the linked pages too.

require 'rubygems'
require 'scrapi'
require 'uri'

Scraper::Base.parser :html_parser

web = "http://......"

def sub_web(linksubweb)

  uri = URI.parse(URI.encode(linksubweb))

end

scraper = Scraper.define do

  array :items

  process "div.mozaique>div", :items => Scraper.define {

    process "p>a", :title => :text
    process "div.thumb>a", :link => "@href"

    result :title, :link
  }
  result :items
end


  uri = URI.parse(URI.encode(web))

  scraper.scrape(uri).each do |pag|

    link_full = uri + pag.link.to_str
    puts pag.title
    sub_web(link_full)
    puts
  end

And I get the following error:

e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) /Users/sss/web/app/views/admin/topics/webconector.rb
Title 1
http://mydomain/user34/top5

/Users/sss/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:304:in `escape': undefined method `gsub' for #<URI::HTTP:0x007fa07cb01e08> (NoMethodError)
    from /Users/sss/.rvm/rubies/ruby-1.9.3-p448/lib/ruby/1.9.1/uri/common.rb:623:in `escape'
    from ../app/views/admin/topics/conectaweb.rb:11:in `sub_web'
    from ../app/views/admin/topics/conectaweb.rb:34:in `block in <top (required)>'
    from ../views/admin/topics/conectaweb.rb:29:in `each'
    from ../app/views/admin/topics/conectaweb.rb:29:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

Process finished with exit code 1

Solution

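Try using uri = URI.parse(URI.encode(linksubweb.to_s)); this should work. The problem is that URI.encode requires a String argument (it calls gsub on it, as the stack trace shows), but link_full is a URI::HTTP object because uri + pag.link.to_str returns a URI rather than a String. You have to convert the URI::HTTP object into a string first.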
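
The fix can be sketched as below, as a minimal illustration rather than a drop-in replacement for the original script. The base URL and the relative link "top5" are hypothetical stand-ins for the scraped values, and the scrapi part is left out. Note that URI.encode was removed in Ruby 3.0, so the sketch only calls it where it still exists; that guard is an addition for newer Rubies and is not part of the original answer.

```ruby
require 'uri'

# Hypothetical values standing in for the page URL and a scraped href.
base = URI.parse("http://example.com/user34/")
link_full = base + "top5"   # URI#+ merges and returns a URI::HTTP, not a String

def sub_web(linksubweb)
  # Convert to a String first: URI.encode calls gsub, which a URI object lacks.
  str = linksubweb.to_s
  # URI.encode no longer exists on Ruby 3.0+, so call it only when available.
  str = URI.encode(str) if URI.respond_to?(:encode)
  URI.parse(str)
end

uri = sub_web(link_full)
puts uri.class
puts uri
```

Because sub_web calls to_s itself, it accepts either a String or a URI object, so the caller does not need to care which one the scraper produced.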

Licensed under: CC-BY-SA with attribution