Question

I was trying to do something like this:

require 'net/http'

http = Net::HTTP.new("t66y.com", 80)
request = Net::HTTP::Get.new("http://t66y.com/")
response = http.request(request)
puts response.inspect

It works fine and gives me <Net::HTTPOK 200 OK readbody=true>. However, after I changed the URL to something like http://t66y.com/thread0806.php?fid=16, it keeps throwing an EOFError exception. The whole log was:

/Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file reached (EOFError)
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:92:in `read'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2779:in `ensure in read_chunked'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2779:in `read_chunked'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2750:in `read_body_0'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2710:in `read_body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2735:in `body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2672:in `reading_body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1321:in `block in transport_request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1293:in `request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1286:in `block in request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:745:in `start'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1284:in `request'
    from /Users/lei/workspace/Dadiaosi/scraper.rb:18:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

Does anyone have any clue what is causing this?


Solution

These work:

In the terminal:

$ curl -v 'http://t66y.com/thread0806.php?fid=16'

In ruby:

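require 'open-uri'
# open-uri lets Kernel#open accept URLs on older Rubies such as 1.9.3.
# On Ruby 3.0+ that was removed; use URI.open(...) (available since 2.5).
response = open("http://t66y.com/thread0806.php?fid=16")
html = response.read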

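From the curl output I can see the response headers: the Content-Length header is missing and the charset is Chinese. The backtrace also shows the error is raised inside read_chunked, i.e. while Net::HTTP was reading a body sent with chunked transfer encoding, which is consistent with there being no Content-Length. This might be tripping up Ruby's Net::HTTP library if you're on an older version of Ruby.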
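The exact behaviour of t66y.com is not confirmed here, but one way a chunked response produces this same EOFError is a server that closes the connection before sending the final zero-length chunk. The following is a minimal sketch that reproduces that against a hypothetical local server (start_truncated_chunked_server, fetch_whole and fetch_partial are illustrative names, not part of any library), and shows that streaming the body with read_body lets you keep the chunks that arrived before the connection dropped.

```ruby
require 'net/http'
require 'socket'

# Hypothetical stand-in for a misbehaving host: it announces a chunked body,
# sends one chunk, then closes the socket without the final "0\r\n\r\n".
def start_truncated_chunked_server
  server = TCPServer.new('127.0.0.1', 0)
  Thread.new do
    begin
      loop do
        client = server.accept
        # Drain the request headers (a GET has no body).
        loop do
          line = client.gets
          break if line.nil? || line == "\r\n"
        end
        client.write "HTTP/1.1 200 OK\r\n" \
                     "Content-Type: text/html\r\n" \
                     "Transfer-Encoding: chunked\r\n\r\n" \
                     "5\r\nhello\r\n"
        client.close
      end
    rescue IOError, SystemCallError
      # server.close was called; stop accepting connections
    end
  end
  [server, server.addr[1]]
end

# Reads the whole body at once; Net::HTTP raises EOFError from read_chunked
# because the connection ends before the terminating chunk.
def fetch_whole(port)
  Net::HTTP.start('127.0.0.1', port) do |http|
    http.read_timeout = 5
    http.get('/').body
  end
end

# Streams the body chunk by chunk and keeps whatever arrived before the drop.
def fetch_partial(port)
  received = ''.dup
  begin
    Net::HTTP.start('127.0.0.1', port) do |http|
      http.read_timeout = 5
      http.request_get('/') do |res|
        # Some Ruby versions retry the GET once and yield a fresh response;
        # start over so the same chunks are not appended twice.
        received.clear
        res.read_body { |chunk| received << chunk }
      end
    end
  rescue EOFError
    # The body is incomplete; fall through and return what was received.
  end
  received
end
```

Against that server, fetch_whole raises EOFError while fetch_partial returns the bytes received before the drop. Treat such a body as possibly incomplete; this is a salvage strategy, not a fix for the server.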

As a workaround, you can easily swap in open-uri to fetch the HTML, as shown above.

Other suggestions

It should be

require 'net/http'

uri = URI('http://t66y.com/thread0806.php?fid=16')
response = Net::HTTP.get(uri)
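Note that Net::HTTP.get returns only the body as a String, not a response object. If you also need the status or headers (for example to compare with the <Net::HTTPOK 200 OK readbody=true> from the question), Net::HTTP.get_response returns the Net::HTTPResponse. The sketch below shows both against a hypothetical local server (start_plain_server is an illustrative helper, not a library API) so it runs without network access.

```ruby
require 'net/http'
require 'socket'
require 'uri'

# Hypothetical well-behaved host: answers each request with a plain
# Content-Length response and closes the connection.
def start_plain_server(body)
  server = TCPServer.new('127.0.0.1', 0)
  Thread.new do
    begin
      loop do
        client = server.accept
        loop do
          line = client.gets
          break if line.nil? || line == "\r\n"
        end
        client.write "HTTP/1.1 200 OK\r\n" \
                     "Content-Type: text/html\r\n" \
                     "Content-Length: #{body.bytesize}\r\n" \
                     "Connection: close\r\n\r\n" + body
        client.close
      end
    rescue IOError, SystemCallError
      # server.close was called; stop accepting connections
    end
  end
  [server, server.addr[1]]
end

server, port = start_plain_server('<html>ok</html>')
uri = URI("http://127.0.0.1:#{port}/")

# Net::HTTP.get gives back just the body String.
body = Net::HTTP.get(uri)

# Net::HTTP.get_response gives back the response object (status, headers, body).
response = Net::HTTP.get_response(uri)
puts response.code   # prints 200 for this local server
server.close
```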
Licensed under: CC-BY-SA with attribution