I was trying to do something like this:

require 'net/http'

http = Net::HTTP.new("t66y.com", 80)
request = Net::HTTP::Get.new("http://t66y.com/")
response = http.request(request)
puts response.inspect

It works fine and gives me <Net::HTTPOK 200 OK readbody=true>. However, after I changed the URL to something like http://t66y.com/thread0806.php?fid=16, it keeps throwing an EOFError. The full backtrace was:

/Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file reached (EOFError)
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:92:in `read'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2779:in `ensure in read_chunked'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2779:in `read_chunked'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2750:in `read_body_0'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2710:in `read_body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2735:in `body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2672:in `reading_body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1321:in `block in transport_request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1293:in `request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1286:in `block in request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:745:in `start'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1284:in `request'
    from /Users/lei/workspace/Dadiaosi/scraper.rb:18:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

Does anyone have a clue what is causing this?


Solution

These work:

In the terminal:

$ curl -v 'http://t66y.com/thread0806.php?fid=16'

In Ruby:

require 'open-uri'
# URI.open needs Ruby 2.5+; on older Rubies (such as 1.9.3) call plain open(...) instead
response = URI.open("http://t66y.com/thread0806.php?fid=16")
html = response.read

From the curl -v output I can see the response headers: the Content-Length header is missing and the charset is a Chinese one. This might be tripping up Ruby's Net::HTTP library if you're on an older version of Ruby.

You can easily swap in open-uri, as in the Ruby snippet above, to get the HTML.
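As for why Net::HTTP fails: the backtrace in the question ends in `read_chunked`, so Net::HTTP was reading a body sent with `Transfer-Encoding: chunked` (a chunked response carries no Content-Length, which fits what curl shows) and the socket hit end-of-file before the body was complete. One way to get exactly that error is a server that closes the connection before sending the final zero-length chunk; whether that is what t66y.com was doing is an assumption. The sketch below reproduces it against a hypothetical local server on 127.0.0.1 and uses `Net::HTTP#set_debug_output`, roughly Ruby's counterpart to `curl -v`, to capture the raw headers.

```ruby
require 'net/http'
require 'socket'
require 'stringio'

# Hypothetical reproduction: every connection gets a chunked response that
# stops after one chunk, without the final "0\r\n\r\n" that ends the body.
TRUNCATED_RESPONSE = "HTTP/1.1 200 OK\r\n" \
                     "Content-Type: text/html\r\n" \
                     "Transfer-Encoding: chunked\r\n" \
                     "\r\n" \
                     "5\r\nhello\r\n"

server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
server_thread = Thread.new do
  loop do
    client = server.accept
    # Read and discard the request line and headers (up to the blank line).
    while (line = client.gets) && line != "\r\n"
    end
    client.write(TRUNCATED_RESPONSE)
    client.close # close early: the client never sees the last chunk
  end
end

debug_log = StringIO.new
http = Net::HTTP.new('127.0.0.1', port, nil) # nil proxy address: connect directly
http.set_debug_output(debug_log)             # log the raw exchange, like curl -v

error =
  begin
    http.request(Net::HTTP::Get.new('/'))
    nil
  rescue EOFError => e
    # Net::HTTP may retry an idempotent GET once; the retry fails the same way.
    e
  end

server_thread.kill
server.close

puts "raised: #{error.class}"
puts debug_log.string.lines.grep(/Transfer-Encoding/)
```

Pointed at the real host instead (drop the local server and use `Net::HTTP.new("t66y.com", 80)`), the same debug log should show whether the chunked body really stops short; if it does, the problem is on the server side rather than in the request code.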

Other tips

It should be:

require 'net/http'
uri = URI('http://t66y.com/thread0806.php?fid=16')
html = Net::HTTP.get(uri) # Net::HTTP.get returns the response body as a String
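One concrete difference from the question's code: `Net::HTTP::Get.new` given the full URL as a string puts that absolute URL on the request line, whereas `Net::HTTP.get(uri)` sends only the path and query. Whether this difference is what upsets this particular server is an assumption (the absolute-URL request to `http://t66y.com/` did work). The sketch below records the request line for both styles using a hypothetical local server on 127.0.0.1; `uri.request_uri` stands in for what `Net::HTTP.get` puts on the request line.

```ruby
require 'net/http'
require 'socket'
require 'thread' # harmless on newer Rubies; provides Queue on old ones
require 'uri'

# Hypothetical local server: records the request line of each connection and
# replies with a small, well-formed response.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
request_lines = Queue.new
server_thread = Thread.new do
  loop do
    client = server.accept
    request_lines << client.gets.to_s.chomp # e.g. "GET <target> HTTP/1.1"
    while (line = client.gets) && line != "\r\n"
    end
    client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
    client.close
  end
end

uri = URI("http://127.0.0.1:#{port}/thread0806.php?fid=16")
http = Net::HTTP.new(uri.host, uri.port, nil) # nil proxy address: connect directly

# Question's style: the full URL string becomes the request target.
http.request(Net::HTTP::Get.new(uri.to_s))
# Net::HTTP.get style: only the path and query (uri.request_uri) are sent.
http.request(Net::HTTP::Get.new(uri.request_uri))

absolute_form = request_lines.pop # request line sent for the full-URL string
origin_form = request_lines.pop   # request line sent for path + query only
server_thread.kill
server.close

puts absolute_form
puts origin_form
```

The first line should carry the full URL (what RFC 7230 calls absolute-form, normally sent to proxies) and the second only `/thread0806.php?fid=16` (origin-form). HTTP/1.1 servers are required to accept absolute-form, but an unusual server might not treat both forms the same way.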
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow