문제

I was trying to do something interesting like:

http = Net::HTTP.new("t66y.com", 80)
request = Net::HTTP::Get.new("http://t66y.com/")
response = http.request(request)
puts response.inspect

it works fine, and give me <Net::HTTPOK 200 OK readbody=true>. However, after I changed url to something like http://t66y.com/thread0806.php?fid=16, it keep throwing EOFError exception to me. The whole log was:

/Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file reached (EOFError)
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/protocol.rb:92:in `read'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2779:in `ensure in read_chunked'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2779:in `read_chunked'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2750:in `read_body_0'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2710:in `read_body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2735:in `body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:2672:in `reading_body'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1321:in `block in transport_request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1293:in `request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1286:in `block in request'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:745:in `start'
    from /Users/lei/.rvm/rubies/ruby-1.9.3-p362/lib/ruby/1.9.1/net/http.rb:1284:in `request'
    from /Users/lei/workspace/Dadiaosi/scraper.rb:18:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

do you guys have any clue about that?

도움이 되었습니까?

해결책

These work:

In the terminal:

$ curl -v http://t66y.com/thread0806.php?fid=16

In ruby:

require 'open-uri'
response = open("http://t66y.com/thread0806.php?fid=16")
html = response.read

From the curl response I can see the headers and that the content-length is missing and the charset is Chinese. This might be tripping up the ruby net http library if you're on an older version of ruby.

You can easily swap in open-uri to get the html as shown above.

다른 팁

It should be

uri = URI('http://t66y.com/thread0806.php?fid=16')
response = Net::HTTP.get(uri)
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top