Problem

I'm building a web crawler using Mechanize for Ruby. I'll be running batches of 200k sites at a time, and when a GET request returns an error I want to set an instance variable marking the site as not valid and move on to the next site. For example, I'm crawling a site that responds to an HTTP GET request with Error 101 (net::ERR_CONNECTION_RESET): The connection was reset., and my application crashes.

  def crawl  
    agent = Mechanize.new
    agent.log = Logger.new('out.log')
    agent.user_agent_alias = 'Mac Safari'
    begin
      page = agent.get(@url)
    rescue Mechanize::ResponseCodeError => exception
      if exception.response_code == '400' or exception.response_code == '500'
        @isActive = false
        return
      end
    end
  end

Is there an exception I should catch so I can recover from ERR_CONNECTION_RESET, or what approach do you use to handle this?

Solution

Why not catch everything?

begin
  page = agent.get(@url)
rescue
  # a bare rescue catches StandardError and all of its subclasses,
  # which includes network errors such as Errno::ECONNRESET
  @isActive = false
end
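
A bare rescue already covers the reset case, since Errno::ECONNRESET (what Ruby raises when a connection is reset) descends from StandardError. If you'd rather not swallow genuine bugs along with network hiccups, here is a narrower sketch; it reuses @url and @isActive from the question, and the exact list of network exceptions is an assumption about what a large crawl is likely to hit:

require 'mechanize'
require 'logger'
require 'openssl'

def crawl
  agent = Mechanize.new
  agent.log = Logger.new('out.log')
  agent.user_agent_alias = 'Mac Safari'
  begin
    page = agent.get(@url)
  rescue Mechanize::ResponseCodeError => e
    # HTTP-level failures (e.g. 400/500) reported by Mechanize
    @isActive = false
    agent.log.warn("#{@url} returned HTTP #{e.response_code}")
  rescue Errno::ECONNRESET, Errno::ECONNREFUSED, Errno::ETIMEDOUT,
         Net::OpenTimeout, Net::ReadTimeout, SocketError,
         OpenSSL::SSL::SSLError => e
    # network-level failures, including the reset connection from the question
    @isActive = false
    agent.log.warn("#{@url} failed: #{e.class}: #{e.message}")
  end
  page
end

You may also want to set agent.open_timeout and agent.read_timeout so a single unresponsive host can't stall a 200k-site batch.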