Question

I am trying to parse a list of image URLs and get some basic information about each image before I commit to downloading it.

  1. Is the image there (solved with response.code?)
  2. Do I have the image already (want to look at type and size?)

My script will check a large list every day (about 1300 rows), and each row has 30-40 image URLs. My @photo_urls variable lets me keep track of what I have already downloaded. I would really like to use it as a hash (instead of the array in my example code) that I can iterate through later to do the actual downloading.

Right now my problem (besides being a Ruby newbie) is that Net::HTTP::Pipeline only accepts an array of Net::HTTPRequest objects. The documentation for net-http-pipeline indicates that response objects will come back in the same order as the corresponding request objects that went in. The problem is that I have no way to correlate the request to the response other than that order. However, I don't know how to get relative ordinal position inside a block. I assume I could just have a counter variable but how would I access a hash by ordinal position?

          Net::HTTP.start uri.host do |http|
            # Init HTTP requests hash
            requests = {}
            photo_urls.each do |photo_url|          
              # make sure we don't process the same image again.
              hashed = Digest::SHA1.hexdigest(photo_url)         
              next if @photo_urls.include? hashed
              @photo_urls << hashed
              # change user agent and store in hash
              my_uri = URI.parse(photo_url)
              request = Net::HTTP::Head.new(my_uri.path)
              request.initialize_http_header({"User-Agent" => "My Downloader"})
              requests[hashed] = request
            end
            # process requests (send array of values - ie. requests) in a pipeline.
            http.pipeline requests.values do |response|
              if response.code=="200"
                  # Is there any way to reference the hash here, so I can
                  # decide whether I want to do anything later?
              end
            end                
          end

Finally, if there is an easier way of doing this, please feel free to offer any suggestions.

Thanks!


Solution

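Make requests an array instead of a hash, and shift each request off the front of the array as its response comes in. Because the responses arrive in the same order as the requests were sent, the request at the front always belongs to the current response. Passing requests.dup gives the pipeline its own copy, so shifting from requests inside the block does not interfere with it: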

Net::HTTP.start uri.host do |http|
  # Init HTTP requests array
  requests = []
  photo_urls.each do |photo_url|          
    # make sure we don't process the same image again.
    hashed = Digest::SHA1.hexdigest(photo_url)         
    next if @photo_urls.include? hashed
    @photo_urls << hashed

    # change user agent and store in array
    my_uri = URI.parse(photo_url)
    request = Net::HTTP::Head.new(my_uri.path)
    request.initialize_http_header({"User-Agent" => "My Downloader"})
    requests << request
  end

  # process requests in a pipeline (pass a copy so we can shift from the original).
  http.pipeline requests.dup do |response|
    request = requests.shift

    if response.code=="200"
      # Do whatever checking with request
    end
  end                
end
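
If you would rather keep the SHA1 key for each URL (as in the question's hash), a minimal sketch along these lines may help. It relies on Ruby hashes (1.9 and later) preserving insertion order, so requests.keys and requests.values line up position by position. The helper names build_head_requests, correlate, and image_info are hypothetical and not part of net-http-pipeline. It also uses request_uri rather than path so that query strings are kept, which is an assumption about what your URLs look like.

```ruby
require "digest"
require "net/http"
require "uri"

# Hypothetical helper: build one HEAD request per unseen URL,
# keyed by the SHA1 of the URL. `seen` is mutated, like @photo_urls.
def build_head_requests(photo_urls, seen)
  photo_urls.each_with_object({}) do |photo_url, requests|
    hashed = Digest::SHA1.hexdigest(photo_url)
    next if seen.include?(hashed)
    seen << hashed

    request = Net::HTTP::Head.new(URI.parse(photo_url).request_uri)
    request["User-Agent"] = "My Downloader"
    requests[hashed] = request
  end
end

# Hypothetical helper: pair keys with responses by position.
# Valid only if responses are in the same order as the requests.
def correlate(keys, responses)
  keys.zip(responses).to_h
end

# Hypothetical helper: pull the type and size headers from a response,
# or nil if the image is not there.
def image_info(response)
  return nil unless response.code == "200"
  size = response["Content-Length"]
  { type: response["Content-Type"], size: size && size.to_i }
end

# Sketch of use inside the pipeline block (needs the net-http-pipeline
# gem and a live server, so it is left as a comment):
#
#   requests = build_head_requests(photo_urls, @photo_urls)
#   keys  = requests.keys
#   index = 0
#   http.pipeline requests.values do |response|
#     hashed = keys[index]   # key of the request this response answers
#     index += 1
#     info = image_info(response)
#     # decide here whether `hashed` is worth downloading later
#   end
```

The counter answers the "ordinal position" part of the question: you do not index into the hash itself, you index into the array returned by requests.keys.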
Licensed under: CC-BY-SA with attribution