Question

I would like to scrape search results from http://maxdelivery.com, but unfortunately, they are using POST instead of GET for their search form. I found this description of how to use Nokogiri and RestClient to fake a post form submission, but it's not returning any results for me: http://ruby.bastardsbook.com/chapters/web-crawling/

I've worked with Nokogiri before, but not for the results of a POST form submission.

Here's my code right now, only slightly modified from the example at the link above:

require 'rest-client'
require 'nokogiri'

class MaxDeliverySearch

  REQUEST_URL = "http://www.maxdelivery.com/nkz/exec/Search/Display"

  def initialize(search_term)
    @term = search_term
  end

  def search
    if page = RestClient.post(REQUEST_URL, {
        'searchCategory'=>'*',
        'searchString'=>@term,
        'x'=>'0',
        'y'=>'0'
      })
      puts "Success finding search term: #{@term}"

      File.open("temp/Display-#{@term}.html", 'w'){|f| f.write page.body}

      npage = Nokogiri::HTML(page.body)
      rows = npage.css('table tr')
      puts "#{rows.length} rows"

      rows.each do |row|
        puts row.css('td').map{|td| td.text}.join(', ')
      end
    end
  end

end

Now (ignoring the formatting stuff), I would expect RestClient.post(REQUEST_URL, {...}) to return some search results when passed a 'good' search term, but each time I just get the search results page back with no actual results, as if I had pasted the URL into the browser.

Anyone have any idea what I'm missing? Or, just how to get back the results I'm looking for with another gem?

With the class above, I would like to be able to do:

s = MaxDeliverySearch.new("ham")
s.search #=> big block of search results objects to traverse

Solution

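Mechanize is what you should use to automate a web search form. Unlike a bare RestClient.post, it loads the page that contains the form, keeps any cookies the site sets, and submits the form with all of its fields, which may be what your raw POST is missing. This should get you started using Mechanize.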

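require 'mechanize'

agent = Mechanize.new
# Load the home page first so the agent picks up the real form and any cookies the site sets.
page = agent.get('http://maxdelivery.com')

# Fill in the search form and submit it the way a browser would.
form = page.form('SearchForm')
form.searchString = "ham"
page = agent.submit(form)

# Print the name of each search result.
page.search("div.searchResultItem").each do |item|
  puts item.search(".searchName i").text.strip
end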
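
If you want to keep the MaxDeliverySearch interface from the question, the same steps can be folded into that class. The following is only a rough sketch, not tested against the live site: the form name and CSS selectors are the ones used above, while the HOME_URL and FORM_NAME constants, the optional agent: argument, and the choice to return an array of name strings are my own assumptions. Mechanize is loaded lazily so a stand-in agent can be passed in when you want to try the class without hitting the network.

```ruby
# Sketch: the question's MaxDeliverySearch class rebuilt on Mechanize.
# HOME_URL, FORM_NAME and the agent: keyword are illustrative assumptions.
class MaxDeliverySearch
  HOME_URL  = 'http://maxdelivery.com'
  FORM_NAME = 'SearchForm'

  def initialize(search_term, agent: nil)
    @term  = search_term
    @agent = agent
  end

  # Returns an array of result-name strings (empty if nothing matched).
  def search
    form = agent.get(HOME_URL).form(FORM_NAME)
    raise "Form #{FORM_NAME} not found on #{HOME_URL}" if form.nil?

    form.searchString = @term
    results = agent.submit(form)

    results.search('div.searchResultItem').map do |item|
      item.search('.searchName i').text.strip
    end
  end

  private

  # Build a real Mechanize agent only when none was injected.
  def agent
    @agent ||= begin
      require 'mechanize'
      Mechanize.new
    end
  end
end
```

With that in place, MaxDeliverySearch.new("ham").search should give you an array you can traverse, assuming the site's markup still matches the selectors above.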
Licensed under: CC-BY-SA with attribution