Question

I am trying to figure out how to get Make and Model out of XML returned from a URL and put them into a CSV. Here is the XML returned from the URL:

<VINResult xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://basicvalues.pentondata.com/">
  <Vehicles>
    <Vehicle>
      <ID>131497</ID>
      <Product>TRUCK</Product>
      <Year>1993</Year>
      <Make>Freightliner</Make>
      <Model>FLD12064T</Model>
      <Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
    </Vehicle>
    <Vehicle>
      <ID>131497</ID>
      <Product>TRUCK</Product>
      <Year>1993</Year>
      <Make>Freightliner</Make>
      <Model>FLD12064T</Model>
      <Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
    </Vehicle>
  </Vehicles>
  <Errors/>
  <InvalidVINMsg/>
</VINResult>

Here is the code I have so far:

require 'csv'
require 'rubygems'
require 'nokogiri'
require 'open-uri'

    vincarriercsv = 'vincarrier.csv'
    vindetails = 'vindetails.csv'
    vinurl =  'http://redacted/LookUp_VIN?key=redacted&vin='

    CSV.open(vindetails, "wb") do |details|
        CSV.foreach(vincarriercsv) do |row|
            vinxml = Nokogiri::HTML(vinurl + row[1])
                make = vinxml.xpath('//VINResult//Vehicles//Vehicle//Make').text
                model = vinxml.xpath('//VINResult//Vehicles//Vehicle//Model').text
            details << [ row[0], row[1], make, model ]
        end
    end

For some reason the URL returns the same data twice, but I only need the first result. So far my attempts to grab the Make and Model from the XML have failed. Any ideas?

Solution

Here's how to get at the make and model data. How to convert it to CSV is left to you:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<VINResult xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://basicvalues.pentondata.com/">
  <Vehicles>
    <Vehicle>
      <ID>131497</ID>
      <Product>TRUCK</Product>
      <Year>1993</Year>
      <Make>Freightliner</Make>
      <Model>FLD12064T</Model>
      <Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
    </Vehicle>
    <Vehicle>
      <ID>131497</ID>
      <Product>TRUCK</Product>
      <Year>1993</Year>
      <Make>Freightliner</Make>
      <Model>FLD12064T</Model>
      <Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
    </Vehicle>
  </Vehicles>
  <Errors/>
  <InvalidVINMsg/>
</VINResult>
EOT

vehicle_make_and_models = doc.search('Vehicle').map{ |vehicle|
  [
    'make', vehicle.at('Make').content,
    'model', vehicle.at('Model').content
  ]
}

This results in:

vehicle_make_and_models # => [["make", "Freightliner", "model", "FLD12064T"], ["make", "Freightliner", "model", "FLD12064T"]]

If you don't want the field names:

vehicle_make_and_models = doc.search('Vehicle').map{ |vehicle|
  [
    vehicle.at('Make').content,
    vehicle.at('Model').content
  ]
}

vehicle_make_and_models # => [["Freightliner", "FLD12064T"], ["Freightliner", "FLD12064T"]]

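Note: You have XML, not HTML, so parse it with Nokogiri::XML. Don't assume that Nokogiri treats the two the same, or that the difference is insignificant: the HTML parser is built to repair tag soup, while the XML parser follows XML rules such as case-sensitive element names and namespaces.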

I use CSS selectors unless I absolutely have to use XPath. Most of the time CSS gives a much clearer selector, which makes the code easier to read. With Nokogiri, CSS selectors also cope with this document's default namespace without needing a prefix.

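vinxml.xpath('//VINResult//Vehicles//Vehicle//Make').text doesn't work for several reasons. First, Nokogiri::HTML(vinurl + row[1]) parses the URL string itself, not the document at that URL; the response body has to be fetched first. Second, every element in this document is in the default namespace declared by xmlns="http://basicvalues.pentondata.com/", and an unprefixed name in XPath only matches elements that have no namespace, so '//Make' finds nothing while '//xmlns:Make' does. Third, // means "any descendant", not "child", so repeating it between every step is unnecessary when the path is known. Finally, xpath returns all matching nodes as a NodeSet, not just a particular Node, and text returns the text of all Nodes in the NodeSet concatenated into one string, which is probably not what you want.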

I prefer to use search instead of xpath or css. It returns a NodeSet like the other two, but it accepts either CSS or XPath selectors. If a particular selector is ambiguous and could be interpreted as either CSS or XPath, use the explicit css or xpath form. Likewise, you can use at, at_xpath, or at_css to find just the first matching node, which is equivalent to search('foo').first.

Other tips

You could also do the following, which places all of the vehicles in an Array and each vehicle's child elements in a Hash:

require 'nokogiri'
doc = Nokogiri::XML(open(YOUR_XML_FILE))
vehicles = doc.search("Vehicle").map do |vehicle|
  Hash[
    vehicle.children.map do |child|
      [child.name, child.text] unless child.text.chomp.strip == ""
    end.compact
  ]
end
#=>[{"ID"=>"131497", "Product"=>"TRUCK", "Year"=>"1993", "Make"=>"Freightliner", "Model"=>"FLD12064T", "Description"=>"120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes  Power Steering 6x4 (SBA - Set Back Axle)"}, {"ID"=>"131497", "Product"=>"TRUCK", "Year"=>"1993", "Make"=>"Freightliner", "Model"=>"FLD12064T", "Description"=>"120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes  Power Steering 6x4 (SBA - Set Back Axle)"}]

Then you can access any field of an individual vehicle, e.g.:

vehicles.first["ID"]
#=> "131497"
vehicles.first["Year"]
#=> "1993"

etc.
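
Since the service returns each vehicle twice, an Array of Hashes like this can be de-duplicated with uniq before it is written out. The sketch below hard-codes a shortened version of the vehicles Array shown above so it runs on its own; the columns list is an assumption about which fields are wanted in the CSV.

```ruby
require 'csv'

# Shortened copy of the Array of Hashes built above (Description omitted).
vehicles = [
  { 'ID' => '131497', 'Year' => '1993', 'Make' => 'Freightliner', 'Model' => 'FLD12064T' },
  { 'ID' => '131497', 'Year' => '1993', 'Make' => 'Freightliner', 'Model' => 'FLD12064T' }
]

# Assumed choice of output columns, in output order.
columns = %w[ID Make Model]

csv_text = CSV.generate do |csv|
  csv << columns
  # Hashes with identical keys and values count as duplicates for uniq.
  vehicles.uniq.each { |vehicle| csv << vehicle.values_at(*columns) }
end

puts csv_text
```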

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow