Parsing XML into CSV using Nokogiri

Question 1

Here's how to get at the make and model data. How to convert it to CSV is left to you:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<VINResult xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://basicvalues.pentondata.com/">
  <Vehicles>
    <Vehicle>
      <ID>131497</ID>
      <Product>TRUCK</Product>
      <Year>1993</Year>
      <Make>Freightliner</Make>
      <Model>FLD12064T</Model>
      <Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
    </Vehicle>
    <Vehicle>
      <ID>131497</ID>
      <Product>TRUCK</Product>
      <Year>1993</Year>
      <Make>Freightliner</Make>
      <Model>FLD12064T</Model>
      <Description>120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes & Power Steering 6x4 (SBA - Set Back Axle)</Description>
    </Vehicle>
  </Vehicles>
  <Errors/>
  <InvalidVINMsg/>
</VINResult>
EOT

vehicle_make_and_models = doc.search('Vehicle').map{ |vehicle|
  [
    'make', vehicle.at('Make').content,
    'model', vehicle.at('Model').content
  ]
}

This results in:

vehicle_make_and_models # => [["make", "Freightliner", "model", "FLD12064T"], ["make", "Freightliner", "model", "FLD12064T"]]

If you don't want the field names:

vehicle_make_and_models = doc.search('Vehicle').map{ |vehicle|
  [
    vehicle.at('Make').content,
    vehicle.at('Model').content
  ]
}

vehicle_make_and_models # => [["Freightliner", "FLD12064T"], ["Freightliner", "FLD12064T"]]
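For the CSV step itself, Ruby's standard csv library is probably enough. This is a minimal sketch, assuming rows shaped like the two-column result above; the header names and the output file name in the comment are my own choices, not something from the question:

```ruby
require 'csv'

# Rows in the shape produced above: one [make, model] pair per vehicle.
vehicle_make_and_models = [
  ['Freightliner', 'FLD12064T'],
  ['Freightliner', 'FLD12064T']
]

# Build the CSV text in memory. The header row is an assumption.
csv_text = CSV.generate do |csv|
  csv << %w[make model]
  vehicle_make_and_models.each { |row| csv << row }
end

puts csv_text

# To write a file instead (hypothetical file name):
# File.write('vehicles.csv', csv_text)
```

CSV.generate takes care of quoting, so a value containing a comma or a double quote should still come out as a single field.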

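Note: You have XML, not HTML. Don't assume that Nokogiri treats them the same, or that the difference is insignificant. XML is a strict standard, and Nokogiri's XML parser is far less forgiving than its HTML parser. By default it still tries to recover from malformed input, so check doc.errors after parsing to see whether it had to repair anything.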

I use CSS selectors unless I absolutely have to use XPath. Most of the time CSS gives a much clearer selector, which makes the code easier to read.

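vinxml.xpath('//VINResult//Vehicles//Vehicle//Make').text doesn't work, mostly because the document declares a default namespace (xmlns="http://basicvalues.pentondata.com/"). An XPath with unprefixed element names won't match namespaced elements, so you get an empty NodeSet. Also, // means "search all descendants": at the start of an expression the search begins at the top of the document, and between steps it searches everything below the previous match, so putting it between every step forces a lot of needless searching where / would do. xpath returns all matching nodes as a NodeSet, not just a particular Node, and text will return the text of all Nodes in the NodeSet, resulting in a concatenated string of the text, which is probably not what you want.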

I prefer to use search instead of xpath or css. It returns a NodeSet like the other two, but it accepts either CSS or XPath selectors. If a particular selector is ambiguous and could be interpreted as either CSS or XPath, use the explicit xpath or css method instead. Likewise, you can use at, at_xpath or at_css to find just the first matching node, which is equivalent to search('foo').first.

Question 2

You could also do the following, which places all of the vehicles in an Array and each vehicle's fields (its child elements) in a Hash:

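require 'nokogiri'

# YOUR_XML_FILE is a placeholder for the path to your XML file.
doc = Nokogiri::XML(File.read(YOUR_XML_FILE))

vehicles = doc.search('Vehicle').map do |vehicle|
  # Use only child elements, which skips the whitespace-only text
  # nodes between them, and drop elements that have no text.
  vehicle.element_children
         .reject { |child| child.text.strip.empty? }
         .to_h { |child| [child.name, child.text] }
end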
#=>[{"ID"=>"131497", "Product"=>"TRUCK", "Year"=>"1993", "Make"=>"Freightliner", "Model"=>"FLD12064T", "Description"=>"120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes  Power Steering 6x4 (SBA - Set Back Axle)"}, {"ID"=>"131497", "Product"=>"TRUCK", "Year"=>"1993", "Make"=>"Freightliner", "Model"=>"FLD12064T", "Description"=>"120'' BBC Alum Air Cond Long Conv. (SBA) Tractor w/48'' Sleeper Air Brakes  Power Steering 6x4 (SBA - Set Back Axle)"}]

Then you can access any field of an individual vehicle, e.g.

vehicles.first["ID"]
#=> "131497"
vehicles.first["Year"]
#=> "1993"

etc.
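From an Array of Hashes like this, getting to CSV is a short step with the standard csv library. A rough sketch, assuming hashes shaped like the ones above (Description left out to keep it short). Taking the header row from the union of all keys is my own choice, made because elements with no text are skipped and so some hashes may lack a key:

```ruby
require 'csv'

# Two rows in the shape built above, without Description.
vehicles = [
  { 'ID' => '131497', 'Product' => 'TRUCK', 'Year' => '1993',
    'Make' => 'Freightliner', 'Model' => 'FLD12064T' },
  { 'ID' => '131497', 'Product' => 'TRUCK', 'Year' => '1993',
    'Make' => 'Freightliner', 'Model' => 'FLD12064T' }
]

# Union of all keys, in first-seen order, so every row gets the same columns.
headers = vehicles.flat_map(&:keys).uniq

csv_text = CSV.generate(headers: headers, write_headers: true) do |csv|
  # values_at returns nil for a missing key, which becomes an empty cell.
  vehicles.each { |vehicle| csv << vehicle.values_at(*headers) }
end

puts csv_text
```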