Question

I am parsing an XML document into a database, and I am looking for a more effective version of my code. ITEM_ID is unique, but the code keeps adding new rows to the database instead of updating the existing ones.

This is an example of the source XML:

<?xml version="1.0" encoding="UTF-8" ?>
<SHOP>
<SHOPITEM>
<ITEM_ID>8159</ITEM_ID>
<PRODUCT></PRODUCT>
<DESCRIPTION></DESCRIPTION>
<URL>http://www.domovhracek.cz/</URL>
<IMGURL>http://www.domovhracek.cz/css/images/beznahledusmall.png</IMGURL>
<EAN></EAN>
<PRICE>2391</PRICE>
<CATEGORYTEXT>zaradit</CATEGORYTEXT>
<MANUFACTURER>Domov hraček</MANUFACTURER>
<PRICE_VAT>2893.00</PRICE_VAT>
<DELIVERY_DATE>0</DELIVERY_DATE>
</SHOPITEM>
</SHOP>

This is the source of the importing controller:

class ImportController < ApplicationController
  def importer
    require 'open-uri'
    require 'nokogiri'
    doc = Nokogiri::XML(open("http://www.domovhracek.cz/feed/heureka.xml"))
    doc.css('SHOPITEM').each do |node|
      children = node.children
      @conditions={:ITEM_ID=> children.css('ITEM_ID').inner_text}
      begin
        record = ShopItem.find(:first, :conditions => {:ITEM_ID=>children.css('ITEM_ID').inner_text} )
        record.PRODUCT=children.css('PRODUCT').inner_text
        record.DESCRIPTION=children.css('DESCRIPTION').inner_text

        record.save
      rescue
        record= ShopItem.create({:ITEM_ID=> children.css('ITEM_ID').inner_text})
        record.PRODUCT=children.css('PRODUCT').inner_text
        record.DESCRIPTION=children.css('DESCRIPTION').inner_text
        record.save
      end
    end
  end
end

If a row with the same ITEM_ID already exists, I want to update that record, but my code inserts a new row every time.


Solution

First, use a loop like this to walk through your XML:

doc.search('SHOPITEM').each do |shop_item|
  item_id = shop_item.at('ITEM_ID').text
  product = shop_item.at('PRODUCT').text
  description = shop_item.at('DESCRIPTION').text
  # ...
end

Each iteration handles one <SHOPITEM> element and gathers the values needed to create or update a row.
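Once the values are extracted, the update-or-create step can be sketched as below. This is only an illustration under stated assumptions: FakeShopItemTable and import_items are hypothetical names, and the in-memory hash stands in for the real table so the example runs without Rails. In a Rails version that provides it, ShopItem.find_or_initialize_by(ITEM_ID: item_id) followed by save would play the same role as find_or_initialize here.

```ruby
# In-memory stand-in for the shop_items table, keyed by ITEM_ID.
# (Hypothetical: a real app would use the ShopItem model instead.)
class FakeShopItemTable
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Return the existing row for item_id, or start a new one.
  def find_or_initialize(item_id)
    @rows[item_id] ||= { 'ITEM_ID' => item_id }
  end
end

# Update the row that matches ITEM_ID, or create it if none exists yet.
# `items` is an array of hashes holding the values read in the loop above.
def import_items(table, items)
  items.each do |item|
    row = table.find_or_initialize(item['ITEM_ID'])
    row['PRODUCT'] = item['PRODUCT']
    row['DESCRIPTION'] = item['DESCRIPTION']
  end
  table
end

table = FakeShopItemTable.new
feed = [
  { 'ITEM_ID' => '8159', 'PRODUCT' => 'Old name', 'DESCRIPTION' => '' },
  { 'ITEM_ID' => '8159', 'PRODUCT' => 'New name', 'DESCRIPTION' => 'Updated' }
]
import_items(table, feed)

puts table.rows.size                 # the same ITEM_ID twice still yields one row
puts table.rows['8159']['PRODUCT']   # the second pass overwrote the first
```

The point of the design is that the lookup and the "not found" case are handled in one place, so no exception handling is needed to decide between insert and update.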

Instead of using css, which for CSS selectors is equivalent to search, you should use at or at_css, which return only the first matching node. Both css and search return a NodeSet, which is like an array. If you call inner_text on a NodeSet that contains more than one node, you get the text of all the nodes concatenated into a single string, which is most likely not what you want and will usually be a bug you have to fix:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<foo>
  <bar>1</bar>
  <bar>2</bar>
</foo>
EOT

doc.search('bar').inner_text # => "12"
doc.css('bar').inner_text # => "12"

For more information, read the documentation for css, search, at and at_css in the Nokogiri::XML::Node page, along with the inner_text documentation in the Nokogiri::XML::NodeSet page.

Licensed under: CC-BY-SA with attribution