Question

I'm using Ruby to build a GEXF-formatted XML structure that represents a network graph. The graph consists of several levels of nested nodes. The idea is to parse a file that looks something like this:

| top node | middle node | bottom node |
|    a     |      1      |    "name1"  |
|    b     |      1      |    "name6"  |
|    a     |      2      |    "name3"  |
|    b     |      2      |    "name8"  |
|    b     |      1      |    "name5"  |
|    a     |      1      |    "name2"  |
|    b     |      2      |    "name7"  |
|    a     |      2      |    "name4"  |

and transform it into this:

<node id="a" label="top node">
  <node id="1" label="middle node">
    <node id="name1" label="bottom node"/>
    <node id="name2" label="bottom node"/>
  </node>
  <node id="2" label="middle node">
    <node id="name3" label="bottom node"/>
    <node id="name4" label="bottom node"/>
  </node>
</node>
<node id="b" label="top node">
  <node id="1" label="middle node">
    <node id="name5" label="bottom node"/>
    <node id="name6" label="bottom node"/>
  </node>
  <node id="2" label="middle node">
    <node id="name7" label="bottom node"/>
    <node id="name8" label="bottom node"/>
  </node>
</node>

As you can see, since the lines in the file are not in any particular order, I need to be able to reference each node and sub-node when building the XML file.

In case my question is still not clear, when I read the line:

|    b     |      1      |    "name6"  |

I need to be able to tell the builder to put node "name6" inside top node "b", under its middle node "1". Is this possible with Builder, Nokogiri's builder, or anything else out there?


Solution

Instead of trying to keep a handle on nodes as you build them, use the CSS (or XPath) querying capabilities of Nokogiri to look for nodes already added to the doc, when you need them:

require 'nokogiri'

# Read the data file, skip the header row, and split each row into
# its [top, middle, bottom] node ids
rows = File.readlines('my.data')[1..-1].map{ |row| row.scan(/[^|\s"]+/) }

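# Look among the direct children of a parent node for a node with a specific id.
# If you can't find one, create one (with the label) and return it.
# (A bare CSS selector like "node[id='1']" searches all descendants, so a
#  deeper node that happens to share the id could be matched by mistake.)
def find_or_create_on(parent, id, label)
  parent.at_xpath("node[@id='#{id}']") or
  parent.add_child("<node id='#{id}' label='#{label}' />")[0]
end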

# Since an XML document can only ever have one root node,
# and your data can have many, let's wrap them all in a new document
root = Nokogiri.XML('<root></root>').root

# For each triplet, find or create the nodes you need, in order
# (When iterating an array of arrays, you can automagically convert
#  each item in the sub-array to a named variable.)
rows.each do |top_id, mid_id, bot_id|
  top = find_or_create_on( root, top_id, 'top node'    )
  mid = find_or_create_on( top,  mid_id, 'middle node' )
  bot = find_or_create_on( mid,  bot_id, 'bottom node' )
end

puts root
#=> <root>
#=>   <node id="a" label="top node">
#=>     <node id="1" label="middle node">
#=>       <node id="name1" label="bottom node"/>
#=>       <node id="name2" label="bottom node"/>
#=>     </node>
#=>     <node id="2" label="middle node">
#=>       <node id="name3" label="bottom node"/>
#=>       <node id="name4" label="bottom node"/>
#=>     </node>
#=>   </node>
#=>   <node id="b" label="top node">
#=>     <node id="1" label="middle node">
#=>       <node id="name6" label="bottom node"/>
#=>       <node id="name5" label="bottom node"/>
#=>     </node>
#=>     <node id="2" label="middle node">
#=>       <node id="name8" label="bottom node"/>
#=>       <node id="name7" label="bottom node"/>
#=>     </node>
#=>   </node>
#=> </root>
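If you'd rather not query the document while building it, a different technique (not part of the original answer) is to group the rows into a nested Hash first and emit the XML in a single pass: a Hash lets you reach any top/middle node by key, whatever order the rows arrive in. A minimal plain-Ruby sketch, assuming the rows have already been split into `[top, middle, bottom]` triplets as above; `group_rows`, `xml_escape` and `to_xml` are hypothetical helper names:

```ruby
# Characters that must be escaped inside a quoted XML attribute value.
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
            '"' => '&quot;', "'" => '&apos;' }.freeze

def xml_escape(s)
  s.to_s.gsub(/[&<>"']/, ESCAPES)
end

# Group [top, middle, bottom] triplets into {top => {middle => [bottoms]}}.
# Ruby hashes keep insertion order, so nodes come out in first-seen order,
# just like the Nokogiri version.
def group_rows(rows)
  tree = Hash.new { |h, top| h[top] = Hash.new { |m, mid| m[mid] = [] } }
  rows.each { |top, mid, bot| tree[top][mid] << bot }
  tree
end

# Emit the nested <node> elements, wrapped in a single <root>.
def to_xml(tree)
  out = ["<root>"]
  tree.each do |top, mids|
    out << %(  <node id="#{xml_escape(top)}" label="top node">)
    mids.each do |mid, bots|
      out << %(    <node id="#{xml_escape(mid)}" label="middle node">)
      bots.each do |bot|
        out << %(      <node id="#{xml_escape(bot)}" label="bottom node"/>)
      end
      out << "    </node>"
    end
    out << "  </node>"
  end
  out << "</root>"
  out.join("\n")
end

rows = [%w[a 1 name1], %w[b 1 name6], %w[a 2 name3], %w[b 2 name8],
        %w[b 1 name5], %w[a 1 name2], %w[b 2 name7], %w[a 2 name4]]
puts to_xml(group_rows(rows))
```

With the sample rows this prints the same nesting as the Nokogiri version. The trade-off is that the whole grouping is held in memory before any XML is written, which is usually fine for data of this size.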

Note that you may want to reconsider how you use the id attribute, as the values you supplied here are neither a) unique throughout the document (both a and b have a middle node 1), nor b) valid XML identifiers (an XML ID value must be a valid XML name, and names can't start with a digit).
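One way to address both points is to derive each id from the node's full path and give it a letter prefix, so that middle node 1 under a and middle node 1 under b get different ids, and none of them start with a digit. A small sketch, assuming path-joined ids such as `n_b_1` are acceptable to whatever consumes your GEXF; `path_id` and `xml_name?` are hypothetical helpers, and the name check is a simplified ASCII-only approximation of the XML name rules:

```ruby
# Build an id from the node's ancestry, e.g. path_id("b", "1") => "n_b_1".
# The "n_" prefix guarantees the id never starts with a digit.
# (Assumes the raw ids never contain "_"; otherwise pick another separator.)
def path_id(*path)
  "n_" + path.join("_")
end

# Simplified ASCII-only check: starts with a letter or underscore,
# followed by letters, digits, '_', '-' or '.'.
def xml_name?(s)
  s.match?(/\A[A-Za-z_][A-Za-z0-9_.\-]*\z/)
end

puts path_id("b", "1")             # n_b_1
puts path_id("b", "1", "name6")    # n_b_1_name6
puts xml_name?("1")                # false
puts xml_name?(path_id("b", "1"))  # true
```

You would then pass `path_id(top_id, mid_id)` (and so on) as the id when creating each node, and keep the raw value in the label if you still want it visible.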

Also, your desired output lists some child nodes in a different order from the one they appear in within the source data. For example, b/2/name8 appears before b/2/name7 in the input, so my solution creates them in that order. If you need them sorted, sort your rows first, e.g.:

rows.sort.each do |top_id, mid_id, bot_id|  # in place of rows.each; the loop body stays the same
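Keep in mind that `sort` compares the string triplets element by element, lexicographically, so a middle id of "10" would sort before "2". If your middle ids are numeric, a `sort_by` with an integer key avoids that. A sketch assuming the middle column always holds base-10 integers; `sort_rows_numerically` is a hypothetical helper name:

```ruby
# Sort [top, middle, bottom] triplets, comparing the middle id as an Integer.
# Integer(mid, 10) raises ArgumentError if a middle id is not a base-10 integer.
def sort_rows_numerically(rows)
  rows.sort_by { |top, mid, bot| [top, Integer(mid, 10), bot] }
end

rows = [%w[b 10 name9], %w[a 2 name3], %w[b 2 name8], %w[a 1 name1], %w[b 2 name7]]

# Plain sort: string comparison, so "10" lands before "2".
p rows.sort.map { |r| r.join("/") }
# => ["a/1/name1", "a/2/name3", "b/10/name9", "b/2/name7", "b/2/name8"]

# Numeric-aware sort.
p sort_rows_numerically(rows).map { |r| r.join("/") }
# => ["a/1/name1", "a/2/name3", "b/2/name7", "b/2/name8", "b/10/name9"]
```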
Licensed under: CC-BY-SA with attribution