Question

I am trying to transform a tokenized string (an English sentence) into HTML span tags for display on a web page.

Here are the basic steps I am trying to perform:

  1. Take a tokenized string that contains spaces.
  2. Enclose it in <root></root> to make it valid XML.
  3. Create a Nokogiri object to access the XML.
  4. Iterate through the "element_children" node set to access each node's name and text, and use these to transform each token into a <span class=token>. This part works.
  5. However, I am unable to access the #(Text " ") nodes that are present in the Nokogiri object (step 7 in pry).
  6. Therefore, when I add these elements to an array that I later join to create the HTML, I lose the spaces from the original string.

Any pointers to the right Nokogiri method to use would be highly appreciated. Other suggestions are welcome too.

You can view the code:

require 'nokogiri'

sentence_tagged = '<det>A</det> <nn>fleet</nn> <in>of</in> <nns>warships</nns><stop>.</stop>'
sentence_xml = '<root>' + sentence_tagged + '</root>'
nok_sent = Nokogiri::XML(sentence_xml)
array = []
nok_sent.root.element_children.each {|child| array << "<span class='" + child.name + "'>" + child.text + "</span>" }

array
# => ["<span class='det'>A</span>",
# "<span class='nn'>fleet</span>",
# "<span class='in'>of</span>",
# "<span class='nns'>warships</span>",
# "<span class='stop'>.</span>"]

array.join
# => "<span class='det'>A</span><span class='nn'>fleet</span><span class='in'>of</span><span class='nns'>warships</span><span class='stop'>.</span>"

Solution

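You should use children instead of element_children. element_children returns only the element nodes, whereas children also includes the text nodes between them. Nokogiri reports a text node's name as "text", which is why the spaces show up as <span class='text'> in the output: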

array = []
nok_sent.root.children.each {|child| array << "<span class='" + child.name + "'>" + child.text + "</span>" }

array
# => ["<span class='det'>A</span>", "<span class='text'> </span>", "<span class='nn'>fleet</span>", "<span class='text'> </span>", "<span class='in'>of</span>", "<span class='text'> </span>", "<span class='nns'>warships</span>", "<span class='stop'>.</span>"] 
array.join
# => "<span class='det'>A</span><span class='text'> </span><span class='nn'>fleet</span><span class='text'> </span><span class='in'>of</span><span class='text'> </span><span class='nns'>warships</span><span class='stop'>.</span>" 
Licensed under: CC-BY-SA with attribution