Here's the basis for such a transform:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<ul>
<li>list item 1</li>
<li>list item 2</li>
</ul>
EOT
puts doc.to_html
doc.search('ul').each do |ul|
  ul.search('li').each do |li|
    li.replace("* #{ li.text.strip }")
  end
  ul.replace(ul.text)
end
puts doc.to_html
Running that outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><ul>
<li>list item 1</li>
<li>list item 2</li>
</ul></body></html>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>* list item 1
* list item 2
</body></html>
I didn't intend, or attempt, to give the first "item" a leading carriage return or line feed; that's left as an exercise for the reader. Nor did I try to handle the <h4> tags or similar substitutions. From the code above you should be able to figure out how to do it.
Also, I'm using Nokogiri::HTML to parse the HTML, which wraps the input in a full HTML document by adding the appropriate DOCTYPE header and <html> and <body> tags. Using Nokogiri::HTML::DocumentFragment.parse instead would avoid that wrapper, but it wouldn't really make a difference in the transformed list text.