How do I replace tags defining a node?

Question 1

Here's the basis for such a transform:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<ul>
    <li>list item 1</li>
    <li>list item 2</li>
</ul>
EOT
puts doc.to_html

doc.search('ul').each do |ul|
  ul.search('li').each do |li|
    li.replace("* #{ li.text.strip }")
  end
  ul.replace(ul.text)
end

puts doc.to_html

Running that outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><ul>
<li>list item 1</li>
    <li>list item 2</li>
</ul></body></html>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>* list item 1
    * list item 2
</body></html>

I didn't try to put a leading carriage-return or line-feed before the first item; that's left as an exercise for the reader. Nor did I try to handle the <h4> tags or similar substitutions. You should be able to work those out from the code above.

Also, I'm using Nokogiri::HTML to parse the HTML, which wraps it in a full HTML document with the appropriate DOCTYPE header and <html> and <body> tags. That could be changed by using Nokogiri::HTML::DocumentFragment.parse instead, but it wouldn't really make a difference in the converted text.

Question 2

You may want to look at ClothRed, which is an HTML to Textile converter in Ruby. It hasn't been updated in a while, but it's simple and may be a good starting point for your own converter.

If you really want to use Nokogiri, you're writing a filter, so you may want to use the SAX interface.

Question 3

You may want to try McBean (https://github.com/flavorjones/mcbean) [caveat: I'm the author of the gem, and it hasn't been updated in a while].

It's similar to ClothRed in spirit, but uses Nokogiri under the hood and actually transforms the document structure into output text. It supports a substantial subset of Textile, and in fact I've used it successfully to convert wiki pages between wiki systems, as you're trying to do.

Question 4

If anybody finds this later, another alternative is to use Pandoc. I've just run my first tests, and it seems almost sufficient, and it can handle many more formats.
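If you want to call Pandoc from Ruby, something along these lines might do. It is a sketch that assumes the pandoc executable is on your PATH and that Textile is the format you want. The helper names pandoc_command and html_to_textile are made up for this example.

```ruby
require 'open3'

# Build the pandoc argument list (hypothetical helper).
# --from and --to are pandoc's input and output format options.
def pandoc_command(from: 'html', to: 'textile')
  ['pandoc', '--from', from, '--to', to]
end

# Pipe an HTML string through pandoc and return the converted text.
# Raises if pandoc exits with a non-zero status.
def html_to_textile(html)
  stdout, status = Open3.capture2(*pandoc_command, stdin_data: html)
  raise 'pandoc failed' unless status.success?
  stdout
end

# Usage (requires pandoc to be installed):
#   puts html_to_textile('<ul><li>list item 1</li></ul>')
```

Changing the to: argument should let the same helper target the other formats Pandoc supports.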