This is kind of a corner case. I'm using Haskell with Text.XmlHtml (version 0.2.3). I'm getting my source data from Pandoc (version 1.12). My source files are all in Markdown format.
The corner case arises when I have raw Html directly in my Markdown file. This is, of course, supported by the Markdown format, and sometimes it is the only way for me to get the kind of table layout that I want. Pandoc reads the file just fine, but when it gets to the Html section, what it emits is roughly like this:
[ RawInline (Format "html") "<a href=\"abcdefg\">"
, RawInline (Format "html") "<img src=\"image.png\" />"
, RawInline (Format "html") "</a>" ]
So... converting this into a hierarchical tree could get very complicated. The desired result, in XmlHtml, would be something like this:
Element "a" [("href", "abcdefg")] [Element "img" [("src", "image.png")]]
But that is very difficult to get when I'm dealing with a structure that was hierarchical (everything else Pandoc emits is nicely hierarchical) and suddenly is not, and the "not hierarchical" part can only be found by basically building an Html parser, one that works across multiple strings that surround other structures.
Ideally, what I would like to emit is a simple TextNode:
TextNode "<a href=\"abcdefg\"><img src=\"image.png\" /></a>"
I could do that either by emitting a bunch of TextNodes, one for each RawInline, or by glomming together the RawInline elements. The point is that I want to emit a TextNode that has raw Html in it and have that ultimately rendered without any extra Html escaping.
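The "glomming together" step could look roughly like the sketch below. To keep it self-contained it uses simplified stand-in types that only mirror the constructors shown above (`Format`, `RawInline`, plus a hypothetical `Str` for ordinary text); the real Pandoc `Inline` type has many more constructors, and `mergeRawHtml` is a name made up for this example. It only joins raw Html fragments that are directly adjacent, so fragments that surround other inlines stay separate.

```haskell
-- Simplified stand-ins for the Pandoc constructors shown above.
-- Assumption: these are NOT the real Pandoc types, only enough to show the idea.
newtype Format = Format String deriving (Eq, Show)

data Inline
  = Str String               -- ordinary text (stand-in)
  | RawInline Format String  -- raw fragment, as emitted for inline Html
  deriving (Eq, Show)

-- Merge each run of directly adjacent raw Html inlines into a single RawInline.
-- Anything else (including raw inlines of another format) is passed through.
mergeRawHtml :: [Inline] -> [Inline]
mergeRawHtml (RawInline f a : RawInline g b : rest)
  | f == html && g == html = mergeRawHtml (RawInline html (a ++ b) : rest)
  where
    html = Format "html"
mergeRawHtml (x : rest) = x : mergeRawHtml rest
mergeRawHtml [] = []

main :: IO ()
main = print (mergeRawHtml fragments)
  where
    fragments =
      [ RawInline (Format "html") "<a href=\"abcdefg\">"
      , RawInline (Format "html") "<img src=\"image.png\" />"
      , RawInline (Format "html") "</a>"
      ]
```

With the three fragments from the example, this leaves one `RawInline` holding the whole anchor string. It does not by itself solve the escaping question, since the merged string still has to reach the renderer unescaped.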
My renderer is ultimately a Heist snippet, but that probably means it runs by way of Blaze.
My final alternative, which might work, is to go from Pandoc through the Blaze Html renderer and then through the XmlHtml parser to get something that I can embed into a Heist snippet. I'd just like to avoid that because it feels dirty.
(I think I would actually run into the same problem if I wanted to put JavaScript into my Markdown documents... which is technically allowed by the language but probably very evil.)
Is there a way to do this, or am I too limited by my tools?
Update
I tried the route of rendering from Pandoc to Blaze to XmlHtml. It turns out that I get the same result, with the Html put into the final nodes in escaped form and thus appearing as text in the browser. Here was my function (which was much shorter and easier than the full implementation I'd done...):
pandocToHtml :: Pandoc.Pandoc -> [XmlHtml.Node]
pandocToHtml = Text.Blaze.Renderer.XmlHtml.renderHtmlNodes . Pandoc.writeHtml Pandoc.def
Pandoc.def includes all of the "allow_raw_*" extensions, including allow_raw_html.
The final thing I can think of is to apply my own piecemeal Html parser (and then maybe contribute it to Pandoc), which, in the end, couldn't be horribly hard.
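For the simpler case where the raw fragments sit directly next to each other and have already been concatenated into one string, a hand-written parser might not be needed: XmlHtml's own `parseHTML` can turn the string into nodes. The following is only a sketch under that assumption. `rawHtmlToNodes` and the `"raw-inline"` source label are made up for illustration, and the fallback to a `TextNode` (which would be escaped on render, as described above) is one possible choice, not the only one.

```haskell
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Text.XmlHtml (Node (TextNode), docContent, parseHTML)

-- Parse one already-concatenated run of raw Html into XmlHtml nodes.
-- If the fragment does not parse, fall back to a plain TextNode,
-- which will show up escaped in the rendered page.
rawHtmlToNodes :: T.Text -> [Node]
rawHtmlToNodes raw =
  case parseHTML "raw-inline" (encodeUtf8 raw) of
    Right doc -> docContent doc
    Left _    -> [TextNode raw]

main :: IO ()
main =
  print (rawHtmlToNodes (T.pack "<a href=\"abcdefg\"><img src=\"image.png\" /></a>"))
```

This does not cover the harder case from the question, where an opening and a closing tag surround other Pandoc structures; that still needs something that tracks open tags across several inlines.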