What's Haskell's attitude towards Unicode in XML?

https://stackoverflow.com/questions/12750418

05-07-2021

Question

I want to know what the official solution for processing Unicode XML in Haskell is. I notice that HXT uses a simple String representation (a list of Unicode characters!!!) for text.

http://hackage.haskell.org/packages/archive/hxt/9.3.1.0/doc/html/Text-XML-HXT-DOM-TypeDefs.html#t:XNode

Constructors
XText String    ordinary text (leaf)
XBlob Blob      text represented more space efficient as bytestring (leaf)

How do you choose between the two representations when parsing? Forcing the user into using lists of characters doesn't sound like a particularly attractive feature, especially if the XML document has a lot of text content.

Also, I found http://hackage.haskell.org/package/hxt-unicode on Google, but I am not sure how it is intended to be used when parsing. Support for Unicode also used to be much more explicit: http://hackage.haskell.org/packages/archive/hxt/8.5.2/doc/html/Text-XML-HXT-DOM-Unicode.html but this module has been removed in the latest version (9.3.1.0 at the time of writing) without a clear reason. What was the motivation behind that?

Could somebody please also give some example code showing how HXT is intended to be used? The wiki pages are seriously lacking in this respect. Thank you.

Solution

The xml-conduit package uses the Text datatype for storing textual data. Text has become the standard representation for textual data in Haskell over the past few years. xml-conduit is a well-maintained package, and I've personally used it for a huge amount of both open source and commercial code.
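
As a rough illustration of what working with Text in xml-conduit can look like, here is a minimal sketch. It assumes the xml-conduit and text packages are installed; the sample document and the helper name titles are made up for this example, and the cursor-based query is just one of several ways to pull text out of a parsed document.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Data.Text.Lazy as TL
import Text.XML (def, parseText_)
import Text.XML.Cursor (content, element, fromDocument, ($//), (&/))

-- Hypothetical sample input containing non-ASCII text.
sampleXml :: TL.Text
sampleXml = "<doc><title>Grüße</title><title>日本語</title></doc>"

-- Collect the text content of every <title> element as strict Text.
-- parseText_ throws an exception on malformed input; parseText
-- returns an Either instead if that is preferable.
titles :: TL.Text -> [T.Text]
titles xml =
  fromDocument (parseText_ def xml) $// element "title" &/ content

main :: IO ()
main = mapM_ TIO.putStrLn (titles sampleXml)
```

Every piece of textual data here is a Text value rather than a String, so there is no list-of-characters representation to opt out of. Whether the non-ASCII characters print correctly depends on the terminal's locale and encoding settings.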

Licensed under: CC-BY-SA with attribution
