Pergunta

All the examples I've seen so far using the Haskell XML toolkit, HXT, uses runX to execute the parser. runX runs inside the IO monad. Is there a way of using this XML parser outside of IO? Seems to be a pure operation to me, don't understand why I'm forced to be inside IO.

Foi útil?

Solução

You can use HXT's xread along with runLA to parse an XML string outside of IO.

xread has the following type:

xread :: ArrowXml a => a String XmlTree

This means you can compose it with any arrow of type (ArrowXml a) => a XmlTree Whatever to get an a String Whatever.

runLA is like runX, but for things of type LA:

runLA :: LA a b -> a -> [b]

LA is an instance of ArrowXml.

To put this all together, the following version of my answer to your previous question uses HXT to parse a string containing well-formed XML without any IO involved:

{-# LANGUAGE Arrows #-}
module Main where

import qualified Data.Map as M
import Text.XML.HXT.Arrow

classes :: (ArrowXml a) => a XmlTree (M.Map String String)
classes = listA (divs >>> pairs) >>> arr M.fromList
  where
    divs = getChildren >>> hasName "div"
    pairs = proc div -> do
      cls <- getAttrValue "class" -< div
      val <- deep getText         -< div
      returnA -< (cls, val)

getValues :: (ArrowXml a) => [String] -> a XmlTree (String, Maybe String)
getValues cs = classes >>> arr (zip cs . lookupValues cs) >>> unlistA
  where lookupValues cs m = map (flip M.lookup m) cs

xml = "<div><div class='c1'>a</div><div class='c2'>b</div>\
      \<div class='c3'>123</div><div class='c4'>234</div></div>"

values :: [(String, Maybe String)]
values = runLA (xread >>> getValues ["c1", "c2", "c3", "c4"]) xml

main = print values

classes and getValues are similar to the previous version, with a few minor changes to suit the expected input and output. The main difference is that here we use xread and runLA instead of readString and runX.

It would be nice to be able to read something like a lazy ByteString in a similar manner, but as far as I know this isn't currently possible with HXT.


A couple of other things: you can parse strings in this way without IO, but it's probably better to use runX whenever you can: it gives you more control over the configuration of the parser, error messages, etc.

Also: I tried to make the code in the example straightforward and easy to extend, but the combinators in Control.Arrow and Control.Arrow.ArrowList make it possible to work with arrows much more concisely if you like. The following is an equivalent definition of classes, for example:

classes = (getChildren >>> hasName "div" >>> pairs) >. M.fromList
  where pairs = getAttrValue "class" &&& deep getText

Outras dicas

Travis Brown's answer was very helpful. I just want to add my own solution here, which I think is a bit more general (using the same functions, just ignoring the problem-specific issues).

I was previously unpickling with:

upIO      :: XmlPickler a => String -> IO [a]
upIO str   = runX $ readString [] str >>> arrL (maybeToList . unpickleDoc xpickle)

which I was able to change to this:

upPure    :: XmlPickler a => String -> [a]
upPure str = runLA (xreadDoc >>> arrL (maybeToList . unpickleDoc xpickle)) str

I completely agree with him that doing this gives you less control over the configuration of the parser etc, which is unfortunate.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top