Вопрос

I'm trying to get my head around HXT, a Haskell library for parsing XML that uses arrows. For my specific use case I'd rather not use deep as there are cases where <outer_tag><payload_tag>value</payload_tag></outer_tag> is distinct from <outer_tag><inner_tag><payload_tag>value</payload_tag></inner_tag></outer_tag> but I ran into some weirdness that felt like it should work but doesn't.

I've managed to come up with a test case based on this example from the docs:

{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
module Main where

import Text.XML.HXT.Core

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)


getGuest = deep (isElem >>> hasName "guest") >>> 
  proc x -> do
    fname <- getText <<< getChildren <<< deep (hasName "fname") -< x
    lname <- getText <<< getChildren <<< deep (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }

getGuest' = deep (isElem >>> hasName "guest") >>> 
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") <<< getChildren -< x
    lname <- getText <<< getChildren <<< (hasName "lname") <<< getChildren -< x
    returnA -< Guest { firstName = fname, lastName = lname }

getGuest'' = deep (isElem >>> hasName "guest") >>> getChildren >>>
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") -< x
    lname <- getText <<< getChildren <<< (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }


driver finalArrow = runX (readDocument [withValidate no] "guestbook.xml" >>> finalArrow)

main = do 
  guests <- driver getGuest
  print "getGuest"
  print guests

  guests' <- driver getGuest'
  print "getGuest'"
  print guests'

  guests'' <- driver getGuest''
  print "getGuest''"
  print guests''

Between getGuest and getGuest' I expand deep into the correct number of getChildren. The resulting function still works. I then factor the getChildren outside of the do block but this causes the resulting function to fail. The output is:

"getGuest"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest'"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest''"
[]

I feel like this should be a valid transformation to perform, but my understanding of arrows is a little shaky. Am I doing something wrong? Is this a bug that I should report?

I'm using HXT version 9.3.1.3 (the latest at the time of writing). ghc --version prints "The Glorious Glasgow Haskell Compilation System, version 7.4.1". I've also tested on a box with ghc 7.6.3 and got the same result.

The XML file had the following repetitive structure (the full file can be found here)

<guestbook>
  <guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
  </guest>
  <guest>
    <fname>Henry</fname>
    <lname>Ford</lname>
  </guest>
  <guest>
    <fname>Andrew</fname>
    <lname>Carnegie</lname>
  </guest>
</guestbook>
Это было полезно?

Решение

In getGuest'' you have

... (hasName "fname") -< x
... (hasName "lname") -< x

That is, you are restricting to the case where x is "fname" and x is "lname", which isn't satisfied by any x!

Другие советы

I've managed to work out the specific reason that the construction is interpreted the way it is. The following arrow translation found here provides a base to work from

addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = proc x -> do
                y <- f -< x
                z <- g -< x
                returnA -< y + z

Becomes:

addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = arr (\ x -> (x, x)) >>>
           first f >>> arr (\ (y, x) -> (x, y)) >>>
           first g >>> arr (\ (z, y) -> y + z)

From this we can, by analogy, derive:

getGuest''' = preproc >>>
           arr (\ x -> (x, x)) >>>
           first f >>> arr (\ (y, x) -> (x, y)) >>>
           first g >>> arr (\ (z, y) -> Guest {firstName = z, lastName = y})

    where preproc = deep (isElem >>> hasName "guest") >>> getChildren
        f = getText <<< getChildren <<< (hasName "fname")
        g = getText <<< getChildren <<< (hasName "lname")

In HXT, the arrows can be imagined as streams of values flowing through filters. arr (\x->(x,x)) does not "split the stream", as I'd hoped. Instead it creates a stream of tuples that are filtered by f and survivors are filtered by g. As f and g are mutually exclusive, there are no survivors.

Examples with getChildren inside miraculously worked because the tuple stream contained values from further up the XML document looking something like

<guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
</guest>

and so were not mutually exclusive.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top