Question

Consider XML input like this:

<root>
  <sub>
      <p att1=0 att2=1><i>foo</i></p>
      <p att1=1 att2=1><i>bar</i></p>
      <p att1=0 att2=0><i>baz</i></p>
      <p att1=0 att2=1><i>bazz</i></p>
  </sub>
</root>

Which should be transformed to:

<root>
  <sub>
      <p att1=0 att2=1><i>foo</i><i>bazz</i></p>
      <p att1=1 att2=1><i>bar</i></p>
      <p att1=0 att2=0><i>baz</i></p>
  </sub>
</root>

(Because both p parent elemets of <i>foo</i> and <i>bazz</i> are siblings and have the same attributes.)

How to do such a transformation with HXT arrows?

Was it helpful?

Solution

Ok, there is my try: The code first collects all attribute lists at the parent of the siblings and then does a merge for all attribute lists that are different:

{-# LANGUAGE Arrows #-}

module Main
where

import Data.List
import Text.XML.HXT.Core

example="\
\<root>\
\  <sub>\
\      <p att1=\"0\" att2=\"1\"><i>foo</i></p>\
\      <p att1=\"1\" att2=\"1\"><i>bar</i></p>\
\      <p att1=\"0\" att2=\"0\"><i>baz</i></p>\
\      <p att1=\"0\" att2=\"1\"><i>bazz</i></p>\
\  </sub>\
\</root>"

get_attrs name = getChildren >>> hasName name >>> proc x -> do
   a <- listA (((
          getAttrName
          &&& (getChildren >>> getText))  ) <<< getAttrl ) -< x
   returnA -< a


has_attrs atts = proc x -> do
   a <- listA (((
           getAttrName
           &&& (getChildren >>> getText))  ) <<< getAttrl ) -< x
   if (a == atts)
   then returnA -< x
   else none -<< ()

mk_attrs atts = map f atts
  where
    f (n, v) = sqattr n v

mergeSiblings_one inp name att = catA (map constA inp)
    >>> mkelem name
               (mk_attrs att)
               [getChildren
                >>> hasName name  >>> has_attrs att >>> getChildren ]

mergeSiblings_core name = proc x -> do
    a <- listA (get_attrs name >>. (sort.nub) ) -< x
    b <- listA this -< x
    c <- listA (getChildren >>> neg (hasName name)) -< x
    catA ((map (mergeSiblings_one b name) a) ++ (map constA c) ) -<< ()


is_parent_of name = getChildren >>> hasName name

mergeSiblings name = processTopDownUntil (
        is_parent_of name `guards` mergeSiblings_core name
    )

stuff = mergeSiblings "p"


main :: IO ()
main
    = do
      x <- runX ( 
             configSysVars  [withTrace 1]
             >>> readString [withValidate no
                           ,withPreserveComment yes
                           ,withRemoveWS yes
                        ] example
             >>> setTraceLevel 4
             >>> stuff >>> traceTree >>> traceSource
           )
      return ()

Output for the example

<root>
  <p att1="0" att2="0">
    <i>baz</i>
  </p>
  <p att1="0" att2="1">
    <i>foo</i>
    <i>bazz</i>
  </p>
  <p att1="1" att2="1">
    <i>bar</i>
  </p>
</root>

Nice to have

The above version puts the merged children in front and the none-matching ones in the new children list of the parent node: A nice variation would be: insert each merged child at the old position of the first old sibling node and don't change the order of non-merged nodes. For example that

<other>1</other><p><a/></p><other>2</other><p><b/></p>

is transformed to

<other>1</other><p><a/><b/></p><other>2</other>

and not to:

<p><a/><b/></p><other>1</other><other>2</other>

Disclaimer

Since I am new to HXT and arrows - I wouldn't be surprised if there are more concise/HXT0idiomatic/elegant answers.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top