Question

I built a GPX parser using xml-conduit, and my code for identifying elements and skipping unwanted tags has turned out overly verbose and fragile.

Identifying elements (a minor annoyance)

I am explicitly ignoring namespaces by comparing only the nameLocalName of each Name. I guess the correct approach is to hardcode the right namespace(s) into the program and have a helper construct my element names for comparison in the tag* functions? This is slightly annoying, as I must support at least two different namespaces (GPX 1.0 and 1.1) that are similar enough to require no code changes for my uses.
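A minimal sketch of the helper idea, assuming you want to accept exactly the GPX 1.0 and 1.1 namespaces. To stay dependency-free it uses a local stand-in for Data.XML.Types.Name with String fields (the real type uses Text); gpxNamespaces, gpxNames, sameName and isGpx are hypothetical names, not part of xml-conduit.

```haskell
-- Stand-in for Data.XML.Types.Name (xml-types); the real fields are Text.
data Name = Name
  { nameLocalName :: String
  , nameNamespace :: Maybe String
  , namePrefix    :: Maybe String
  } deriving (Show)

-- The two GPX namespaces this sketch accepts.
gpxNamespaces :: [String]
gpxNamespaces =
  [ "http://www.topografix.com/GPX/1/0"
  , "http://www.topografix.com/GPX/1/1"
  ]

-- Hypothetical helper: every fully qualified Name a GPX element with
-- this local name may carry.
gpxNames :: String -> [Name]
gpxNames local = [ Name local (Just ns) Nothing | ns <- gpxNamespaces ]

-- Compare local name and namespace only; the prefix is just notation.
sameName :: Name -> Name -> Bool
sameName a b =
  nameLocalName a == nameLocalName b && nameNamespace a == nameNamespace b

-- A Name -> Bool predicate of the shape tagPredicate expects.
isGpx :: String -> Name -> Bool
isGpx local n = any (sameName n) (gpxNames local)
```

With the real Text-based Name, the call site would read something like `tagPredicate (isGpx "trkpt") ...`, so supporting a third namespace means editing one list rather than every parser.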

Skipping elements

GPX is largish, and the set of custom extensions is larger still. Because the tool I'm building needs only limited information, I decided to ignore particular tags along with all of their sub-elements. For example:

<trkpt lat="45.19843" lon="-122.428">
    <ele>4</ele>
    <time>...</time>
    <extensions>
         ...
    </extensions>
</trkpt>

To ignore extensions and similar tags with numerous sub-elements, I made a sink that consumes events until it reaches the end-element event for that tag:

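skipTagAndContents :: (MonadThrow m) => Text -> Sink Event m (Maybe ())
skipTagAndContents n = tagPredicate ((== n) . nameLocalName)
                                    ignoreAttrs
                                    (const $ many (skipElements n) >> return ())

-- Consume one event per call. On the end tag whose local name is t (or at
-- end of input), return Nothing and hand the event back as a leftover so
-- tagPredicate can see the closing tag; otherwise keep going.
-- Note: an inner element with the same local name ends the skip early.
skipElements t = do
    x <- await
    case x of
        Just (EventEndElement n) | nameLocalName n == t -> Done x Nothing
        Nothing -> Done x Nothing
        _ -> return (Just ())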
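Because skipElements stops at the first end tag with a matching local name, a nested element of the same name would end the skip too early. A sketch of the more robust alternative is to count nesting depth instead of matching names. It is written as a pure function over a list so it runs on its own; the simplified Event type and the name dropElement are hypothetical stand-ins, not xml-conduit's types.

```haskell
-- Simplified stand-in for Data.XML.Types.Event; the real type also carries
-- attributes, richer content, and document-level events.
data Event
  = BeginElement String
  | EndElement String
  | Content String
  deriving (Eq, Show)

-- Drop one complete element (begin tag, all descendants, matching end tag)
-- from the front of the stream and return what follows it.
-- Nothing: the stream did not start with an element, or ended too early.
dropElement :: [Event] -> Maybe [Event]
dropElement (BeginElement _ : rest) = go (1 :: Int) rest
  where
    go 0 evs = Just evs            -- depth back to zero: element finished
    go _ [] = Nothing              -- input ended inside the element
    go d (e : evs) = case e of
      BeginElement _ -> go (d + 1) evs
      EndElement _   -> go (d - 1) evs
      Content _      -> go d evs
dropElement _ = Nothing
```

The same counter can live inside a Sink by passing the depth along while awaiting events. Later xml-conduit releases also export skipping combinators such as ignoreTree and ignoreAnyTreeContent from Text.XML.Stream.Parse; if your version has them, they are probably preferable to a hand-rolled sink.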

It seems there should be a tag* variant that does this for me (succeeds without all children being consumed), but the fact that there isn't suggests I am either missing a simple combinator or should send a patch. Which is it?


Solution

If you're not using the namespaces at all, it might be simplest to strip them out entirely, using something like Data.Conduit.List.map stripNamespace, where stripNamespace would be a small helper that clears the namespace from each name in an Event.
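A sketch of what such a stripNamespace helper could look like. It uses local stand-ins for xml-types' Name and Event (String instead of Text, attribute values as plain strings) so it runs without dependencies; unqualify and stripNamespace are hypothetical, not library functions.

```haskell
-- Stand-ins for Data.XML.Types.Name and a subset of Data.XML.Types.Event.
data Name = Name
  { nameLocalName :: String
  , nameNamespace :: Maybe String
  , namePrefix    :: Maybe String
  } deriving (Eq, Show)

data Event
  = EventBeginElement Name [(Name, String)]
  | EventEndElement Name
  | EventContent String
  deriving (Eq, Show)

-- Forget namespace and prefix, keeping only the local name.
unqualify :: Name -> Name
unqualify n = n { nameNamespace = Nothing, namePrefix = Nothing }

-- Rewrite element and attribute names; other events pass through unchanged.
stripNamespace :: Event -> Event
stripNamespace (EventBeginElement n attrs) =
  EventBeginElement (unqualify n) [ (unqualify k, v) | (k, v) <- attrs ]
stripNamespace (EventEndElement n) = EventEndElement (unqualify n)
stripNamespace e = e
```

Mapped over the event stream before your sinks, every later comparison can use plain local names, which covers GPX 1.0 and 1.1 alike. The trade-off is that names from extension namespaces can now collide with GPX names, which matters mostly if you inspect extension content.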

Frankly, I don't use the streaming interface that often myself; almost all of my work involves the DOM (Text.XML) or cursor (Text.XML.Cursor) interfaces, so it's entirely possible that some combinators are missing. In this case, though, I believe you can simplify the implementation a lot: tagPredicate shouldn't allow the inner Sink to read past the end of the element, so you can rewrite skipTagAndContents as:

skipTagAndContents :: (MonadThrow m) => Text -> Sink Event m (Maybe ())
skipTagAndContents n =
    tagPredicate ((== n) . nameLocalName) ignoreAttrs (const Data.Conduit.List.sinkNull)

You should test that before dropping it in; I could be remembering some details of the streaming interface incorrectly.

Licensed under: CC-BY-SA with attribution