Question

I am translating some code from attoparsec to Parsec, because the parser needs to produce better error messages. The attoparsec code uses inClass (and notInClass) extensively. Is there a similar function for Parsec that lets me translate occurrences of inClass mechanically? Hayoo and Hoogle didn't offer any insight into the matter.

inClass :: String -> Char -> Bool

inClass "a-c'-)0-3-" is equivalent to \ x -> elem x "abc'()0123-", but the latter is inefficient and tedious to write for large ranges.

I will reimplement the function myself if nothing else is available.


Solution

There isn't any such combinator; if there were, it would be in Text.Parsec.Char (which is where all the standard parser combinator functions that involve Char are defined). You should be able to define it fairly easily.

I don't think you'll get the same performance as attoparsec's implementation, though: it relies on the internal FastSet type, which only works with 8-bit characters. If you don't need Unicode support, that might not be a problem. However, the code for FastSet implies you'll get unpredictable results when passing Chars greater than '\255', so to reuse the FastSet-based solution you would at least have to read the strings you're parsing in binary mode. (You would also have to copy the implementation of FastSet into your program, as it's not exported.)

If your range strings are short, then a simple solution like this is likely to be pretty fast:

type Range = (Char, Char)

inClass :: String -> Char -> Bool
inClass = inClass' . parseClass

parseClass :: String -> [Range]
parseClass "" = []
parseClass (a:'-':b:xs) = (a, b) : parseClass xs
parseClass (x:xs) = (x, x) : parseClass xs

inClass' :: [Range] -> Char -> Bool
inClass' cls c = any (\(a,b) -> c >= a && c <= b) cls

You could even try something like this, which should be at least as efficient as the above version (including when many calls to a single inClass s are made), and additionally avoid the list traversal overhead:

inClass :: String -> Char -> Bool
inClass "" = const False
inClass (a:'-':b:xs) = \c -> (c >= a && c <= b) || f c
  where f = inClass xs
inClass (x:xs) = \c -> c == x || f c
  where f = inClass xs

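(Note that the recursive call is bound outside the lambda, so the work of walking the class string can be shared between calls to the same inClass s; I don't know whether GHC can or will do this by itself.)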
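To connect this back to Parsec: either definition of inClass can be handed to satisfy, and a label attached with <?> gives the kind of error message the question is after. The following is only a sketch; charIn and charNotIn are hypothetical helper names, not part of Parsec, and the label wording is my own choice.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same class syntax as above: "a-c" is a range, any other character
-- (including a '-' that is not between two characters) is literal.
inClass :: String -> Char -> Bool
inClass "" = const False
inClass (a:'-':b:xs) = \c -> (c >= a && c <= b) || rest c
  where rest = inClass xs
inClass (x:xs) = \c -> c == x || rest c
  where rest = inClass xs

notInClass :: String -> Char -> Bool
notInClass s = not . inClass s

-- Hypothetical wrappers (not provided by Parsec): accept one character
-- from the class and label the parser for error reporting.
charIn :: String -> Parser Char
charIn s = satisfy (inClass s) <?> ("character in class [" ++ s ++ "]")

charNotIn :: String -> Parser Char
charNotIn s = satisfy (notInClass s) <?> ("character not in class [" ++ s ++ "]")
```

For example, parse (many1 (charIn "a-c'-)0-3-")) "" "abc(1-)xyz" should yield Right "abc(1-)", and a failing parse mentions the label in its list of expected items. Binding a class once, as in isHex = inClass "0-9a-fA-F", and reusing isHex lets the closure built from the class string be shared.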

OTHER TIPS

No, there's no equivalent in parsec. You have to write it yourself. I see two main options:

  1. parse the inClass syntax to create a String from it, to use with oneOf
  2. parse it to create a function to pass to satisfy

The former is of course a special case of the latter and, if your class contains long ranges, it will be less efficient. But it's probably a bit easier to implement.

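-- Pointwise "or" and "and" on predicates.
infixr 2 |||
infixr 3 &&&

(|||) :: (a -> Bool) -> (a -> Bool) -> a -> Bool
p ||| q = \x -> p x || q x
(&&&) :: (a -> Bool) -> (a -> Bool) -> a -> Bool
p &&& q = \x -> p x && q x

-- Turn an inClass-style string into a predicate suitable for satisfy.
parseClass :: String -> Char -> Bool
parseClass (l:'-':h:more) = ((>= l) &&& (<= h)) ||| parseClass more
parseClass (c:cs) = (== c) ||| parseClass cs
parseClass [] = const False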

is a simple-minded possibility.
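For comparison, the first option (expanding the class into a plain String for oneOf) might look roughly like this. It is a sketch, assuming the same class syntax as attoparsec's inClass; expandClass, classChar and notClassChar are hypothetical names, while oneOf and noneOf are the standard Parsec combinators.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Expand the class syntax into the explicit list of characters.
-- "a-c" becomes "abc"; a '-' that is not between two characters is literal.
expandClass :: String -> String
expandClass (a:'-':b:xs) = [a .. b] ++ expandClass xs
expandClass (x:xs)       = x : expandClass xs
expandClass ""           = ""

-- Hypothetical wrappers: mechanical replacements for inClass/notInClass
-- at the parser level.
classChar :: String -> Parser Char
classChar = oneOf . expandClass

notClassChar :: String -> Parser Char
notClassChar = noneOf . expandClass
```

Here expandClass "a-c'-)0-3-" gives "abc'()0123-", the string from the question. As noted above, oneOf then searches that whole list, so large ranges cost more than the predicate-based version.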

Licensed under: CC-BY-SA with attribution