Domanda

Problem

I am rolling a nondeterministic parser for self-education, it looks like:

newtype Parser a b = P { runP :: [a] -> [(b, [a])] }

And I want to be able to drop in a subroutine that might be of form: [a] -> [b], which takes a buffer of chars and sends it to a list of results. The trick here is that the subroutine itself is a stateful computation, and it transitions through states upon each successful invocation (think of it as a finite state machine). Specifically:

  1. If the subroutine outputs the empty list [], the parser inserts one more char into the buffer and yields it to the subroutine, which runs again.
  2. If the subroutine outputs non-empty list [b], the buffer is first flushed and the parser inserts one more char into the buffer, yielding it to subroutine. The [b] is stored somewhere
  3. Until an escape condition is reached, steps 1 and 2 are run over and over again. All intermediate results should be combined in some way.
  4. Once the escape condition is reached, the subroutine yields the results bs back to the parser, and combines it with the remaining stream of as like so:

    rs = fmap (flip (,) as) bs :: [(b,[a])]

thus satisfiying the signature of runP

The function might have this signature: withf :: ([a] -> [b]) -> Parser a b

The important thing is that withf g has to be a parser, so I can build larger parsers using <*>. Note the function signature suggest g is a pure function, so it's unlikely to be correct.

Tried Solutions

I tried to implement this using various coroutine packages, but to me it makes more sense to lift a parser into a coroutine computation context, composing it with a transducer which is also lifted into context, which means it's no longer a parser.

I also tried to implement withf as a primitive function that would have access to the Parser value constructor. Basically translating steps 1..4 into code. The biggest problem I have here is who is in charge of what information:

  1. state of buffer
  2. state of intermediate results
  3. the logic of how intermediate results are to be combined.
  4. how should escape condition be implemented, or better yet composed into withf

I also tried various home-brew coroutine implementations baked right into the parser (so not using Parser type defined above), with little success.

Anyone who could point me in the right direction is greatly appreciated.

È stato utile?

Soluzione

First let's use MonadPlus instead of [] in the parser. It will make it more generic and clarify the code a bit (we won't have so many nested []s):

newtype Parser a m b = P { runP :: [a] -> m (b, [a]) }

I suggest you to change the signature of your subroutines. What you need is to:

  • signal if a subroutine needs more input or not, and
  • keep the state somewhere.

This can be easily accomplished by this type signature:

newtype Sub a b = Sub { runSub :: Either (a -> Sub a b) [b] }

A subroutine either yields a result, or requests a new input and produces a new subroutine. This way, you can keep whatever state you need by passing it into the returned subroutine. The conversion function will then look like:

withf :: (MonadPlus m) => Sub a b -> Parser a m b
withf start = P $ f (runSub start)
  where
    f (Right bs) xs   = msum [ return (b, xs) | b <- bs ]
    f (Left r)   []   = mzero    -- No more input, can't proceed.
    f (Left r) (x:xs) = f (runSub (r x)) xs

Update: Another approach we could take is to realize that the parser is actually a StateT transformer, whose state is [a]:

type Parser a m b = StateT [a] m b

runP :: (Monad m) => Parser a m b -> [a] -> m (b, [a])
runP = runStateT

Indeed, runP is exactly runStateT!

This way, we get a Monad instance for Parser for free. And now we can split our task into smaller blocks. First, we create a parser that consumes one input, or fails:

receive :: (MonadPlus m) => Parser a m a
receive = get >>= f
  where
    f []        = mzero    -- No more input, can't proceed.
    f (x:xs)    = put xs >> return x

and then use it to describe withf:

withf :: (MonadPlus m) => Sub a b -> Parser a m b
withf start = f (runSub start)
  where
    f (Right bs) = msum (map return bs)
    f (Left r)   = receive >>= f . runSub . r

Notice that if m is a MonadPlus then also StateT s m is a MonadPlus, so we can use directly mzero and msum with Parser.

Altri suggerimenti

First, lets define a new data type to represent the possible results of a parse.

data Step r = Done | Fail | Succ r

The parser can either finish with Done, indicate a failed parse with Fail or successfully return a parsed value r with Succ r.

We'll make our Step data type an instance of the Monoid typeclass

instance Monoid (Step r) where
    mempty = Done
    Done   `mappend` _ = Done
    Fail   `mappend` x = x
    Succ r `mappend` _ = Succ r

If our parser is Done, we should immediately terminate. A Fail means we should check the result of the next Step for a possible success. Succ r, of course, means we have successfully parsed a value.

Now let's define a type synonym for a Parser. It need to be able to accumulate the parsed results (Writer) and maintains a pure state that represents the input that hasn't been consumed yet (State).

{-# LANGUAGE FlexibleContexts #-}                                                                   

import Control.Monad.State
import Control.Monad.Writer
import Data.List
import Data.Foldable

type Parser w s = WriterT w (State s)

evalParser :: Parser w s r -> s -> w
evalParser = evalState . execWriterT

Here's the actual parser

parser :: (MonadState [s] m, MonadWriter [w] m) => ([s] -> Step [w]) -> m ()
parser sub = do
    bufs <- gets inits
    -- try our subroutine on increasingly long prefixes until we are done,
    -- or there is nothing left to parse, or we successfully parse something
    case foldMap sub bufs of
         Done   -> return ()
         Fail   -> return ()
         Succ r -> do
            -- record our parsed result
            tell r
            -- remove the parsed result from the state
            modify (drop $ length r) 
            -- parse some more
            parser sub

and a simple test case

test :: String
test = evalParser (parse sub) "aabbcdde"
    where sub "aabb" = Succ "aabb"
          sub "cdd"  = Succ "cdd"
          sub "e"    = Done
          sub _      = Fail

-- test == "aabbcdd"
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top