Question

I'm trying to read in a binary file and parse it lazily using the 'binary' package. The package documentation gives an example of how to do this without forcing all the input for a scenario very similar to mine:

 example2 :: BL.ByteString -> [Trade]
 example2 input
  | BL.null input = []
  | otherwise =
    let (trade, rest, _) = runGetState getTrade input 0
    in trade : example2 rest

However, this uses the deprecated runGetState function, which itself points you towards the runGetIncremental function.

The problem is that the 'runGetIncremental' function seems to force the remaining input to be a strict bytestring, thus forcing it to load the whole file into memory. Indeed, I'm seeing memory usage of around 6GB when I try to run this. Even the implementation of runGetState now seems to be based on runGetIncremental and then reconverts the strict bytestring back to a lazy one using chunk.

Can I get the behaviour as described in the tutorial, or is this now unsupported by binary? If the latter, what's the best way to do this? I have a little experience using conduit, but it's not clear to me how I could use it here.


Solution

You can do this using pipes-binary and pipes-bytestring. Here's a helper function for your benefit:

import Control.Monad (void)
import Data.Binary
import Pipes
import Pipes.Binary (decodeMany)
import Pipes.ByteString (fromHandle)
import qualified Pipes.Prelude as P
import System.IO

decodeHandle :: (Binary a) => Handle -> Producer a IO ()
decodeHandle handle = void $ decodeMany (fromHandle handle) >-> P.map snd

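The void and P.map snd are there because decodeMany actually returns more information: each decoded value is paired with a byte offset (which P.map snd drops), and the producer's return value reports any parsing error (which void discards). If you actually want that information, then just remove them.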

Here's an example of how you might use decodeHandle, using a quick skeleton for Trade I threw together:

data Trade = Trade

instance Binary Trade where
    get   = return Trade
    put _ = return ()

instance Show Trade where show _ = "Trade"

main = withFile "inFile.txt" ReadMode $ \handle -> runEffect $
    for (decodeHandle handle) $ \trade -> do
        lift $ print (trade :: Trade)
        -- do more with the parsed trade

You can use for to loop over the decoded trades and handle them, or if you prefer you can use pipe composition:

main = withFile "inFile.txt" ReadMode $ \handle -> runEffect $
    decodeHandle handle >-> P.print

This will be lazy and only decode as many trades as you actually need. So if you insert a P.take between the decoder and the printer, it will only read as much input as necessary to process the requested number of trades:

main = withFile "inFile.txt" ReadMode $ \handle -> runEffect $
    for (decodeHandle handle >-> P.take 4) $ \trade -> do
        ... -- This will only process the first 4 trades

-- or using purely pipe composition:

main = withFile "inFile.txt" ReadMode $ \handle -> runEffect $
    decodeHandle handle >-> P.take 4 >-> P.print
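
If you would rather stay within binary alone, the following is a minimal sketch of lazy decoding on top of runGetIncremental. It feeds the decoder one strict chunk of the lazy bytestring at a time, so only the chunks needed for the values you actually consume should get read. The name decodeLazily is made up for this illustration, and calling error on a parse failure is just a placeholder for whatever error handling you prefer.

```haskell
import Data.Binary.Get (Decoder (..), Get, getWord32be, pushChunk, runGetIncremental)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- | Lazily decode a sequence of values from a lazy ByteString.
-- (decodeLazily is a hypothetical helper, not part of binary.)
decodeLazily :: Get a -> BL.ByteString -> [a]
decodeLazily getter = go decoder0 . BL.toChunks
  where
    -- A fresh decoder, reused for every value in the stream.
    decoder0 = runGetIncremental getter

    -- A value is complete: emit it, then carry on with the leftover bytes.
    go (Done leftover _ x) chunks = x : restart leftover chunks
    -- The decoder wants more input: feed it the next strict chunk.
    go (Partial k) (c : cs) = go (k (Just c)) cs
    -- No chunks left: signal end of input to the decoder.
    go (Partial k) [] = go (k Nothing) []
    -- Placeholder error handling (an assumption of this sketch).
    go (Fail _ _ msg) _ = error msg

    restart leftover chunks
      | BS.null leftover =
          case dropWhile BS.null chunks of
            [] -> []                 -- all input consumed: stop
            cs -> go decoder0 cs
      | otherwise = go (decoder0 `pushChunk` leftover) chunks
```

With a real getter you would call it as decodeLazily getTrade input. As with example2 from the question, memory should stay bounded only if you consume the resulting list incrementally and do not hold on to its head.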
Licensed under: CC-BY-SA with attribution