Question

First up, a simplified version of the task I want to accomplish: I have several large files (amounting to 30GB) that I want to prune for duplicate entries. To this end, I establish a database of hashes of the data, and open the files one-by-one, hashing each item, and recording it in the database and the output file iff its hash wasn't already in the database.
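
The pruning step described above does not depend on conduits or persistent. The following is a minimal sketch of the idea only, assuming an in-memory Data.Set in place of the on-disk database and a toy FNV-1a style hash in place of a real one. The names toyHash and pruneDuplicates are hypothetical and not part of the original code.

```haskell
import           Data.Bits (xor)
import           Data.Char (ord)
import           Data.List (foldl')
import qualified Data.Set as Set
import           Data.Word (Word64)

-- Toy 64-bit FNV-1a style hash; a stand-in (assumption) for whatever
-- real hash function the actual program would use.
toyHash :: String -> Word64
toyHash = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Keep an item only if its hash has not been recorded yet.
-- The Set plays the role of the hash database.
pruneDuplicates :: [String] -> [String]
pruneDuplicates = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | h `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert h seen) xs
      where
        h = toyHash x

main :: IO ()
main = mapM_ putStrLn (pruneDuplicates ["a", "b", "a", "c", "b"])
```

Because only hashes are stored, two different items with the same hash would be treated as duplicates; a real implementation would have to pick a hash wide enough for that risk to be acceptable.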

I know how to do this with iteratees and enumerators, and I wanted to try conduits. I also know how to do it with conduits alone, but now I want to combine conduits with persistent. I'm having problems with the types, and possibly with the entire concept of ResourceT.

Here's some pseudo code to illustrate the problem:

withSqlConn "foo.db" $ runSqlConn $ runResourceT $ 
     sourceFile "in" $= parseBytes $= dbAction $= serialize $$ sinkFile "out"

The problem lies in the dbAction function. I would like to access the database here, naturally. Since the action it does is basically just a filter, I first thought to write it like that:

dbAction = CL.mapMaybeM p
     where p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => DataType -> m (Maybe DataType)
           p _ = do lift $ putStrLn "foo" -- fine
                    insert $ undefined -- type error!
                    return undefined

The specific error I get is:

Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                           DataType -> m (Maybe DataType)
  at tools/clean-wac.hs:(33,1)-(34,34)
  `m' is a rigid type variable bound by
      the type signature for
        p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                      DataType -> m (Maybe (DataType))
      at tools/clean-wac.hs:33:1
Expected type: m (Key b0 val0)
  Actual type: b0 m0 (Key b0 val0)

Note that this might be due to wrong assumptions I made in designing the type signature. If I comment out the type signature and also remove the lift statement, the error message turns into:

No instance for (PersistStore ResourceT (SqlPersist IO))
  arising from a use of `p'
Possible fix:
  add an instance declaration for
  (PersistStore ResourceT (SqlPersist IO))
In the first argument of `CL.mapMaybeM', namely `p'

So this means that we can't access the PersistStore at all via ResourceT?
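
One way to read this error: runSqlConn $ runResourceT stacks ResourceT on top of SqlPersist, so the conduit's base monad is ResourceT (SqlPersist IO), while the database action lives one layer further down. The sketch below illustrates only that layering, using types from the transformers package as stand-ins. It is an analogy and not persistent's API: ReaderT plays the outer role of ResourceT, StateT the inner role of SqlPersist, and Db, App, insertRow, step and runSteps are hypothetical names.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Control.Monad.Trans.State.Strict (StateT, execStateT, modify')

-- Inner layer: stand-in for SqlPersist. The "database" is a list of rows.
type Db = StateT [String] IO

-- Outer layer: stand-in for ResourceT, stacked on top of Db.
type App = ReaderT String Db

-- An action that exists only in the inner layer.
insertRow :: String -> Db ()
insertRow row = modify' (row :)

-- Code running in the outer layer has to lift the inner action explicitly;
-- calling insertRow here without lift would be a type error.
step :: String -> App ()
step item = do
  prefix <- ask
  lift (insertRow (prefix ++ item))

-- Run the outer layer first, then the inner one, mirroring
-- runSqlConn $ runResourceT $ ...
runSteps :: [String] -> IO [String]
runSteps items = reverse <$> execStateT (runReaderT (mapM_ step items) "row-") []

main :: IO ()
main = runSteps ["a", "b"] >>= print
```

Whether persistent needs an explicit lift or provides an instance for the outer layer depends on the library version, so this only shows why the type checker asks the question in the first place.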

I cannot write my own Conduit either, without using CL.mapMaybeM:

dbAction = filterP
filterP :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => Conduit DataType m DataType
filterP = loop
    where loop = awaitE >>= either return go
          go s = do lift $ insert $ undefined -- again, type error
                    loop

This resulted in yet another type error I don't fully understand.

Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             filterP :: (MonadIO m,
                                 MonadBaseControl IO (SqlPersist m)) =>
                                Conduit DataType m DataType
     `m' is a rigid type variable bound by
      the type signature for
        filterP :: (MonadIO m,
                            MonadBaseControl IO (SqlPersist m)) =>
                           Conduit DataType m DataType
Expected type: Conduit DataType m DataType
  Actual type: Pipe
                 DataType DataType DataType () (b0 m0) ()
In the expression: loop
In an equation for `filterP'

So, my question is: is it possible to use persistent the way I intended inside a conduit at all? And if so, how? I am aware that since I can use liftIO inside the conduit, I could just use, say, HDBC, but I wanted to use persistent explicitly in order to understand how it works, and because I like its db-backend agnosticism.

Solution

The code below compiles fine for me. Is it possible that the frameworks have moved on in the meantime and things now just work?

However, note the following changes, which I had to make either because the libraries have changed or because I didn't have all of your code. I used conduit-1.0.9.3 and persistent-1.3.0 with GHC 7.6.3.

  • Omitted parseBytes and serialize, since I don't have your definitions, and defined DataType = ByteString instead.

  • Introduced a Proxy parameter and an explicit type signature for the undefined value to avoid problems with type family injectivity. These likely don't arise in your real code because it will have a concrete or externally determined type for val.

  • Used await rather than awaitE and just used () as the type to substitute for the Left case, as awaitE has been retired.

  • Passed a dummy Connection creation function to withSqlConn - perhaps I should have used some Sqlite specific function?

Here's the code:

{-# LANGUAGE FlexibleContexts, NoMonomorphismRestriction,
             TypeFamilies, ScopedTypeVariables #-}

module So133331988 where

import Control.Monad.Trans
import Database.Persist.Sql
import Data.ByteString
import Data.Conduit
import Data.Conduit.Binary
import Data.Proxy

test proxy =
    withSqlConn (return (undefined "foo.db")) $ runSqlConn $ runResourceT $ 
         sourceFile "in" $= dbAction proxy $$ sinkFile "out"

dbAction = filterP

type DataType = ByteString

filterP
    :: forall m val
     . ( MonadIO m, MonadBaseControl IO (SqlPersist m)
       , PersistStore m, PersistEntity val
       , PersistEntityBackend val ~ PersistMonadBackend m)
    => Proxy val
    -> Conduit DataType m DataType
filterP Proxy = loop
    where loop = await >>= maybe (return ()) go  -- Nothing means upstream is exhausted
          go s = do lift $ insert (undefined :: val)  -- placeholder insert; s is not yielded downstream
                    loop
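
The loop above only demonstrates that the types work out: it inserts a placeholder and never yields anything downstream. As a hedged sketch of what the actual filtering could look like, the block below yields an item only when it is new. It assumes a current conduit release (the 1.3-style API with ConduitT, .| and runConduit, not the 1.0.9.3 used above) and replaces the database with a State-held Data.Set. SeenDb, insertIfNew, filterLoop, filterMapMaybe and runFilter are hypothetical names.

```haskell
import           Control.Monad (when)
import           Control.Monad.Trans.Class (lift)
import           Control.Monad.Trans.State.Strict (State, evalState, get, modify')
import           Data.Conduit (ConduitT, await, runConduit, yield, (.|))
import qualified Data.Conduit.List as CL
import qualified Data.Set as Set

-- Hypothetical stand-in for the database: the set of items seen so far.
type SeenDb = State (Set.Set String)

-- Record the item and report whether it was new.
insertIfNew :: String -> SeenDb Bool
insertIfNew x = do
  seen <- get
  if x `Set.member` seen
    then return False
    else modify' (Set.insert x) >> return True

-- Explicit loop: await an item, run the lookup in the base monad via lift,
-- and yield the item downstream only if it was new.
filterLoop :: ConduitT String String SeenDb ()
filterLoop = await >>= maybe (return ()) go
  where
    go x = do
      isNew <- lift (insertIfNew x)
      when isNew (yield x)
      filterLoop

-- The same filter via CL.mapMaybeM, as first attempted in the question.
-- The function passed to mapMaybeM already runs in the base monad.
filterMapMaybe :: ConduitT String String SeenDb ()
filterMapMaybe = CL.mapMaybeM $ \x -> do
  isNew <- insertIfNew x
  return (if isNew then Just x else Nothing)

-- Run a filter over a list, starting from an empty "database".
runFilter :: ConduitT String String SeenDb () -> [String] -> [String]
runFilter c xs =
  evalState (runConduit (CL.sourceList xs .| c .| CL.consume)) Set.empty

main :: IO ()
main = do
  print (runFilter filterLoop ["a", "b", "a", "c", "b"])
  print (runFilter filterMapMaybe ["a", "b", "a", "c", "b"])
```

In both variants the lookup runs in the conduit's base monad, so with persistent the base monad would have to be one in which the database actions are available.
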
Licensed under: CC-BY-SA with attribution