Question

I have large files in which I store binary data. Multiple threads read and write these files, and my current design synchronizes them with a single lock: there is only one Handle in ReadWriteMode open per file, and all threads contend for that single lock whenever they need to do I/O.

I'd like to improve on this by allowing multiple readers to work concurrently. I tried using an RWLock and keeping multiple Handles open: the RWLock would ensure that only one thread modifies the file, while many threads (up to the number of open handles, a compile-time constant) could read concurrently. When I ran this, however, I found that the runtime allows only one Handle in ReadWriteMode to exist for a file at any time.
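The limitation described above is the multiple-reader, single-writer locking that GHC's System.IO applies to Handles: a file may have several handles open for reading, or a single handle that can write, but not both. A minimal sketch that reproduces the failure, assuming GHC; `secondHandleRefused` and the file name used with it are made up for illustration:

```haskell
import Control.Exception (try)
import System.IO
import System.IO.Error (isAlreadyInUseError)

-- Hypothetical helper: opens `path` in ReadWriteMode twice and reports
-- whether GHC's per-file locking refused the second handle.
secondHandleRefused :: FilePath -> IO Bool
secondHandleRefused path = do
  writeFile path ""                       -- create (or truncate) the file
  h1 <- openBinaryFile path ReadWriteMode -- the single allowed writer
  r <- try (openBinaryFile path ReadWriteMode) :: IO (Either IOError Handle)
  refused <- case r of
    Left e
      | isAlreadyInUseError e -> return True  -- refused: the file is locked
      | otherwise -> hClose h1 >> ioError e   -- some unrelated failure
    Right h2 -> hClose h2 >> return False
  hClose h1
  return refused
```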

How can I resolve this? I assume obtaining and releasing a Handle is an expensive operation, so simply opening the file in the appropriate mode after acquiring the RWLock is not really an option. Alternatively, is there a package offering an API similar to the positional read and write methods of Java's FileChannel?

PS: I'd like to support 32-bit architectures, so memory-mapped I/O is not possible for files larger than 4 GiB, right?


Solution 2

So your problem is that you don't want to use stateful Handles (where the state is the current position in the file)? In that case, I guess you need pread and pwrite.

man pread

For Haskell bindings, see the unix-bytestring package: http://hackage.haskell.org/package/unix-bytestring-0.3.7.2/docs/System-Posix-IO-ByteString.html

For a usage example, you can look here: https://github.com/errge/PrefetchFS/blob/master/PrefetchHandle.hs
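A minimal sketch of the pread/pwrite approach, assuming the unix-bytestring package linked above (its `fdPread` and `fdPwrite`) and unix 2.8 or later, where `openFd` takes the creation mode through the `creat` field of `OpenFileFlags`. The helper names `openPositional`, `readAt` and `writeAt` are hypothetical:

```haskell
{-# LANGUAGE PackageImports #-}
import qualified Data.ByteString.Char8 as BC
-- unix-bytestring's module has the same name as a module in the unix
-- package, so the package is named explicitly.
import "unix-bytestring" System.Posix.IO.ByteString (fdPread, fdPwrite)
import System.Posix.IO (OpenFileFlags (..), OpenMode (ReadWrite), closeFd, defaultFileFlags, openFd)
import System.Posix.Types (ByteCount, Fd, FileOffset)

-- Open (creating if needed) a file for positional I/O.
-- Assumes unix >= 2.8, where the creation mode goes in the `creat` field.
openPositional :: FilePath -> IO Fd
openPositional path = openFd path ReadWrite defaultFileFlags {creat = Just 0o644}

-- Read up to n bytes at the given offset. pread ignores the descriptor's
-- file position, so readers sharing one Fd do not disturb each other.
readAt :: Fd -> ByteCount -> FileOffset -> IO BC.ByteString
readAt = fdPread

-- Write all bytes at the given offset, retrying if pwrite writes fewer.
writeAt :: Fd -> BC.ByteString -> FileOffset -> IO ()
writeAt fd bs off
  | BC.null bs = return ()
  | otherwise = do
      n <- fdPwrite fd bs off
      writeAt fd (BC.drop (fromIntegral n) bs) (off + fromIntegral n)
```

Because every call names its own offset, several threads can read through the same `Fd` without any seek-then-read critical section. You would likely still keep the RWLock if readers must never observe a write that is only partly done, since `writeAt` may take several `pwrite` calls.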

OTHER TIPS

You should build a type around the file handle and a mutex lock. Here's a simple implementation that I think would work for your purposes.

module SharedHandle (SharedHandle, newSharedHandle, withSharedHandle) where

import Control.Concurrent.MVar
import System.IO              

data SharedHandle = SharedHandle Handle (MVar ())

newSharedHandle :: IO Handle -> IO SharedHandle
newSharedHandle makeHandle = do
    handle <- makeHandle
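    lock <- newMVar ()
    return $ SharedHandle handle lock

withSharedHandle :: SharedHandle -> (Handle -> IO a) -> IO a
withSharedHandle (SharedHandle handle lock) operation =
    -- withMVar takes the lock, runs the operation and puts the lock back,
    -- even if the operation throws an exception (a plain takeMVar/putMVar
    -- pair would leave the lock taken forever in that case)
    withMVar lock (\() -> operation handle)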

What's going on here is that I've created a new data type which is, in essence, just a file handle. The only difference is that it also comes with its own mutex lock, implemented with an MVar. I have provided two functions for operating on this new type. newSharedHandle takes an action that creates a normal handle and wraps it in a shared handle with a fresh lock. withSharedHandle takes an operation on handles, locks the shared handle, performs the operation, and then unlocks the handle. Notice that neither the constructor nor any accessors are exported from the module, so the only way to reach the underlying Handle is through withSharedHandle: no thread can forget to release the lock, and we never get deadlocks on one particular access.

Replacing all file handles in your program with this new type could solve your problem.
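Since the file position is shared state, a seek and the read or write that follows it must happen inside the same critical section. A sketch of positional helpers built on this type; `readAt` and `writeAt` are hypothetical names, and the SharedHandle definitions are restated compactly (using `withMVar`) only so that the snippet compiles on its own:

```haskell
import Control.Concurrent.MVar
import qualified Data.ByteString as B
import System.IO

-- Compact restatement of SharedHandle so this snippet is self-contained.
data SharedHandle = SharedHandle Handle (MVar ())

newSharedHandle :: IO Handle -> IO SharedHandle
newSharedHandle makeHandle = SharedHandle <$> makeHandle <*> newMVar ()

withSharedHandle :: SharedHandle -> (Handle -> IO a) -> IO a
withSharedHandle (SharedHandle h lock) op = withMVar lock (\() -> op h)

-- Hypothetical helpers: seek and transfer under one lock acquisition,
-- so no other thread can move the file position in between.
readAt :: SharedHandle -> Integer -> Int -> IO B.ByteString
readAt sh off n = withSharedHandle sh $ \h -> do
  hSeek h AbsoluteSeek off
  B.hGet h n

writeAt :: SharedHandle -> Integer -> B.ByteString -> IO ()
writeAt sh off bs = withSharedHandle sh $ \h -> do
  hSeek h AbsoluteSeek off
  B.hPut h bs
```

This still serializes readers; it makes one shared Handle safe to use from many threads, but it does not give the concurrent reads asked about in the question.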

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow