Question

Is there any way to increase a time interval, on the basis of which the RTS decides that thread has blocked indefinitely in an STM transaction? Here is my code:

import Control.Concurrent (ThreadId)
import Control.Concurrent.MVar (MVar,newMVar,withMVar)
import Control.Concurrent.STM
import qualified Control.Concurrent.ThreadManager as TM

data ThreadManager = ThreadManager { tmCounter::TVar Int, tmTM::MVar TM.ThreadManager }

data Settings = Settings {
    maxThreadsCount::Int }

createThreadManager :: Settings -> IO ThreadManager
createThreadManager s = do
    counter <- atomically $ newTVar (maxThreadsCount s)
    tm <- TM.make >>= newMVar
    return $ ThreadManager counter tm

forkManaged :: ThreadManager -> IO () -> IO ThreadId
forkManaged tm fn = do
    atomically $ do
        counter <- readTVar $ tmCounter tm
        check $ counter > 0
        writeTVar (tmCounter tm) (counter - 1)
    withMVar (tmTM tm) $ \thrdmgr -> TM.fork thrdmgr $ do
        fn
        atomically $ do
            counter <- readTVar $ tmCounter tm
            writeTVar (tmCounter tm) (counter + 1)

forkManaged makes sure that amount of simultaneously running managed threads does not exceed maxThreadsCount. It works fine until heavy load. Under heavy load RTS throws an exception. I think under heavy load, on hard concurrent competition for resources, some of threads just have no time to get access to the STM context. So I think, increasing time interval when RTS decides to throw this exception may solve the problem.

Was it helpful?

Solution

Daniel Wagner is right. The decision is not made with timeouts. The relevant code in the rts is in Schedule.c

See the resurrectThreads function for where the exception is thrown. The comment describes that this is only thrown to threads found to be garbage after GC. ezyang described how this worked for mvars: http://blog.ezyang.com/2011/07/blockedindefinitelyonmvar/

[bad speculation concerning check removed when I checked its source and realized that it was just a simple guard/retry and not what was described in an earlier paper -- oops! I now suspect that Daniel Wagner is correct here as well, and the issue is somehow that the counter isn't being incremented.]

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top