Question

I have three data structures defined as follows, where S, LL, M, and Object represent Set, ListLike, Map, and ByteString, respectively:

nouns :: IO [Object]
nouns = liftM LL.words $ B.readFile "nounlist.txt"

obj :: IO ObjectSet
obj =  liftM S.fromList nouns

actions :: IO ActionMap
actions = do
  n <- nouns
  let l = foldl' (\z x -> (x,Sell):(x,Create):z) [] n
  return $ M.fromList $
    (\(x,y) -> ((x, Verb y []), Out (Verb y []) x)) <$> l

Now I have one function that binds the unevaluated Map and Set to the variables a and o. Once it enters query, it accepts user-input queries in an infinite loop and processes them. Appropriate responses are generated via lookups.

process :: IO ()
process = do
  a <- actions
  o <- obj
  forever $ query "" a o

Keep in mind that my Map is composed of 300,000+ key-value pairs. The initial overhead of the first evaluation, when the first query is called, is roughly 3-5 seconds on my computer; this is fine and completely expected. Every subsequent call is snappy and responsive, just the way I want it. However, this is only so because I am running this code as a standalone executable and have the luxury of staying within the IO () of process. If I were to turn this code (and the rest of the accompanying code not listed) into a library to interface with, say, a Snap Framework web application, I wouldn't necessarily have this luxury. Essentially, what I am trying to say is: if I were to remove the forever from process, then the evaluated Map and Set would surely get garbage-collected. Indeed, this is what happens when I call the function from a Snap application (I can't keep forever because it would block the Snap application). Every subsequent call from the Snap application has the same 3-5 second overhead because it re-evaluates the data structures in question.


My Question:

Is there an easy way to hold the Map and Set in memory so that every subsequent lookup is fast? One idea I came up with was to run a thread that sleeps and maintains storage for the Map and Set. However, this definitely seems like overkill to me. What am I overlooking? Thank you for bearing with my long-winded explanation.

Note: I'm not necessarily looking for code answers, more so suggestions, advice, etc.

Solution

You can evaluate obj and actions only once, during snaplet initialization, and store the results in the snaplet's state.

data SnapApp = SnapApp
    { objectSet :: ObjectSet
    , actionMap :: ActionMap
    }

appInit :: SnapletInit SnapApp SnapApp
appInit = makeSnaplet ... $ do
    ... 
    a <- liftIO actions
    o <- liftIO obj
    return $ SnapApp o a

Now you can access them from snap's Handler:

someUrlHandler :: Handler SnapApp SnapApp ()
someUrlHandler = do
  a <- gets actionMap
  o <- gets objectSet
  res <- query a o
  ...

This guarantees that actions and obj will be evaluated only once.
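One caveat is laziness: running the IO action once only builds a thunk, so the first request could still pay the construction cost. A minimal sketch of forcing the structures at startup follows, independent of Snap. It assumes the containers package, and AppState, initState, and respond are hypothetical stand-ins (with String keys instead of ByteString), not the asker's actual types.

```haskell
import Control.Exception (evaluate)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical stand-ins for the ObjectSet and ActionMap in the question.
data AppState = AppState
  { objectSet :: S.Set String
  , actionMap :: M.Map (String, String) String
  }

-- Build both structures once. evaluate forces each to weak head normal
-- form; Set and Map are spine-strict, so that builds the whole tree here
-- rather than on the first lookup.
initState :: [String] -> IO AppState
initState ns = do
  o <- evaluate (S.fromList ns)
  a <- evaluate (M.fromList
         [ ((n, v), v ++ " " ++ n) | n <- ns, v <- ["Sell", "Create"] ])
  return (AppState o a)

-- A pure lookup against the already-built state.
respond :: AppState -> String -> String -> Maybe String
respond st verb noun
  | noun `S.member` objectSet st = M.lookup (noun, verb) (actionMap st)
  | otherwise                    = Nothing
```

In a snaplet, the same evaluate calls could be wrapped in liftIO inside the initializer, so the 3-5 second cost is paid at startup rather than by the first visitor.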

OTHER TIPS

Here is what I was thinking of doing with IORef:

import Data.IORef
import System.IO.Unsafe 
import Control.Monad 

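-- NOINLINE keeps GHC from duplicating the unsafePerformIO call,
-- which would create more than one IORef.
{-# NOINLINE val_ #-}
val_ :: IORef (Maybe Integer)
val_ = unsafePerformIO $ newIORef Nothing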

val :: IO Integer
val = do 
  v <- readIORef val_
  case v of 
    Just v' -> return v' 
    Nothing -> do
           v' <- readFile "large.txt" 
           -- replace this part with your actual computation
           let l = sum $ map (fromIntegral . fromEnum) v' 
           writeIORef val_ $ Just l
           return l 

main = do 
  writeFile "large.txt" (replicate (10^7) '0')
  putStrLn "reading"
  replicateM_ 10 (val >>= print)

This ensures that the time-consuming operation is only ever evaluated once. The first time you execute val, it writes the value to the IORef; every subsequent time, it reads the value back from there. When I ran main, it took a few seconds to print the number the first time and no time at all afterwards.

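You need unsafePerformIO because val_ has to be a single top-level IORef x that stays alive for the whole program. An IO (IORef x) action would instead create a fresh, empty reference every time it is run, and the previous one would be garbage collected.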

Keep in mind that writing to the IORef does not evaluate anything; the value is evaluated the first time it is used, even if val was called earlier.
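If you would rather pay the cost at a known point, you can force the value before storing it. The following is a rough sketch of that idea using evaluate from Control.Exception; expensive and cached are hypothetical names, and the IORef is passed in explicitly here so that unsafePerformIO is not needed.

```haskell
import Control.Exception (evaluate)
import Data.IORef

-- Hypothetical expensive computation standing in for building the Map.
expensive :: Int -> Integer
expensive n = sum (map fromIntegral [1 .. n])

-- Return the cached value, computing and forcing it on the first call.
cached :: IORef (Maybe Integer) -> Int -> IO Integer
cached ref n = do
  v <- readIORef ref
  case v of
    Just x  -> return x
    Nothing -> do
      x <- evaluate (expensive n)  -- forced here, not at first use
      writeIORef ref (Just x)
      return x
```

Note that evaluate only forces to weak head normal form. That is enough for an Integer; for a nested structure with lazy fields, something like force from the deepseq package may be needed.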

The simpler solution is probably to use monad transformers. You didn't provide an example of where in your snap program this table will be used, so I can't really give a satisfactory example.
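One possible reading of the monad-transformer suggestion is a ReaderT environment that carries the already-built structures to every function that needs them. This is only a sketch under that assumption: it uses the transformers and containers packages, and Env, App, isKnown, and runApp are made-up names.

```haskell
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import qualified Data.Set as S

-- Hypothetical environment holding the set built once at startup.
newtype Env = Env { envObjects :: S.Set String }

-- Every action in App can read the shared environment.
type App = ReaderT Env IO

-- Look a noun up in the shared set without rebuilding anything.
isKnown :: String -> App Bool
isKnown noun = asks (S.member noun . envObjects)

-- Run an App action against an environment that was built beforehand.
runApp :: Env -> App a -> IO a
runApp env app = runReaderT app env
```

The environment is built once by the caller and then shared by reference, so each runApp call reuses the same evaluated set.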

I do believe TVar could do it.

import Control.Concurrent.STM

initial = do
    objects <- newTVarIO Nothing
    --I didn't understand your example code well, bear with me (and fix this.)
    let queryMachine = mkQueryMachine objects
    return queryMachine

-- QueryMachine, objectsMaker and profitFrom are placeholders for your own code.
mkQueryMachine objects = QueryMachine $ do
    objects''' <- atomically $ do
        objects' <- readTVar objects
        case objects' of
            Nothing -> do
                let objects'' = objectsMaker
                writeTVar objects $ Just objects''
                return objects''
            Just objects'' -> return objects''

    profitFrom objects'''

Adapt to your own needs.

Explanation

TVar is a mutable variable in the STM monad. STM transactions are thread safe: each one behaves as if it ran in isolation. atomically converts an STM action into an IO action. If you are doing this for several kinds of things, create a separate STM action for each and call atomically on each one. You want atomically blocks to be small, so that each transaction touches few variables and is less likely to conflict with another one and have to be rerun. The above code can be improved by, say, using a TSem from Control.Concurrent.STM.TSem to make sure only one thread tries to calculate the value if two requests come in at the same time.
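As an alternative to a TSem, the "only one thread computes" idea can be sketched with a three-state TVar. This is a different technique from the one named above, shown only as an illustration: Cache and getOrCompute are hypothetical names, and the stm package is assumed.

```haskell
import Control.Concurrent.STM
import Control.Exception (onException)

-- Empty: nobody has started; Computing: one thread is working;
-- Ready: the value is available.
data Cache a = Empty | Computing | Ready a

-- The first caller claims the work and runs compute outside the
-- transaction; concurrent callers block in retry until Ready is written.
getOrCompute :: TVar (Cache a) -> IO a -> IO a
getOrCompute var compute = do
  claimed <- atomically $ do
    c <- readTVar var
    case c of
      Empty     -> writeTVar var Computing >> return Nothing
      Computing -> retry
      Ready x   -> return (Just x)
  case claimed of
    Just x  -> return x
    Nothing -> do
      -- If compute throws, reset the state so another thread can try.
      x <- compute `onException` atomically (writeTVar var Empty)
      atomically (writeTVar var (Ready x))
      return x
```

The transaction only reads and writes a single variable, which keeps it small; the expensive work itself happens in plain IO between the two atomically calls.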

Licensed under: CC-BY-SA with attribution