Question

The original question

I'm new to STM. One thing I'd like to do in Haskell involves a big piece of data, and lots of lightweight threads reading and writing to small parts of said big piece of data. The locations read from and written to can be considered essentially random and small. STM seems great for this, but I have some questions regarding how to attack the problem. I see several possible approaches, each with some drawbacks, and some that seem plain stupid. Some comments on these, or other ideas for alternatives, would be much appreciated.

Let's for simplicity assume that the big piece of data is a Data.Vector a, where the elements are themselves small.

  1. The entire piece of data as one TVar (Vector a). I guess this will cause lots of copying of the huge piece of data, as STM will think each individual write has possibly affected the entire shared data. Surely there's no magic going on where STM identifies that the reads and writes are very localized, and that consistency across the big piece of data is not required?

  2. A huge number of TVar as, essentially one for each element, giving fully localized STM but essentially duplicating the entire Vector a. This seems stupid.

  3. A compromise between 1 and 2 by segmenting the data so that I have a reasonable number of TVar (Vector a)s corresponding to subvectors of the data. I feel this solution depends too much on heuristics, like how big the segments should be.

  4. Messaging. Instead of each worker reading and writing to the data using STM, each writes messages with requests for data to be read or chunks of data to be written through some STM mechanism, such as a TChan. A special thread receives these messages, passing out data requested through another TChan, or taking received data and writing it to the shared data structure. This solution seems free of the problems plaguing solutions 1-3, but it also seems to me that it essentially gives up on using the niceties of STM to keep the data consistent. Rather, it's just message passing. Granted, the message passing part is implemented with STM, but my real problem is solved with message passing. STM seems so great, message passing is so... meh.
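For concreteness, here is a minimal sketch of approach 2, one TVar per element (element type Int and the names are just for illustration):

```haskell
import Control.Concurrent.STM
import qualified Data.Vector as V

-- One TVar per element: transactions on different indices never conflict.
newSharedVector :: Int -> a -> IO (V.Vector (TVar a))
newSharedVector n x = V.replicateM n (newTVarIO x)

main :: IO ()
main = do
  v <- newSharedVector 10 (0 :: Int)
  -- A localized write touches exactly one TVar.
  atomically $ writeTVar (v V.! 3) 42
  x <- readTVarIO (v V.! 3)
  print x  -- 42
```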

Am I thinking about this correctly? Does anybody have any hints or other suggestions?

Keep in mind that I have no experience with STM, nor have I tried the above solutions yet. I'll get out of my armchair, but sometimes it can be good to think about these things before trying anything.

Addendum: A fifth approach comes from Nathan Howell and uses TArray. It sounds like what I want, but the documentation says:

It is currently implemented as Array ix (TVar e), but it may be replaced by a more efficient implementation in the future (the interface will remain the same, however).

I take this to mean that TArray is just my approach number 2 in nicer clothes. The docs' hint at "a more efficient" implementation is interesting, as it suggests there actually is a nicer approach.
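For reference, here is a minimal sketch of what using TArray looks like (the MArray operations come from the array package; the Int element type and bounds are just for illustration):

```haskell
import Control.Concurrent.STM
import Control.Concurrent.STM.TArray
import Data.Array.MArray (newArray, readArray, writeArray)

main :: IO ()
main = do
  -- A TArray wraps one TVar per element behind the standard MArray interface.
  arr <- atomically (newArray (0, 9) 0) :: IO (TArray Int Int)
  -- Read-modify-write of a single cell, as one atomic transaction.
  atomically $ do
    x <- readArray arr 3
    writeArray arr 3 (x + 1)
  y <- atomically (readArray arr 3)
  print y  -- 1
```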

Follow-up on Vagif Verdi's answer

Vagif Verdi's answer is very interesting, so I made a little experiment to try it out. I don't have time to slim down the code right now, so those interested will have to bear with me on the code containing more than the bare essentials. I decided to use a mutable vector with 10^8 Ints as the "big shared piece of data", and let the multiple readers/writers correspond to threads talking to a network socket.

Note that the code does not even read from or write to the shared piece of data. It's simply there, and each thread holds a TVar pointing to it.

So what happens? I run the program, and immediately it takes up about 780 MB of RAM, which is to be expected (it's roughly what the 10^8 Ints need). Now, if I use netcat to connect a few clients and write a bit of text, which the program is just supposed to print out and not even write to the shared data, the process' CPU usage spikes to 100% for upwards of a second! There's a noticeable lag before the text is displayed. On the bright side, memory usage stays constant (as per Vagif Verdi's answer). If I remove the vector and the TVar, i.e. take out all STM and shared data, everything is very quick and responsive, and there's no noticeable CPU usage whenever a client writes something.

So, while it's very nice to see that the shared data is in fact not duplicated (although I suppose I should actually write to the shared data in order to verify this fully), there's a very heavy performance penalty involved in maintaining coherent state. For me, my question then remains: How should one properly attack this problem while keeping the niceties of STM?

Thanks to Vagif Verdi for bringing up some very interesting points.


Solution

STM isn't magic. If you have one giant TVar, it is going to have to block all threads but one at any given time. This is equivalent to "coarse-grained locking" and has the benefit that it is easy to use, but the whole point of STM is not to be forced into coarse-grained locking. Effectively only one thread can write at a time with this approach. The same goes for your "message passing" system: the single server thread is a bottleneck that limits scalability.

I don't think there is a big problem with using an array of TVars, and I don't know why you describe your approach 2 as "stupid". This is exactly what STM was invented to do.
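To illustrate what fine-grained STM buys you, here is a hedged sketch of a transaction that touches exactly two cells of a vector of TVars; transactions on other indices proceed without conflict (the names and amounts are illustrative):

```haskell
import Control.Concurrent.STM
import qualified Data.Vector as V

-- Atomically move an amount from one cell to another. Only these two
-- TVars participate in the transaction, so writers to any other index
-- never conflict with it.
transfer :: V.Vector (TVar Int) -> Int -> Int -> Int -> STM ()
transfer v from to amt = do
  a <- readTVar (v V.! from)
  b <- readTVar (v V.! to)
  writeTVar (v V.! from) (a - amt)
  writeTVar (v V.! to)   (b + amt)

main :: IO ()
main = do
  v <- V.replicateM 10 (newTVarIO 100)
  atomically (transfer v 2 7 30)
  a <- readTVarIO (v V.! 2)
  b <- readTVarIO (v V.! 7)
  print (a, b)  -- (70,130)
```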

Edit: I encourage interested parties to watch this video, or at least the beginning of it, for a discussion of some of the motivation for STM. It is a few years old, and the material on transactional boosting isn't really relevant, but Herlihy is brilliant and one of those computer scientists who manage to make an area interesting even if it isn't your thing.

OTHER TIPS

First of all, Vector is an immutable data structure. Any time you "modify" the Vector, you're creating a whole new copy of it, meaning each modification takes O(n) time (where n is the length of the vector).

Second of all, values in Haskell are immutable. Any time you modify a TVar, you are replacing the old value with a new value.

I think what you want is a purely functional data structure supporting efficient update. Two examples:

  • Data.Map: Key-value dictionary. This is like std::map in C++.

  • Data.Sequence: A sequence type; indexed like an array, but immutable, with efficient (O(log n)) update.

Any time you "modify" one of these data structures, you're actually constructing a new value that internally points to parts of the old value.
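A small sketch of this idea, holding a Data.Map inside a single TVar (the keys and sizes are arbitrary):

```haskell
import Control.Concurrent.STM
import qualified Data.Map as Map

main :: IO ()
main = do
  -- The TVar holds an immutable Map; each update builds a new Map that
  -- shares almost all of its structure with the old one (O(log n) work),
  -- so nothing close to the whole structure is copied.
  tv <- newTVarIO (Map.fromList [(i, 0 :: Int) | i <- [1 .. 1000]])
  atomically $ modifyTVar' tv (Map.insert 42 99)
  m <- readTVarIO tv
  print (Map.lookup 42 m)  -- Just 99
```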

A couple guidelines:

  • If you're only modifying a single value, atomicModifyIORef may be sufficient. If more sophisticated synchronization is needed besides atomic update, STM would be a better choice.

  • Watch out for laziness when working with mutable variables. Any time you modify the shared state, be sure to force it. This can be done using seq. A more convenient way is to use bang patterns. Example:

    !x <- atomically $ do
        x <- readTVar shared_state
        writeTVar shared_state (changeSomething x)
        return x
    

    This forces x to be evaluated after the transaction completes. If a variable (IORef, STRef, TVar, etc) is modified a bunch of times but never forced, thunks will pile up in memory. Evaluating the resulting thunk may even produce a stack overflow.
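For the single-value case, here is a minimal sketch using atomicModifyIORef' (the strict variant from Data.IORef), which forces the new value on each update and so avoids the thunk build-up described above:

```haskell
import Data.IORef

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  -- atomicModifyIORef' forces the new value after each update, so thunks
  -- cannot pile up the way they can with plain atomicModifyIORef.
  mapM_ (\i -> atomicModifyIORef' ref (\x -> (x + i, ()))) [1 .. 100]
  n <- readIORef ref
  print n  -- 5050
```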

If your program needs to be massively parallel (that is, have multiple cores/CPUs access the variable simultaneously), updating a single value like this may be less efficient, as computations may be duplicated among processors. However, for a small number of cores, duplication is pretty rare.

Haskell has several implementations of mutable arrays/vectors, so you can go with the simplest approach, a TVar holding a mutable vector, and not worry much about copying overhead (mutable arrays are updated in place; there is no copying).

Here's one such library: http://hackage.haskell.org/package/vector-0.9.1
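A minimal sketch of in-place update using that library's mutable interface (Data.Vector.Unboxed.Mutable; the element type and size are illustrative):

```haskell
import qualified Data.Vector.Unboxed.Mutable as M

main :: IO ()
main = do
  -- An in-place write to a mutable vector is O(1); nothing is copied.
  v <- M.replicate 10 (0 :: Int)
  M.write v 3 42
  x <- M.read v 3
  print x  -- 42
```

Note that if such a vector is placed inside a TVar, STM guards the reference, not the in-place writes to the contents themselves.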

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow