deref inside a transaction may trigger a retry - what is the role of ref state history?

Question

Clojure's STM does not care about the present. By the time an observation is made, the present has already moved. Clojure's STM only cares about capturing a consistent snapshot of state.

This is not very obvious from the example because we know a single read would always be a consistent snapshot. But, if you are only ever using dosync on a single ref, then you probably shouldn't be using refs at all, but atoms instead.

So, imagine instead we are reading from an a and a b and trying to return their sum. We don't care that a and b are current when we return the sum -- trying to keep up with the present is futile. All we care about is that a and b are from a consistent point in time.

If while in a dosync block, we read a and then b but b was updated in between the two reads, we have an a and b from inconsistent points in time. We have to try again -- start all over again and try to read a then b from the near present.

Unless... Suppose we kept a history of b for every change to b. As before, suppose we read a and then b but an update to b occurs before we're done. Since we saved a history of b, we can go back in time to before b changed and find a consistent a and b. Then, with a consistent a and b from the near past, we can return a consistent sum. We don't have to retry (and potentially fail again) with new values from the near present.
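The a-plus-b read described above can be sketched as follows. This is only an illustration: the refs a and b, the starting values, and the helper names consistent-sum and move! are made up for this example and are not part of the question.

```clojure
;; Hypothetical refs and helpers, for illustration only.
(def a (ref 10))
(def b (ref 20))

(defn consistent-sum
  "Reads a and b in one transaction, so both values come from the same snapshot."
  []
  (dosync (+ @a @b)))

(defn move!
  "Moves n from a to b in one transaction; a + b is the same before and after."
  [n]
  (dosync
    (alter a - n)
    (alter b + n)))
```

However many times move! runs on other threads, consistent-sum should never observe a half-finished move. Whether it gets its snapshot from history or from a retry is exactly what the history settings influence.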

Consistency is maintained by comparing a snapshot taken when entering dosync to a snapshot when exiting. Under this model, any change to the relevant data in between would require a retry. The default is optimistic that no such change will happen, so no history is kept at first. When a failure does occur, it is marked on the applicable ref so that the next time a change is made, a history entry is kept. Now consistency is maintained whenever the snapshot taken when entering matches either the snapshot when exiting or the single past value retained. So a single change to that ref during the dosync will no longer cause a failure. Two changes still will, because the history will be exhausted. If another failure does occur, it is marked again and a history of length two is maintained.

With the example, pretend that we are trying to coordinate multiple refs. The default initial history length is 0 with a maximum of 10.
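These settings can be inspected at the REPL with the core functions ref-min-history, ref-max-history and ref-history-count. A small sketch, where the ref names plain and tuned are invented for this example:

```clojure
;; Hypothetical ref names, for illustration only.
(def plain (ref 0))
(def tuned (ref 0 :min-history 50 :max-history 100))

(ref-min-history plain)   ;=> 0
(ref-max-history plain)   ;=> 10
(ref-min-history tuned)   ;=> 50
(ref-max-history tuned)   ;=> 100

;; Number of past values currently retained; this grows as commits and
;; read faults occur, within the min/max bounds above.
(ref-history-count tuned)

;; The two-argument arities change the bounds on an existing ref.
(ref-max-history plain 20)
(ref-max-history plain)   ;=> 20
```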

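(defn stm-experiment
  "Starts a writer that increments ref a 500 times, about 20 ms apart, and
  returns the value of a seen by a transaction that sleeps 1000 ms first."
  [min-hist max-hist]
  (let [a (ref 0 :min-history min-hist :max-history max-hist)]
    ;; writer: 500 short transactions on a background thread
    (future (dotimes [_ 500] (dosync (Thread/sleep 20) (alter a inc))))
    ;; reader: one slow transaction; retried until a consistent a is found
    (dosync (Thread/sleep 1000) @a)))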

So the default would be

(stm-experiment 0 10)
;=> 500 (probably)

The updates to a occur every 20 milliseconds and the read occurs after 1000 milliseconds. Therefore, roughly 1000 / 20 = 50 updates to a occur before each attempted read. The default values of min-history and max-history assume, optimistically, that 0 updates will happen to a during a transaction and that at most 10 will. That is, we start with no history on a, and each time a failure occurs we grow the history of a by one, but only up to 10. Since 50 updates are occurring, this will never be enough.

Compare to

(stm-experiment 50 100)
;=> 0 (quite possibly, multicore)

With a history of 50, all 50 changes to a are kept in a history, therefore the state of a that we captured on entry is still there at the very end of the history queue upon exit.

Try also

(stm-experiment 48 100)
;=> 100 (or thereabouts, multicore)

With an initial history length of 48, the 50 changes to a will cause the history to be exhausted and a read fault. But this read fault will lengthen the history to 49. This still isn't enough, so another read fault occurs and the history is lengthened to 50. Now a value of a consistent with the start of the dosync can be found in the history, and the read succeeds after two failed attempts, during which a was updated 50 × 2 = 100 times.
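To watch the retries directly, one option is to count how many times the reading transaction body runs. The variant below is a sketch, not part of the original experiment: the name stm-experiment-counting and the :writes, :write-ms and :read-ms options are assumptions added here so the timings can be scaled down. The swap! inside dosync is a deliberate side effect that fires once per attempt.

```clojure
(defn stm-experiment-counting
  "Like stm-experiment, but also reports how many times the reader ran.
  The keyword options are hypothetical knobs defaulting to the original timings."
  [min-hist max-hist & {:keys [writes write-ms read-ms]
                        :or   {writes 500 write-ms 20 read-ms 1000}}]
  (let [a        (ref 0 :min-history min-hist :max-history max-hist)
        attempts (atom 0)]
    ;; writer: short transactions on a background thread
    (future
      (dotimes [_ writes]
        (dosync (Thread/sleep (long write-ms)) (alter a inc))))
    ;; reader: the swap! runs again on every retry, so it counts attempts
    (let [value (dosync
                  (swap! attempts inc)
                  (Thread/sleep (long read-ms))
                  @a)]
      {:value value :attempts @attempts})))
```

Calling (stm-experiment-counting 48 100) should, if the reasoning above holds, report about three attempts (two faults and one success), though the exact numbers depend on scheduling.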

Finally,

(stm-experiment 48 48)
;=> 500

With a cap of 48 on the history length, we can never find the value of a we started with once 50 updates have occurred, so the read only succeeds after the writer has finished.
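The arithmetic behind the four results can be collected into a small model. This is an idealization of the reasoning in this answer, not a description of the STM implementation: it assumes exactly 50 updates per read attempt, one extra history slot per read fault, and 500 updates in total. The function name predicted-read is made up for this sketch.

```clojure
(defn predicted-read
  "Idealized prediction of what stm-experiment returns, assuming exactly
  50 updates per read attempt and one history slot gained per read fault."
  [min-hist max-hist]
  (let [needed 50     ; updates that happen during one 1000 ms read
        total  500]   ; updates the writer makes in all
    (if (< max-hist needed)
      ;; history can never get long enough: read succeeds only once writes stop
      total
      ;; each fault adds one slot, and a advances by `needed` during each fault
      (let [faults (max 0 (- needed min-hist))]
        (min total (* needed faults))))))

(predicted-read 0 10)    ;=> 500
(predicted-read 50 100)  ;=> 0
(predicted-read 48 100)  ;=> 100
(predicted-read 48 48)   ;=> 500
```

Real runs will drift from these numbers because of thread scheduling and timer granularity, which is why the results above are marked "probably" and "thereabouts".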