Question

I'm told that Clojure has lockless concurrency and that this is Important.

I've used a number of languages, but didn't realize they were taking locks behind the scenes.

Why is this an advantage in Clojure (or in any language that has this feature)?


Solution

I can't speak about Clojure specifically, but ... it means you don't need to wait for someone to be done with something before you can get to work. Which is great.

Typically it's achieved with immutable types. If nothing can be modified, you don't really need to wait until someone else is done with it before you can access it.
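For instance, Clojure's core data structures are immutable, so a value handed to another thread can never change underneath it. A minimal sketch (the map and its contents are made up for illustration):

    ;; An immutable map can be shared freely across threads.
    (def data {:name "Ada" :count 42})

    ;; "Updating" it produces a new map; the original is untouched,
    ;; so a reader on another thread never needs a lock.
    (def updated (assoc data :count 43))

    (future (println "reader sees:" data)) ; always {:name "Ada", :count 42}
    (println "writer made:" updated)       ; {:name "Ada", :count 43}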

OTHER TIPS

Lockless concurrency also provides the nice advantage that readers never have to wait for other readers. This is especially useful when many threads will be reading data from a single source. You still need to express the data dependencies in your program, and explicitly mark the parts of a transaction that can be commuted safely.
STM saves you from deadlocks and almost all occurrences of livelock, though it does not save you from concurrency failures: you can still create cases where a transaction will fail because it lacks the resources to maintain its history. The important part is that those failures are explicit, and you can recover from them.
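In Clojure that looks roughly like the sketch below (the hits counter is hypothetical). Reads take no locks, and commute is how you mark an update as safe to reorder against concurrent transactions:

    (def hits (ref 0))

    ;; A transaction. commute tells the STM this update is
    ;; order-independent, so concurrent transactions don't have
    ;; to retry against each other over it.
    (dosync
      (commute hits inc))

    ;; Reading never waits on other readers.
    @hits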

Deadlocks. Or, to be more correct, the lack of them.

One of the biggest problems in most languages is that you end up with deadlocks that are:

  1. Hell on earth to debug.
  2. Difficult to be sure you have gotten rid of them.

Now with no locks, obviously you won't run into deadlocks.

The biggest deal is that locks don't compose.

While it's trivial to write code with a simple locking strategy (e.g. put it in a synchronized Java class...), it gets exponentially more complicated as you start to lock multiple objects and create complex transactions that combine different locked operations. Deadlocks can occur, performance suffers, locking logic makes the code extremely convoluted, and at some point the code becomes unmaintainable.
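STM transactions, by contrast, do compose: in Clojure, dosync blocks nest, so two individually correct operations combine into one correct transaction. A sketch with hypothetical account refs:

    (def checking (ref 100))
    (def savings  (ref 500))

    (defn transfer [from to amount]
      ;; A complete transaction in its own right...
      (dosync
        (alter from - amount)
        (alter to + amount)))

    ;; ...that composes: nested dosync blocks join the enclosing
    ;; transaction, so both transfers commit or retry as one unit.
    (dosync
      (transfer checking savings 50)
      (transfer savings checking 10))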

These problems will become apparent to anyone who has to build a large, complex concurrent system (and solving them was a major motivation for Rich Hickey in creating Clojure).

The second issue is performance.

Both locking and STM clearly impose overhead. But in some important cases the STM overhead can be much lower.

In particular, lockless concurrency (as with Clojure's STM) usually implies that readers are not impaired by any other threads (including writers!) if they access data outside a transaction. This can be a huge win in the fairly common case that reads don't need to be transactional and dramatically outnumber writes (think of most web applications...). Non-transactional reads of an STM reference in Clojure are essentially overhead-free.
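Concretely, dereferencing a ref outside any transaction simply reads the current committed value; no transaction is started and nothing blocks. A sketch with a hypothetical counter:

    (def counter (ref 0))

    ;; Writers coordinate through transactions...
    (future (dosync (alter counter inc)))

    ;; ...but a plain deref outside dosync takes a consistent
    ;; snapshot without waiting on any writer.
    @counter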

As long as you write strictly sequential programs (do A, then B, then C; finished!) you don't have concurrency problems, and a language's concurrency mechanisms remain irrelevant.

When you graduate from "programming exercise" programs to real world stuff, pretty soon you encounter problems whose solution is multi-threading (or whatever flavor of concurrency you have available).

Case: Programs with a GUI. Say you're writing an editor with spell checking. You want the spell checker to be quietly doing its thing in the background, yet you want the GUI to smoothly accept user input. So you run those two activities as separate threads.

Case: I recently wrote a program (for work) that gathers statistics from two log files and writes them to a database. Each file takes about 3 minutes to process. I moved those processes into two threads that run side by side, cutting total processing time from 6 minutes to a little over 3.
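In Clojure the same shape is a pair of futures; the processing function and file names below are made up for illustration:

    (defn process-log-file [path]
      ;; stand-in for ~3 minutes of parsing and database writes
      (println "processing" path))

    ;; Start both files in parallel, then block until both finish.
    (let [a (future (process-log-file "server-a.log"))
          b (future (process-log-file "server-b.log"))]
      @a
      @b)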

Case: Scientific/engineering simulation software. There are lots and lots of problems that are solved by calculating some effect (heat flow, say) at every point in a 3-dimensional grid representing your test subject (star nucleus, nuclear explosion, geographic dispersion of an insect population...). Basically the same computation is done at every point, and at lots of points, so it makes sense to have them done in parallel.
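Clojure's pmap expresses that "same computation at every point" pattern directly; here heat-at is a made-up stand-in for the per-point calculation:

    (defn heat-at [point]
      ;; stand-in for an expensive per-point computation
      (reduce + point))

    (def grid (for [x (range 50) y (range 50) z (range 50)]
                [x y z]))

    ;; pmap runs heat-at over the points in parallel; each call only
    ;; reads immutable inputs, so no locking is needed.
    (def results (doall (pmap heat-at grid)))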

In all those cases and many more, whenever two computing processes access the same memory (= variables, if you like) at roughly the same time, there is potential for them to interfere with each other and mess up each other's work. The large branch of computer science known as "concurrent programming" deals with ideas for solving this kind of problem.
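The classic example is the lost-update race on a shared counter; Clojure's lock-free answer is an atom, whose swap! retries via compare-and-swap so no update is lost. A minimal sketch:

    (def total (atom 0))

    ;; 1000 concurrent increments; swap! retries on contention
    ;; instead of taking a lock.
    (doseq [f (doall (repeatedly 1000 #(future (swap! total inc))))]
      @f)  ; wait for every future

    @total ; reliably 1000; a naive unsynchronized counter could lose updates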

A reasonably useful starting discussion of this topic can be found on Wikipedia.

The benefit of lockless concurrency is the lack of complexity in the program. In imperative languages, concurrent programming relies on locks, and once the program gets even moderately complex, difficult-to-fix deadlock bugs creep in.

Such "lockless concurrency" isn't really a feature of a language; rather, it's a feature of a platform or runtime environment, and woe be the language that won't get out of the way to give you access to these facilities.

Thinking about the trade-offs between lock-based and lock-free concurrency is analogous to the metacircular evaluator problem: one can implement locks in terms of atomic operations (e.g. compare-and-swap, or CAS), and one can implement atomic operations in terms of locks. Which should be at the bottom?
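Clojure exposes the CAS side of that trade directly as compare-and-set! on an atom; the usual lock-free retry loop (essentially what swap! does for you) can be built on top of it. A sketch:

    (def counter (atom 0))

    (defn cas-inc! [a]
      ;; Classic lock-free retry loop: read, compute, attempt CAS,
      ;; and loop if another thread got there first.
      (loop []
        (let [old @a]
          (if (compare-and-set! a old (inc old))
            (inc old)
            (recur)))))

    (cas-inc! counter) ; => 1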

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow