Question

When I write...

l2 = [my_computation(x) for x in l]

..., I wonder if python is applying my_computation to every x using all cores of my CPU.

If not, why? Is there a simple way to make Python parallelize list comprehensions?

Is this related to the behavior of map() or not at all?


Solution

When I write...

l2 = [my_computation(x) for x in l]

..., I wonder if python is applying my_computation to every x using all cores of my CPU.

The Python Language Reference neither explicitly forbids nor explicitly forces asynchronous, concurrent, or parallel evaluation:

The comprehension consists of a single expression followed by at least one for clause and zero or more for or if clauses. In this case, the elements of the new container are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and evaluating the expression to produce an element each time the innermost block is reached.

In fact, as you can see, it doesn't really give any evaluation semantics at all. The two sentences above are the entire specification of the behavior of comprehensions.

PEP 202 – List Comprehensions is even less helpful.

If not, why?

As you can see, an implementation is allowed, but not required, to parallelize list comprehensions. However, parallelizing is only correct if my_computation is referentially transparent, and deciding whether an arbitrary function is referentially transparent is equivalent to solving the Halting Problem, ergo impossible in the general case.
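To make the hazard concrete, here is a minimal sketch with a hypothetical my_computation that is not referentially transparent because it reads and writes global state; evaluating the comprehension in any order other than left to right could produce a different result:

counter = 0

def my_computation(x):
    global counter
    counter += 1           # side effect: the result depends on call order
    return x * counter

l = [10, 20, 30]
l2 = [my_computation(x) for x in l]   # [10, 40, 90] only under left-to-right evaluation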

It is, of course, sometimes possible to decide referential transparency in specific cases. However, at least on CPython, the most popular Python implementation, that won't help: because of the Global Interpreter Lock, CPython cannot execute Python code in parallel across threads anyway.

Is there a simple way to make python parallelize list comprehensions?

Not that I know of. Their behavior is baked into the language: they don't translate into message sends, so there is nothing you could implement or override to change their behavior.

Compare this to Scala or C#, for example, which have a related feature that is defined as a simple syntactic translation to message sends and can therefore be overridden. In Scala, for example, the equivalent would be

for (x <- l) yield myComputation(x)

and that would be translated to

l.map(x => myComputation(x))

So you could change the behavior by providing your own map method, which is exactly what the Parallel Collections Framework in the Scala Standard Library does.

Is this related to the behavior of map() or not at all?

No. If it were, then you could change it by providing your own map method.

The concurrent.futures module in the Python Standard Library, for example, provides executors whose map method applies a function concurrently.
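As a rough sketch of how that looks in practice (assuming my_computation is a pure, picklable, module-level function), executor.map can stand in for the comprehension and spread the work across processes, sidestepping the GIL:

from concurrent.futures import ProcessPoolExecutor

def my_computation(x):
    return x * x          # placeholder for some CPU-bound, side-effect-free work

if __name__ == "__main__":
    l = list(range(10))
    with ProcessPoolExecutor() as executor:
        l2 = list(executor.map(my_computation, l))   # runs in parallel across processes
    print(l2)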

Other tips

Short answer: Nope.

There are two hurdles to that:

The first is that list comprehensions are predicated on the idea that each item will be processed in order, the same way every time. Parallelizing the process would break code that depends on that behavior. That doesn't mean something couldn't be written to break the work up and hand it to multiple threads in cases where that isn't an issue, but the second hurdle makes it not worth doing.

The second is that CPython's Global Interpreter Lock means threads don't actually run Python code in parallel, which is fine for I/O-bound jobs but provides no additional performance when there is CPU-bound computation to be done.
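For illustration, a sketch of what happens if you try threads anyway (my_computation here is a hypothetical pure-Python, CPU-bound function): the calls run concurrently, but the GIL lets only one thread execute Python bytecode at a time, so there is no speed-up over the plain comprehension:

from concurrent.futures import ThreadPoolExecutor

def my_computation(x):
    return sum(i * i for i in range(100_000))   # CPU-bound pure-Python work

l = list(range(8))
with ThreadPoolExecutor() as executor:
    l2 = list(executor.map(my_computation, l))   # concurrent, but not parallel under the GIL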

Licensed under: CC-BY-SA with attribution