Question

According to the official docs, there are two ways to create a parallel collection:

1)

// Create an empty parallel vector directly
import scala.collection.parallel.immutable.ParVector
val pv = new ParVector[Int]

2)

val pv = Vector(1,2,3,4,5,6,7,8,9).par

Now, what are the differences? Is there any performance penalty when I convert from a simple sequential collection?

What would you do if you had to create a big parallel collection (say, several thousand elements)? Would you create it from scratch or convert it?

Thank you guys!

EDIT:

As @oxbow_lakes says, there is a section of the docs that focuses on this topic, but I'm looking for advice from experience. I mean, what would YOU do if you had to read a big collection from a DB, for instance?

Solution

It depends on the collection. Converting a Vector is basically free: ParVector is just a wrapper around the vector. The same goes for Arrays. Others, e.g. List, have to be copied completely into a different structure that is more amenable to parallelism, and then copied back into a new list if you want your result to be a List too.
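To make the difference concrete, here is a minimal sketch of both conversions. It assumes Scala 2.12 or earlier, where .par is part of the standard library; the object and method names are made up for illustration.

```scala
// Assumption: Scala 2.12 or earlier. On Scala 2.13+ you would need the
// separate scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.immutable.ParVector

object ParConversionSketch {
  // Vector -> ParVector wraps the existing vector; .seq unwraps it again.
  def roundTrip(vec: Vector[Int]): Vector[Int] = {
    val parVec: ParVector[Int] = vec.par
    parVec.seq
  }

  // List -> parallel sequence copies the elements; .toList copies them
  // again to get a List back.
  def doubleInParallel(xs: List[Int]): List[Int] =
    xs.par.map(_ * 2).toList

  def main(args: Array[String]): Unit = {
    println(roundTrip(Vector(1, 2, 3, 4, 5, 6, 7, 8, 9)))
    println(doubleInParallel(List(1, 2, 3, 4, 5, 6, 7, 8, 9)))
  }
}
```

The parallel calls live inside methods rather than in the object's initializer, because running parallel operations while an object is still initializing can deadlock.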

Have a look at the brand new guide on the Scala documentation site, in particular the section "Creating a parallel collection".

OTHER TIPS

The official documentation for the par method says:

For most collection types, this method creates a new parallel collection by copying all the elements. For these collection, par takes linear time [...]

Specific collections (e.g. ParArray or mutable.ParHashMap) override this default behaviour by creating a parallel collection which shares the same underlying dataset. For these collections, par takes constant or sublinear time.

That is, in general the operation is O(n), except for collections such as ParArray and mutable.ParHashMap, where it is less than O(n), though possibly not constant time.
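One way to see what "shares the same underlying dataset" means is to mutate the original array after calling par and read the value back through the parallel collection. This is a small sketch under the same assumption (Scala 2.12 or earlier); the object and method names are hypothetical.

```scala
// Assumption: Scala 2.12 or earlier, where Array.par is in the standard library.
object SharedParArraySketch {
  // Mutates the original array after conversion and returns the first
  // element as seen through the parallel collection.
  def firstAfterMutation(): Int = {
    val arr = Array(1, 2, 3)
    val parArr = arr.par // ParArray backed by the same array, no element copy
    arr(0) = 42          // change the sequential array only
    parArr(0)            // the change is visible if the storage is shared
  }

  def main(args: Array[String]): Unit =
    println(firstAfterMutation())
}
```

If the storage is shared as the documentation describes, the parallel collection sees the updated element, which is also why the conversion does not need to touch every element.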

Licensed under: CC-BY-SA with attribution