Question

Let's say I have a somewhat large (several millions of items, or so) list of strings. Is it a good idea to run something like this:

val updatedList = myList.par.map(someAction).toList

Or would it be better to group the list before calling par.map, like this:

val numberOfCores = Runtime.getRuntime.availableProcessors
val updatedList = 
  myList.grouped(numberOfCores).toList.par.map(_.map(someAction)).toList.flatten

UPDATE: Assume that someAction is quite expensive (compared to grouped, toList, etc.).


Solution

Run par.map directly; it already takes the number of cores into account. However, do not keep the data in a List, because converting a List into a parallel collection requires a full copy. Use a Vector instead.
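A minimal sketch of this advice, written as a script or REPL snippet. The body of someAction and the sample data are hypothetical stand-ins, since the question does not show them. The snippet assumes Scala 2.12 or earlier, where .par is part of the standard library.

```scala
// Hypothetical stand-in for the expensive someAction from the question.
def someAction(s: String): String = s.reverse.toUpperCase

// Assumption: Scala 2.12 or earlier. On Scala 2.13+, add the separate
// scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
val myVector: Vector[String] = Vector.tabulate(1000)(i => s"item$i")

// Vector converts to a parallel collection without copying the elements;
// .seq converts the parallel result back to a sequential Vector.
val updated = myVector.par.map(someAction).seq
```

The parallel map preserves element order, so the result matches a sequential myVector.map(someAction). Note also that grouped(n) produces chunks of n elements each rather than n chunks, so the grouped variant in the question would create many small inner lists instead of one group per core.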

OTHER TIPS

As suggested, avoid calling par on a List, since that entails copying the list into a collection that can be easily traversed in parallel. See the Parallel Collections Overview for an explanation.

As described in the section on concrete parallel collection classes, a ParVector may be less efficient for the map operation than a ParArray, so if you're really concerned about performance, it may make sense to use a parallel array.
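The parallel array variant could look roughly like the following sketch. As before, someAction and the sample data are hypothetical, and the snippet assumes Scala 2.12 or earlier.

```scala
// Hypothetical stand-in for the expensive someAction from the question.
def someAction(s: String): String = s.reverse.toUpperCase

// Assumption: Scala 2.12 or earlier. On Scala 2.13+, add the separate
// scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
val myList: List[String] = List.tabulate(1000)(i => s"item$i")

// Copy the list into an array once, then call .par on the array,
// which yields a ParArray.
val parArray = myList.toArray.par

// map on a ParArray yields another ParArray; convert back only if a List is needed.
val updatedList: List[String] = parArray.map(someAction).toList
```

If the data can live in an Array from the start, the initial toArray copy disappears as well.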

But, if someAction is expensive enough, then its computational cost will hide the sequential bottlenecks in toList and par.
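Whether the conversion overhead matters for a given workload can be checked with a rough timing harness such as the sketch below. The timed helper, the artificial someAction, and the collection size are all assumptions made for illustration. Single-run JVM timings are noisy, so treat the numbers as indicative only and repeat the runs after a warm-up.

```scala
// Hypothetical someAction that burns some CPU per element.
def someAction(s: String): String = {
  var h = 0
  var i = 0
  while (i < 10000) { h = h * 31 + s.hashCode + i; i += 1 }
  s + h
}

// Hypothetical helper: runs body once and prints the elapsed wall-clock time.
def timed[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  println(f"$label%-12s ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}

// Assumption: Scala 2.12 or earlier. On Scala 2.13+, add the separate
// scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
val myList = List.tabulate(100000)(i => s"item$i")
val myVector = myList.toVector

val fromList = timed("List.par")(myList.par.map(someAction).toList)
val fromVector = timed("Vector.par")(myVector.par.map(someAction).seq)
```

If the two timings are close, the cost of someAction dominates and the choice of source collection makes little practical difference.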

Licensed under: CC-BY-SA with attribution