Can I use Scala's parallel collections when I have several expensive operations I want to call on the same input and then collect the results?

StackOverflow https://stackoverflow.com/questions/7869171

  •  11-02-2021

Question

I found a similar question but it has what seems to be a simpler case, where the expensive operation is always the same. In my case, I want to collect a set of results of some expensive API calls that I'd like to execute in parallel.

Say I have:

def apiRequest1(q: Query): Option[Result]
def apiRequest2(q: Query): Option[Result]

where q is the same value.

I'd like a List[Result] or similar (obviously List[Option[Result]] is fine) and I'd like the two expensive operations to happen in parallel.

Naturally a simple List constructor doesn't execute in parallel:

List(apiRequest1(q), apiRequest2(q))

Can the parallel collections help? Or should I be looking to futures and the like instead? The only approach I can think of using parallel collections seems hacky:

 List(q, q).par.zipWithIndex.flatMap { case (query, i) =>
   if (i % 2 == 0) apiRequest1(query) else apiRequest2(query)
 }

Actually, all things being equal, maybe that isn't so bad...

Solution

Why don’t you write

List(apiRequest1 _, apiRequest2 _).par.map(_(q))

OTHER TIPS

Quick and dirty solution:

scala> def apiRequest1(q: Query): Option[Result] = { Thread.sleep(1000); Some(new Result) }
apiRequest1: (q: Query)Option[Result]

scala> def apiRequest2(q: Query): Option[Result] = { Thread.sleep(3000); Some(new Result) }
apiRequest2: (q: Query)Option[Result]

scala> val f = List(() => apiRequest1(q), () => apiRequest2(q)).par.map(_())
f: scala.collection.parallel.immutable.ParSeq[Option[Result]] = ParVector(Some(Result@1f24908), Some(Result@198c0b5))

I'm not sure this would actually run in parallel if you have only two calls, or any small number of them. There is a threshold for parallelization, and with so small a collection it would probably run sequentially, on the grounds that parallelizing is not worth the overhead. (The library can't really know that, since it depends on the operation you want to run, but having a threshold on collection operations is reasonable.)
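If the threshold is a worry, the futures the question mentions avoid it, because each call is submitted as its own task regardless of how many there are. Below is a minimal sketch, not a definitive implementation. The `ParallelCalls` object, the `Query` and `Result` case classes, the sleep-based bodies of `apiRequest1` and `apiRequest2`, the `fetchAll` helper and the 10 second timeout are all assumptions made up for illustration; only the two method signatures come from the question.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelCalls {
  // Hypothetical stand-ins for the question's types.
  final case class Query(text: String)
  final case class Result(source: String)

  // Hypothetical stand-ins for the expensive API calls (sleep simulates latency).
  def apiRequest1(q: Query): Option[Result] = { Thread.sleep(300); Some(Result("api1")) }
  def apiRequest2(q: Query): Option[Result] = { Thread.sleep(500); None }

  def fetchAll(q: Query): List[Result] = {
    // Each Future is scheduled as soon as it is created, so the calls can overlap.
    // `blocking` tells the global pool that the task may block a thread.
    val calls: List[Future[Option[Result]]] = List(
      Future(blocking(apiRequest1(q))),
      Future(blocking(apiRequest2(q)))
    )
    // Future.sequence turns List[Future[A]] into Future[List[A]], keeping list order.
    val all: Future[List[Option[Result]]] = Future.sequence(calls)
    // Blocking on the result keeps the sketch short; the timeout is arbitrary.
    Await.result(all, 10.seconds).flatten
  }
}
```

Calling `ParallelCalls.fetchAll(ParallelCalls.Query("scala"))` gives back only the results that were present, in the order the calls were listed. With the two calls overlapping, the total wait should be close to the slower call rather than the sum of both.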

Licensed under: CC-BY-SA with attribution