Question

I can't work out the right way to compute the elements of a list in parallel while blocking the main thread until all of them have been computed. Use case: I have a list of URLs and a simple HTML page parser, and I want to reduce the time needed to grab information from those pages by parsing each page in parallel and then returning a simple list of JSON data.

As I understand it, I have two options:

Concurrent way with Futures

I have a method that extracts some JSON data in a Future:

def extractData(link: String): Future[JValue] = ??? // some implementation

and I just map it over a list of links, which gives me a List[Future[JValue]]:

val res: List[Future[JValue]] = listOfLinks.map(extractData)

If I call sequence (for example from Scalaz, or my own implementation), which traverses this list and converts it to a Future[List[JValue]], then the links are still going to be processed sequentially, just on a separate thread, which won't give me any efficiency gain, because in the end I need a List[JValue].
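A hand-rolled sequence of the kind mentioned above can be sketched as follows. `mySequence` and `slowSquare` are hypothetical names, not Scalaz or standard-library code, and `Thread.sleep` only simulates a slow page. Because `Future.apply` schedules each computation as soon as `map` creates it, the fold only chains the waiting for results; timing four 400 ms tasks shows whether the work overlaps.

```scala
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Hypothetical hand-rolled sequence: folds a list of futures into one future of a list.
// It only chains the waiting; every future in `fs` is already running.
def mySequence[A](fs: List[Future[A]]): Future[List[A]] =
  fs.foldRight(Future.successful(List.empty[A])) { (f, acc) =>
    for { a <- f; rest <- acc } yield a :: rest
  }

// Hypothetical slow task standing in for extractData.
def slowSquare(n: Int): Future[Int] = Future {
  blocking { Thread.sleep(400) } // lets the global pool add a thread while this one sleeps
  n * n
}

val t0 = System.nanoTime()
// map starts all four futures right away; mySequence just waits for them.
val squares = Await.result(mySequence(List(1, 2, 3, 4).map(slowSquare)), 10.seconds)
val tookMs = (System.nanoTime() - t0) / 1000000
```

If the tasks ran one after another, `tookMs` would be at least 1600; with the global execution context it typically comes out close to 400.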

Trying to compute with ParSeq

In this option I have a function that just extracts the data:

def extractData(link: String): JValue = ??? // some implementation

but this time I call .par on the collection:

val res: ParSeq[JValue] = listOfLinks.par.map(extractData)

But with this approach I don't quite understand how to block the main thread until the whole list has been computed, without parsing each link sequentially.

As for Akka, I can't use actors here, so it's only Future or Par*.


Solution

The links will be processed in parallel as soon as you map extractData over the collection, because each Future starts running when it is created. Consider a slightly simplified example:

import scala.concurrent._
import ExecutionContext.Implicits.global

// Each call returns immediately; the body runs on the global thread pool.
def extractData(s: String): Future[Int] = Future {
  printf("Starting: %s\n", s)
  val i = s.toInt
  printf("Done: %s\n", s)
  i
}

val xs = (0 to 5).map(_.toString).toList

val parsed = Future.sequence(xs map extractData)

The output will look something like the following (the exact interleaving varies from run to run), which makes it clear that the futures aren't being processed sequentially:

Starting: 0
Done: 0
Starting: 2
Done: 2
Starting: 1
Starting: 4
Done: 1
Starting: 3
Starting: 5
Done: 5
Done: 4
Done: 3
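The interleaved output shows concurrency; a timing check makes the speed-up visible. This sketch assumes a hypothetical `fetchPage` that simulates a slow HTTP request with `Thread.sleep`. For blocking I/O it wraps the call in `scala.concurrent.blocking`, which lets the global execution context start extra threads; without it the global pool runs roughly one task per CPU core at a time.

```scala
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Hypothetical stand-in for a slow HTTP download: sleeps, then returns a fake page.
def fetchPage(link: String): Future[String] = Future {
  // blocking {} lets the global pool add a thread while this one sleeps,
  // so the downloads are not limited to one per CPU core.
  blocking { Thread.sleep(500) }
  s"<html>$link</html>"
}

val links = List("a", "b", "c", "d", "e")

val start = System.nanoTime()
// traverse keeps the results in the same order as `links`.
val pages: List[String] = Await.result(Future.traverse(links)(fetchPage), 10.seconds)
val elapsedMs = (System.nanoTime() - start) / 1000000

// One after another this would take at least 2500 ms; concurrently it is typically near 500 ms.
println(s"Fetched ${pages.size} pages in $elapsedMs ms")
```

The same shape should carry over to a real extractData: keep the per-link work inside Future { ... } and wrap any blocking HTTP call in blocking { ... }.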

Note that you can use Future.traverse to avoid creating the intermediate list of futures:

val parsed = Future.traverse(xs)(extractData)

In either case you can block with Await:

val res = Await.result(parsed, duration.Duration.Inf)
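`Await.result` rethrows whatever went wrong, which matters when fetching real pages. The sketch below, using a hypothetical `parse` in place of `extractData`, shows three cases: a single failed element fails the whole combined future; recovering each element separately keeps the successful results; and a finite timeout raises `TimeoutException` where `Duration.Inf` would wait forever.

```scala
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Hypothetical parser: fails for input that is not a number.
def parse(s: String): Future[Int] = Future(s.toInt)

// 1. A failure in any element fails the combined future, and Await.result rethrows it.
val failed: Boolean =
  try { Await.result(Future.traverse(List("1", "oops", "3"))(parse), 5.seconds); false }
  catch { case _: NumberFormatException => true }

// 2. Recovering per element keeps the successful results (None marks a failed link).
val partial: List[Option[Int]] = Await.result(
  Future.traverse(List("1", "oops", "3")) { s =>
    parse(s).map(Option(_)).recover { case _: NumberFormatException => None }
  },
  5.seconds)

// 3. A finite timeout throws TimeoutException instead of blocking forever like Duration.Inf.
val neverDone: Future[Int] = Promise[Int]().future // hypothetical page that never completes
val timedOut: Boolean =
  try { Await.result(neverDone, 100.millis); false }
  catch { case _: java.util.concurrent.TimeoutException => true }
```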

As a footnote: I don't know if you're planning to use Dispatch to perform the HTTP requests, but if not, it's worth a look. It also provides nicely integrated JSON parsing, and the documentation is full of useful examples of how to work with futures.

Licensed under: CC-BY-SA with attribution