Processing concurrently in Scala

https://stackoverflow.com/questions/1007371

06-07-2019
|

Question

As in my own answer to my own question, I have the situation whereby I am processing a large number of events which arrive on a queue. Each event is handled in exactly the same manner and each even can be handled independently of all other events.

My program takes advantage of the Scala concurrency framework and many of the processes involved are modelled as Actors. As Actors process their messages sequentially, they are not well-suited to this particular problem (even though my other actors are performing actions which are sequential). As I want Scala to "control" all thread creation (which I assume is the point of it having a concurrency system in the first place) it seems I have 2 choices:

Send the events to a pool of event processors, which I control
get my Actor to process them concurrently by some other mechanism

I would have thought that #1 negates the point of using the actors subsystem: how many processor actors should I create? being one obvious question. These things are supposedly hidden from me and solved by the subsystem.

My answer was to do the following:

val eventProcessor = actor {
  loop {
    react {
      case MyEvent(x) =>
        //I want to be able to handle multiple events at the same time
        //create a new actor to handle it
        actor {
          //processing code here
          process(x)
        }
    }
  }
}

Is there a better approach? Is this incorrect?

edit: A possibly better approach is:

val eventProcessor = actor {
  loop {
    react {
      case MyEvent(x) =>
        //Pass processing to the underlying ForkJoin framework
        Scheduler.execute(process(e))
    }
  }
}

Solution

This seems like a duplicate of another question. So I'll duplicate my answer

Actors process one message at a time. The classic pattern to process multiple messages is to have one coordinator actor front for a pool of consumer actors. If you use react then the consumer pool can be large but will still only use a small number of JVM threads. Here's an example where I create a pool of 10 consumers and one coordinator to front for them.

import scala.actors.Actor
import scala.actors.Actor._

case class Request(sender : Actor, payload : String)
case class Ready(sender : Actor)
case class Result(result : String)
case object Stop

def consumer(n : Int) = actor {
  loop {
    react {
      case Ready(sender) => 
        sender ! Ready(self)
      case Request(sender, payload) =>
        println("request to consumer " + n + " with " + payload)
        // some silly computation so the process takes awhile
        val result = ((payload + payload + payload) map {case '0' => 'X'; case '1' => "-"; case c => c}).mkString
        sender ! Result(result)
        println("consumer " + n + " is done processing " + result )
      case Stop => exit
    }
  }
}

// a pool of 10 consumers
val consumers = for (n <- 0 to 10) yield consumer(n)

val coordinator = actor {
  loop {
     react {
        case msg @ Request(sender, payload) =>
           consumers foreach {_ ! Ready(self)}
           react {
              // send the request to the first available consumer
              case Ready(consumer) => consumer ! msg
           }
         case Stop => 
           consumers foreach {_ ! Stop} 
           exit
     }
  }
}

// a little test loop - note that it's not doing anything with the results or telling the coordinator to stop
for (i <- 0 to 1000) coordinator ! Request(self, i.toString)

This code tests to see which consumer is available and sends a request to that consumer. Alternatives are to just randomly assign to consumers or to use a round robin scheduler.

Depending on what you are doing, you might be better served with Scala's Futures. For instance, if you don't really need actors then all of the above machinery could be written as

import scala.actors.Futures._

def transform(payload : String) = {      
  val result = ((payload + payload + payload) map {case '0' => 'X'; case '1' => "-"; case c => c}).mkString
  println("transformed " + payload + " to " + result )
  result
}

val results = for (i <- 0 to 1000) yield future(transform(i.toString))

OTHER TIPS

If the events can all be handled independently, why are they on a queue? Knowing nothing else about your design, this seems like an unnecessary step. If you could compose the process function with whatever is firing those events, you could potentially obviate the queue.

An actor essentially is a concurrent effect equipped with a queue. If you want to process multiple messages simultaneously, you don't really want an actor. You just want a function (Any => ()) to be scheduled for execution at some convenient time.

Having said that, your approach is reasonable if you want to stay within the actors library and if the event queue is not within your control.

Scalaz makes a distinction between Actors and concurrent Effects. While its Actor is very light-weight, scalaz.concurrent.Effect is lighter still. Here's your code roughly translated to the Scalaz library:

val eventProcessor = effect (x => process x)

This is with the latest trunk head, not yet released.

This sounds like a simple consumer/producer problem. I'd use a queue with a pool of consumers. You could probably write this with a few lines of code using java.util.concurrent.

The purpose of an actor (well, one of them) is to ensure that the state within the actor can only be accessed by a single thread at a time. If the processing of a message doesn't depend on any mutable state within the actor, then it would probably be more appropriate to just submit a task to a scheduler or a thread pool to process. The extra abstraction that the actor provides is actually getting in your way.

There are convenient methods in scala.actors.Scheduler for this, or you could use an Executor from java.util.concurrent.

Actors are much more lightweight than threads, and as such one other option is to use actor objects like Runnable objects you are used to submitting to a Thread Pool. The main difference is you do not need to worry about the ThreadPool - the thread pool is managed for you by the actor framework and is mostly a configuration concern.

def submit(e: MyEvent) = actor {
  // no loop - the actor exits immediately after processing the first message
  react {
    case MyEvent(x) =>
      process(x)
  }
} ! e // immediately send the new actor a message

Then to submit a message, say this:

submit(new MyEvent(x))

, which corresponds to

eventProcessor ! new MyEvent(x)

from your question.

Tested this pattern successfully with 1 million messages sent and received in about 10 seconds on a quad-core i7 laptop.

Hope this helps.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow