Question

I'm reading section 5.2, Functions, of the Cascading documentation, and I wonder what happens with the following code. Should it work correctly in a multithreaded environment? More generally, can a Function be called from multiple threads? As far as I know, a single mapper is single-threaded.

Specifically, I've tested code like this, and it does not seem to be thread safe. Maybe I'm misreading the documentation on page 39.

public class NotThreadSafeObject
{
    ...
    public void doSomething()
    {
        // update state
    }

    public String getValue()
    {
        // returns value from state
    }
}

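import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.operation.OperationCall;
import cascading.tuple.Tuple;

// The type parameter is the context type, so it must be NotThreadSafeObject, not Tuple
public class SomeFunction extends BaseOperation<NotThreadSafeObject> implements Function<NotThreadSafeObject>
{
    // constructors

    @Override
    public void prepare( FlowProcess flowProcess, OperationCall<NotThreadSafeObject> call )
    {
        // create a reusable Object with state of size 1
        call.setContext( new NotThreadSafeObject() );
    }

    @Override
    public void operate( FlowProcess flowProcess, FunctionCall<NotThreadSafeObject> call )
    {
        // ...
        NotThreadSafeObject obj = call.getContext();
        obj.doSomething();
        // build a one-element Tuple; set(0, ...) on an empty Tuple has no position 0 to replace
        Tuple tup = new Tuple( obj.getValue() );
        call.getOutputCollector().add( tup );
    }

    @Override
    public void cleanup( FlowProcess flowProcess, OperationCall<NotThreadSafeObject> call )
    {
        // release the context so it can be garbage collected
        call.setContext( null );
    }
}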

Solution

Based on the Cascading documentation, this should work fine, and is in fact the primary reason to use the Context in a non-aggregating operation.
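
To illustrate why a per-call context can be safe even when the object inside it is not thread safe, here is a minimal plain-Java sketch. It does not use Cascading at all: the names PerTaskContextSketch, Counter, runTask and runTasks are hypothetical, and the sketch simply assumes that each task creates its own context in a prepare step and uses it from one thread only, which is the usage the answer above describes. It is a model of the pattern, not a description of how Cascading schedules work internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerTaskContextSketch {
    // Hypothetical stand-in for the question's NotThreadSafeObject:
    // plain mutable state with no synchronization.
    static class Counter {
        private int count = 0;
        void doSomething() { count++; }
        int getValue() { return count; }
    }

    // Models one task: a "prepare" step creates the context, "operate" runs
    // once per record on the same thread, and "cleanup" drops the reference.
    static int runTask(int records) {
        Counter context = new Counter();       // prepare
        for (int i = 0; i < records; i++) {
            context.doSomething();             // operate
        }
        int result = context.getValue();
        context = null;                        // cleanup
        return result;
    }

    // Runs several tasks in parallel; no Counter is ever shared between threads.
    static List<Integer> runTasks(int tasks, int records) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                Callable<Integer> task = () -> runTask(records);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // prints [100000, 100000, 100000, 100000]
        System.out.println(runTasks(4, 100_000));
    }
}
```

Every task reports the full count because each Counter is confined to the thread that created it. If the same Counter were instead stored in a static or instance field shared by all tasks, the unsynchronized increments could race, so the safety here comes from the confinement, not from the object itself.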

Licensed under: CC-BY-SA with attribution