Question

I currently have Matlab code that does the following:

map = collections.Map;
for i = 1:N
    key = getKey(i);
    if isKey(map, key)
        % Return the value stored at key.
    else
        % Calculate a new value, store it in the map using key.
    end
end

This loop takes a long time to run, and I'd like to use parfor in order to help improve efficiency. However, I seem to be unable to assign a value to the map inside the parfor loop. Any ideas as to what I can do? The usage of collections.Map isn't firm, I'm open to alternate suggestions for parallelized memoization, as long as they are fast and efficient (and threadsafe, I realize that Map probably isn't).

ADDED FROM COMMENT BELOW: I'm hoping for more of a thread-safe way to add new values to the map during the loop, so that that any subsequent loops can used the precalculated value. The calculation is pretty expensive time-wise.

Was it helpful?

Solution

"Thread-safe" is the wrong concern here. In general, "thread-safe" does not apply to Matlab objects, because Matlab is single-threaded at the M-code level. There is only ever one Matlab interpreter thread per Matlab instance/process. Parallelized parfor iterations happen in separate processes or even on separate machines, not separate threads, so this would be more of an interprocess communication issue than multithreading. And you can't have two-way "communication" between the states of the parfor worker loop passes and each other or the enclosing workspace through assignment; I'm pretty sure the data transfer only occurs at the loop start and loop end.

If you really want memoization inside a parfor, you need a client-server memoization mechanism of some sort. What you would need to do is set up your memoization cache in a process on a server somewhere that all the Matlab pool workers can see, but external to the Matlab workspaces involved in your M-code, and use a client proxy object inside the loop to query the cache through interprocess communication of some sort, like RMI or socket calls. The main "server" cache object could even be inside the master Matlab process that's running the code with the parfor loop as long as it's a Java or C structure that's not managed by the M-code interpreter. The server side of the cache would handle safe concurrent access (maybe with a threadsafe map and multiple worker threads, maybe by serializing requests). The client side could have a locally stateful proxy as long as it wasn't based on M-code indexing to update its elements.

More generally, if you want interaction between the workers in a parfor pass, you need a shared data store like a database or filesystem thing that they can all access.

Maybe you could just set up a little memcached instance, or a lightweight in-memory database (maybe Derby, which ships with the JDK?), and use that as the server-side cache, creating db handles in all the workers to access it. Or maybe there's a lighter-weight set of Java objects that presents a thin RMI interface to a ConcurrentMap or synchronized Map? You could also make a cache server with Matlab objects like containers.Map by having a separate Matlab process act as the cache server, listening on a socket for requests and serializing them, and having the parfor workers access it via a client proxy object that had get() and set() methods that used IPC calls on the cache server process to do their work.

This is all probably going to be a good amount of overhead, so it'll only be worthwhile if the calculation really is really expensive.

OTHER TIPS

Obviously you refer to containers.map. I would use temp arrays/cell-arrays to store the product of the parfor and assign everything new after the loop.

% original map
keySet = {'a','b','c','f'};
map = containers.Map(keySet, 1:length(keySet));

% simulates your keys/generated values 
getKey = {'f','g','h'};
getVal = (5:7);

% Parfor body
parfor i = 1:3
    key = getKey{i};
    if isKey(map, key)
        % Return the value stored at key.
        map(key);
    else
        % Calculate a new value, store it in a temp array/cell-array 
        keyNew{i} = key;
        valNew(i) = getVal(i);
    end
end

% assign to map the new pairs of keys/values
indNew = ~cellfun(@isempty, keyNew); % clean up empty cells
newMap = containers.Map(keyNew(indNew), valNew(indNew));
map = [map; newMap];

Edit: You cannot use parfor and dynamically make values available to subsequent iterations, simply because there is no such thing as subsequent iterations in a parfor loop.

From the parfor documentation:

Note: Because of independence of iteration order, execution of parfor does not guarantee deterministic results.

What this means (as this SO question/answer nicely highlights), is that iterations in the parallel for execution are AND must be considered independent, as opposed to a regular for loop. As a result, their order of execution/finish is not predictable and their access/dependency on the output of other iterations is not feasible.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top