The easiest solution is to split each round into multiple steps, with all threads synchronizing after each step. Let's say there are N threads.
step 1: each thread makes a list of the cells it needs to discover. It puts each "question" in one of the N queues (one for each thread), namely the queue of the thread that owns that cell.
wait for all the threads to finish
step 2: each thread fills in the responses to the questions in its own queue
wait for all the threads to finish
step 3: each thread computes the new state of its region
wait for all the threads to finish
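
Here is a minimal sketch of the three steps, not a definitive implementation. Everything specific in it is an assumption made for illustration: the world is a 1-D ring of 0/1 cells, each thread owns a contiguous slice, the update rule is "left neighbour XOR right neighbour", and the names run_rounds, owner, questions and answers are made up. The "wait for all the threads to finish" lines are done with threading.Barrier.

```python
import queue
import threading


def run_rounds(initial, n_threads, rounds):
    """Run `rounds` rounds on a ring of 0/1 cells; returns the final state."""
    size = len(initial)
    state = list(initial)  # shared, but each thread only touches its own slice
    # Thread t owns cells bounds[t] .. bounds[t + 1] - 1.
    bounds = [t * size // n_threads for t in range(n_threads + 1)]

    def owner(cell):
        return next(t for t in range(n_threads)
                    if bounds[t] <= cell < bounds[t + 1])

    questions = [queue.Queue() for _ in range(n_threads)]  # one per thread
    answers = [queue.Queue() for _ in range(n_threads)]    # one per thread
    barrier = threading.Barrier(n_threads)

    def worker(me):
        lo, hi = bounds[me], bounds[me + 1]
        for _ in range(rounds):
            # Step 1: ask the owning thread about every cell outside my region.
            for cell in {(lo - 1) % size, hi % size}:
                if not lo <= cell < hi:
                    questions[owner(cell)].put((me, cell))
            barrier.wait()

            # Step 2: answer the questions in my queue from my own cells.
            while not questions[me].empty():
                asker, cell = questions[me].get()
                answers[asker].put((cell, state[cell]))
            barrier.wait()

            # Step 3: compute the new state of my region from the old values.
            known = {c: state[c] for c in range(lo, hi)}
            while not answers[me].empty():
                cell, value = answers[me].get()
                known[cell] = value
            new = [known[(c - 1) % size] ^ known[(c + 1) % size]
                   for c in range(lo, hi)]
            for c, value in zip(range(lo, hi), new):
                state[c] = value
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

For example, run_rounds([0, 0, 0, 1, 0, 0, 0, 0], 3, 1) returns [0, 0, 1, 0, 1, 0, 0, 0]. The point of the barriers is that nobody reads another thread's cells directly: a thread's cells are only read by their owner in step 2 and only written by their owner in step 3, so no locking on the cells themselves is needed.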