Based on your comments, it seems that the problem is somewhere in your findBestMove function. BTW, if you had an infinite loop, at some point the watchdog would trigger and most probably your driver would crash, resulting in a black or frozen screen.
So I'd suggest that you comment out all the code in your function and just assign the r, s, c variables a known value, such as the ID of the workitem that handled them, which you can get with the get_global_id function. Then, of course, replace the ??? with:
col_out[num] = c;
score_out[num] = s;
row_out[num] = r;
If you get the expected values, you'll know for sure that the problem is in the function, and you can start debugging it.
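As a rough sketch of that test, something like the following could work. The kernel name and argument list are made up for illustration, and I'm assuming num is the workitem's global ID; only col_out, score_out, row_out, r, s and c come from your code.

```c
// Hypothetical debug kernel: the name and signature are assumptions.
__kernel void debug_ids(__global int *row_out,
                        __global int *score_out,
                        __global int *col_out)
{
    // Assumption: num is the global ID of this workitem.
    int num = get_global_id(0);

    // The call to findBestMove is left out on purpose while testing.
    int r = num;
    int s = num;
    int c = num;

    col_out[num] = c;
    score_out[num] = s;
    row_out[num] = r;
}
```

After reading the buffers back on the host, each of the three arrays should contain 0, 1, 2, ... up to the number of workitems minus one. If it does, the kernel launch and the output buffers are fine and the bug is inside the function.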
Since you asked for some tips, here is one that I think will improve performance (once you've fixed your bug :)): instead of using private memory for your scoreMat array, use local memory. That way you avoid having each workitem read the same data from global memory over and over (which is slow). To fetch the data from global to local memory you can use the async_work_group_copy function.
So in your case you'd have something like this:
local int scoreMat[64];
// Destination (local) comes first, then the source (global)
event_t ev = async_work_group_copy(scoreMat, lookup, 64, 0);
// Wait to make sure everything is copied
wait_group_events (1, &ev);
You might need to change some more code to take into account that you are now using local memory. From the access point of view it works the same way as global memory, but it is much faster.
Note that the difference from what you have now is that only one copy is made per workgroup, not 60 (the number of workitems). Also, this time the data you fetched from global memory is accessible from all the workitems within a workgroup, whereas before each workitem had its own copy. It is important to highlight that this sharing is within a workgroup. But since you are using only 60 workitems, you most probably have only one workgroup.
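To give an idea of the kind of changes involved, here is a minimal sketch of a kernel that does the copy and then passes the local array to a helper. The kernel name, the helper readScore and the way the table is indexed are placeholders I made up. I'm also assuming lookup is a __global buffer of 64 ints, so adapt it to your real code.

```c
// Hypothetical helper: a function that receives the table must now
// take a __local pointer instead of a private array.
int readScore(__local const int *scoreMat, int index)
{
    return scoreMat[index];
}

// Hypothetical kernel: the name and signature are assumptions.
__kernel void score_with_local(__global const int *lookup,
                               __global int *score_out)
{
    // One copy of the table shared by the whole workgroup.
    __local int scoreMat[64];

    // Every workitem of the group must reach this call with the same
    // arguments; the copy itself is done once for the group.
    event_t ev = async_work_group_copy(scoreMat, lookup, 64, 0);

    // Wait to make sure everything is copied before reading scoreMat.
    wait_group_events(1, &ev);

    int num = get_global_id(0);

    // Placeholder use of the table, only to show the access pattern.
    score_out[num] = readScore(scoreMat, num % 64);
}
```

The point to take away is the __local qualifier on the helper's parameter: a pointer to local memory cannot be passed where a private or global pointer is expected, so any function that receives scoreMat needs its signature updated.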