A not-so-robust solution is to allocate a host_ptr variable with the CL_MEM_ALLOC_HOST_PTR flag, let the kernel modify it, and then, on the host side, poll it in a while loop and use its value to print a progress bar.
Here is the declaration:
https://github.com/fangq/mcxcl/blob/master/src/mcx_host.cpp#L428-L431
Here is the update inside the kernel:
https://github.com/fangq/mcxcl/blob/master/src/mcx_core.cl#L845-L848
Here is the host-side value retrieval and progress bar printing:
https://github.com/fangq/mcxcl/blob/master/src/mcx_host.cpp#L583-L606
This works OK on AMD GPUs, although the updates are sparse: the progress variable is only updated a few times during the kernel's runtime, so the progress bar advances in uneven jumps. On NVIDIA and Intel devices, however, nothing is printed until the kernel is complete.
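To make the host-side polling pattern concrete, here is a minimal CPU-only sketch. It is not the code from mcxcl and it does not call OpenCL at all: a std::thread stands in for the kernel and a std::atomic counter stands in for the CL_MEM_ALLOC_HOST_PTR buffer. All names (render_bar, fake_kernel, poll_progress, run_demo) are hypothetical. Keep in mind that std::atomic guarantees the host loop sees each update, whereas, as far as I know, OpenCL 1.x only guarantees host/device memory consistency at synchronization points, which may be why some drivers show nothing until the kernel finishes.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <thread>

// Render a text bar such as "[=====     ] 50%" (hypothetical helper).
std::string render_bar(unsigned done, unsigned total, int width) {
    assert(total > 0 && width > 0);
    if (done > total) done = total;  // clamp overshoot
    std::size_t filled = static_cast<std::size_t>(
        static_cast<unsigned long long>(done) * width / total);
    std::string bar = "[";
    bar.append(filled, '=');
    bar.append(static_cast<std::size_t>(width) - filled, ' ');
    bar += "] " + std::to_string(done * 100ULL / total) + "%";
    return bar;
}

// Stand-in for the kernel: bumps the shared counter as "work" completes.
void fake_kernel(std::atomic<unsigned>& progress, unsigned total) {
    for (unsigned i = 1; i <= total; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        progress.store(i, std::memory_order_release);
    }
}

// Host side: poll the counter until it reaches total and redraw the bar
// whenever the value changes. Returns the number of redraws.
unsigned poll_progress(const std::atomic<unsigned>& progress, unsigned total) {
    unsigned last = 0, redraws = 0;
    while (last < total) {
        unsigned now = progress.load(std::memory_order_acquire);
        if (now != last) {
            last = now;
            ++redraws;
            std::fprintf(stderr, "\r%s", render_bar(last, total, 40).c_str());
        }
        // Sleep briefly so the polling loop does not spin at full speed.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    std::fprintf(stderr, "\n");
    return redraws;
}

// Launch the fake kernel and poll it from the calling thread.
unsigned run_demo(unsigned total) {
    std::atomic<unsigned> progress{0};
    std::thread worker([&] { fake_kernel(progress, total); });
    unsigned redraws = poll_progress(progress, total);
    worker.join();
    return redraws;
}
```

Calling run_demo(100) from main draws the bar on stderr as the counter advances. The number of redraws can be smaller than the number of updates, because the host only sees whatever value is current when it polls; that is the same effect as the uneven jumps described above.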
To try my code:
git clone https://github.com/fangq/mcxcl.git
cd mcxcl/src
make clean all
cd ../example/quicktest
./run_qtest.sh -D P
I asked this question on NVIDIA's forum, but no one there knows how to fix it for NVIDIA devices:
https://devtalk.nvidia.com/default/topic/1031335/cuda-programming-and-performance/how-to-update-host-memory-variable-from-device-to-host-during-kernel-execution-in-opencl/