Unless you change the processing the gprof would work fine.
Changing the processing means using co-processor or gpus as computing units. In the worst case you have to manually call the setitimer
function for every thread. But as per latest version, (2013-14) it's not needed.
In certain cases it behaves mischievously. So I advice to use the VTUNE from Intel which would give more accurate and more detailed information.