It looks like callgrind_annotate is meant to do exactly this with the data generated by callgrind. The reason "baz" does not show up for your trivial example is simply that the test code runs so briefly that its cost is dwarfed by the overhead code (e.g. the dynamic-library loading code), so it falls below the cutoff for what callgrind_annotate prints.
You can get callgrind_annotate to include baz either by raising the threshold parameter to 100, so that every function is listed however small its share:
callgrind_annotate --threshold=100 --tree=both callgrind.out.3519 | grep baz
Or by altering the example so that it does enough work to register:
int main() {
    /* Call foo() a million times so its cost is no longer negligible. */
    for (int i = 0; i < 1000000; i++) {
        foo();
    }
    return 0;
}