#pragma unroll
is the only unrolling mechanism documented in the CUDA C Programming Guide 5.5. It applies only to the loop that immediately follows it, so it must be specified before each loop you want to control. That said, the compiler already unrolls "small loops with a known trip count" by default, so you may not need the unroll directives in your first example.
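To illustrate the per-loop placement, here is a minimal sketch. The function and kernel names (`sum4_halved`, `sum4_kernel`) and the data layout are made up for this example and are not taken from your code. Each loop carries its own directive: a bare `#pragma unroll` asks for full unrolling of a loop with a constant trip count, and `#pragma unroll 1` asks the compiler not to unroll.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: sums 4 consecutive floats, then halves the sum 4 times.
__host__ __device__ inline float sum4_halved(const float* v)
{
    float acc = 0.0f;

    // The directive affects only the loop that immediately follows it.
    #pragma unroll
    for (int k = 0; k < 4; ++k)
        acc += v[k];

    // A second loop needs its own directive; "1" requests no unrolling.
    #pragma unroll 1
    for (int k = 0; k < 4; ++k)
        acc *= 0.5f;

    return acc;
}

// Hypothetical kernel: thread i processes elements 4*i .. 4*i+3 of `in`.
// `in` is assumed to hold at least 4*n floats, `out` at least n floats.
__global__ void sum4_kernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = sum4_halved(in + 4 * i);
}
```

Both loops here have a small, constant trip count, so the first directive is probably redundant with the compiler's default behaviour. It is shown only to make the placement rule concrete.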
I don't think controlling unrolling at the function level would be all that useful. Start by relying on the compiler to choose the amount of unrolling, then tweak individual loops only if profiling indicates that it could help.