If I understand compile flow of TH correctly, the ordinary haskell functions are being executed while splicing at compile time. But you can run then at the runtime on your own, of course.
For example you have something like $(foo x y ...) in your TH-heavy file. Create another file and call 'foo x y' there but don't splice the result. Then you'll be able to profile 'foo' as usual. If the bottleneck is at the AST generation stage you'll locate it. Don't forget to consider lazyness.