If you are in so time-critical a part of your program that it matters whether copying 24 bytes is faster than copying 8 bytes plus sharing overhead, you will have to implement both and use a profiler to find out which is better in your particular scenario. In that case, you should also consider an entirely different approach that bypasses the virtual call probably hidden inside `std::function::operator()`.
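One such approach, as a rough sketch: make the hot function a template on the callable type, so there is no type erasure at all. The names `apply_erased` and `apply_direct` and the `int(int)` signature are made up for illustration. Whether this actually helps depends on your compiler and call site, so it still needs profiling.

```cpp
#include <functional>
#include <utility>

// Type-erased version: every call goes through std::function's
// internal indirection (often a virtual call or function pointer).
inline int apply_erased(const std::function<int(int)>& f, int x) {
    return f(x);
}

// Templated version: the concrete type of the callable is known at
// compile time, so the compiler can usually inline the call.
template <typename F>
int apply_direct(F&& f, int x) {
    return std::forward<F>(f)(x);
}
```

The trade-off is that the template has to live in a header and is instantiated once per callable type, which is why I would only reach for it in code the profiler has flagged.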
Otherwise, I'd just take the parameter by value and let the optimiser do its job.
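For the by-value case, a minimal sketch could look like this. The `Button` class, its member names and the `int(int)` signature are hypothetical, and moving the parameter into a member is just one common way to use it.

```cpp
#include <functional>
#include <utility>

class Button {
public:
    // Taken by value: an lvalue argument is copied once, an rvalue
    // (such as a temporary lambda) is only moved.
    void set_handler(std::function<int(int)> handler) {
        handler_ = std::move(handler);
    }

    // Returns 0 when no handler has been set (arbitrary choice for the sketch).
    int click(int x) const {
        return handler_ ? handler_(x) : 0;
    }

private:
    std::function<int(int)> handler_;
};
```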