The question almost answers itself. What is your bare metal application doing when it is not in that process/algorithm? Measure one or the other or both. If you have a bare metal application that is not completely consuming the cpu in this algorithm, then you already have an operating system to the extent that you are managing this application/function's time. You can use a number of methods from a simple counter in a loop relative to a timer to see how many counts per loop when the algorithm is getting time slices vs not. You can simply time the algorithm itself, etc.
I assume when you say CPU you mean the whole system as your performance is heavily dependent both on your code and what it is talking to. If running from flash on a cortex-m4 depending on the clock rate you may be burning processor cycles just waiting for instructions or data (and can very easily get the wrong notion of processor performance for an algorithm when it isnt the algorithm burning clocks). The caches mask/manipulate that performance and can easily greatly affect the performance if you are not careful and aware of what they are doing. Being a C++ question your compiler plays a large role in performance as well as your code of course, can very easily make the code run several times faster or slower with minimal changes to the command line or code.
If the algorithm is part of an isr then the processor goes to sleep otherwise, you can use the gpio pin and scope techinique to get a feel for the running vs sleeping ratio.