On a reasonably modern process, dividing by 15 shouldn't be that terrible. The AMD optimisation guide defines it based on the quotient (the value that is being divided), and it takes 8 + bit position of the most significant bit of the quotient. So if your numbers have the 63rd bit set, you end up with 71 cycles - which is quite a long instruction, of course. But for a 32-bit number with a few zeros in the top bits, we're talking 30-40 cycles. If the number fits in a 16-bit value, we the maximum is 23 cycles.
To get the remainder is one more clockcycle on top of that.
If you are doing this ALL the time, of course, you may find that this time is quite long, but I'm not sure there is a trivial way to avoid it.
Like others have said, compiler may be able to replace it with something better. But 15 does not, to my knowledge have an obvious fast solution (if you have 16 instead of 15, then we can use the trick of x & 15
).
If it's a limited range, you could build a table [vector<bool>
for example, which will store 1 bit per entry], but you'll quite soon run into the problem that the non-cached memory access takes just as long as a divide operation...
There are some interesting ways to figure out if a number divides by 3, 5 and so on by summing the digits, but unfortunately, those only work based on decimal digits, which involves a long sequence of divides.