You can try getting a little better performance by not "paying" for what your program does not use: rather than using a tree structure, which keeps all elements sorted, you can use a heap which does not sort elements, but lets you extract k
largest elements in k * log(n)
time.
Heaps tend to be faster than balanced trees on insertions and deletions, and they also occupy contiguous regions of memory, which may improve "cache friendliness" in some cases. Of course the speed-up is not free: heaps do not support fast sorting or removal of items other than the ones at the top.