Question

I have an algorithm that runs in 250 ms on average on my dual-core, 3 GHz Intel processor, and I am trying to optimize it. Currently, it makes around 6,000 calls to std::nth_element on std::vectors of between 150 and 300 elements, which together take about 50 ms on average. I've spent some time optimizing the comparator, which looks up two doubles from a vector and does a simple < comparison; it now accounts for a negligible fraction of the time spent in std::nth_element. The comparator's copy constructor is also trivial.
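For reference, a minimal sketch of the setup described above. All names here are hypothetical, and it assumes the comparator selects over indices ordered by the doubles they refer to:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical comparator: orders indices by the doubles they point to.
struct IndexLess {
    const std::vector<double>* values;
    bool operator()(std::size_t a, std::size_t b) const {
        return (*values)[a] < (*values)[b];
    }
};

// Partially partition `indices` so that the element at position k is the
// one that would be there after a full sort by value.
void select_kth(std::vector<std::size_t>& indices, std::size_t k,
                const std::vector<double>& values) {
    std::nth_element(indices.begin(), indices.begin() + k, indices.end(),
                     IndexLess{&values});
}
```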

Since this call currently takes 20% of my algorithm's runtime, and since most of that time is spent in the nth_element code I did not write (i.e. not in the comparator), I'm wondering if anyone knows a way of optimizing nth_element using SIMD or any other approach. I've seen some questions on parallelizing std::nth_element using OpenCL and multiple threads, but since the vectors are pretty short, I'm not sure how much benefit I would get from that approach, though I'm open to being told I'm wrong.

If there is an SSE approach, I can use any SSE instructions up to and including (the current, I think) SSE4.2.

Thanks!

The solution

Two thoughts:

  1. Multithreading probably won't speed up processing for any single vector, but might help you as the number of vectors grows large.

  2. Sorting is too powerful a tool for your problem: you're computing the entire order of the vector, but you don't care about anything but the top few. You know for each vector how many elements make up the top 5%, so instead of sorting the whole thing you should make one pass through the array and find the k largest. You can do this in O(n) time with k extra space, so it's probably a win overall.
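The one-pass idea in point 2 can be sketched with a bounded min-heap holding the k largest values seen so far (this particular heap formulation is my own choice, not from the answer; it runs in O(n log k) time with k extra space, which is effectively O(n) when k is a small constant like the top 5% of 300 elements):

```cpp
#include <functional>
#include <queue>
#include <vector>

// One pass over `data`, keeping the k largest values seen so far in a
// min-heap of size k; the root is always the smallest of the current top k.
std::vector<double> k_largest(const std::vector<double>& data, std::size_t k) {
    std::priority_queue<double, std::vector<double>, std::greater<double>> heap;
    for (double x : data) {
        if (heap.size() < k) {
            heap.push(x);
        } else if (x > heap.top()) {
            heap.pop();   // drop the smallest of the current top k
            heap.push(x);
        }
    }
    std::vector<double> out;
    while (!heap.empty()) { out.push_back(heap.top()); heap.pop(); }
    return out;  // the k largest values, in ascending order
}
```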

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow