I am trying to implement a kd-tree in Haskell (see the implementation) and tried to be clever by exploiting Haskell's laziness in the nearest-neighbour algorithm (see line 46).
While it is technically correct, that is:
minimumBy (compare `on` qd q) vs == (head . nearestNeighbours (kdtree 5 vs) $ q)
==> True
the kd-tree-based version is much slower (criterion: 5.38 ms vs 138.44 ms, with 500k data points). I first thought this was due to overly strict pattern matching in ordMerge (line 59), but I rewrote it, and as far as I understand, bs should now only be evaluated as needed.
If that is the case, the algorithm should only descend to the matching bucket and then re-ascend, checking along the way that the current best nearest neighbour really is the best candidate.
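To illustrate what I mean by "evaluated as needed", here is a simplified sketch of the lazy merge idea (illustrative names and types, not the actual code from my repo):

```haskell
-- Candidates are (squared distance, point) pairs; each list is
-- assumed to be sorted by ascending distance.
ordMerge :: Ord d => [(d, a)] -> [(d, a)] -> [(d, a)]
ordMerge []                 bs = bs
ordMerge as                 [] = as
ordMerge (a@(da, _) : as) (b@(db, _) : bs)
  | da <= db  = a : ordMerge as (b : bs)
  | otherwise = b : ordMerge (a : as) bs

-- The kd-tree trick: while the near branch's head is within `bound`
-- (e.g. the squared distance to the splitting plane), the far branch
-- `bs` is never forced, so unvisited subtrees stay unevaluated.
mergeBound :: Ord d => d -> [(d, a)] -> [(d, a)] -> [(d, a)]
mergeBound bound (a@(da, _) : as) bs
  | da <= bound = a : mergeBound bound as bs
mergeBound _ as bs = ordMerge as bs
```

For example, `take 1 (mergeBound 2 [(1, p), (3, q)] undefined)` succeeds without ever touching the far list.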
I did some profiling: nearestNeighbours is called about 800 times. Given a tree depth of 8 and 100 test cases, that sounds reasonable, doesn't it?
I just uploaded my code to GitHub: https://github.com/fhaust/threesg
This should get you started:
git clone https://github.com/fhaust/threesg
cd threesg
cabal sandbox init
cabal install --enable-benchmarks --enable-tests
cabal test
cabal bench --benchmark-options="+RTS -K100M -RTS"
(-K100M
is needed because the test set is created from 500k points.)
While creating a test set for GitHub I noticed that on normally distributed points the kd-tree search runs a lot faster than the linear search ... so my problem is probably not the algorithm but my test set :(
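For what it's worth, a normally distributed test set can be generated with a Box-Muller transform along these lines (a sketch of the idea, not the generator actually used in the repo):

```haskell
import System.Random (StdGen, mkStdGen, randomRs)

-- Box-Muller: turn two uniform (0,1] samples into one standard-normal sample.
boxMuller :: Double -> Double -> Double
boxMuller u1 u2 = sqrt (-2 * log u1) * cos (2 * pi * u2)

-- An infinite stream of standard-normal samples from a generator.
normals :: StdGen -> [Double]
normals g = go (randomRs (1e-12, 1) g)   -- lower bound avoids log 0
  where
    go (u1 : u2 : us) = boxMuller u1 u2 : go us
    go _              = []

-- Chunk the stream into 3D points for the kd-tree benchmark.
points3 :: StdGen -> [(Double, Double, Double)]
points3 g = go (normals g)
  where
    go (x : y : z : rest) = (x, y, z) : go rest
    go _                  = []
```

Comparing a run on `points3` against one on uniformly distributed points would show whether the distribution alone explains the gap between the kd-tree and linear search.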