You are not fully applying _bucketsort in your second benchmark, so you are only evaluating a partially applied function to WHNF, which unsurprisingly is quite fast.
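To see why a partial application costs almost nothing to force, here is a minimal sketch using only base. The name slowSort is a hypothetical stand-in for a two-argument function like _bucketsort, not the code from the question; its body fails loudly so it is obvious whether it ever runs.

```haskell
import Control.Exception (evaluate)

-- Hypothetical stand-in for a two-argument function such as _bucketsort.
-- The body throws, so we can tell whether it was ever evaluated.
slowSort :: [Int] -> [Int] -> [Int]
slowSort _ _ = error "slowSort body was evaluated"

main :: IO ()
main = do
  -- Only one of the two arguments is supplied. The result is a function
  -- value, which is already in WHNF, so evaluate returns immediately
  -- without running the body.
  _ <- evaluate (slowSort [3, 1, 2])
  putStrLn "partial application forced to WHNF; body never ran"
```

A benchmark that forces such a value to WHNF measures roughly the cost of building the closure, not the cost of the sort.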
Changing the relevant lines to
main = defaultMain
  [ bench "bucketsort 96080"  $ whnf bucketsort (dataset 96080 :: [Int])
  , bench "_bucketsort 96080" $ whnf (flip _bucketsort Map.empty) (dataset 96080 :: [Int])
  ]
yields (on my machine):
warming up
estimating clock resolution...
mean is 2.357120 us (320001 iterations)
found 2630 outliers among 319999 samples (0.8%)
2427 (0.8%) high severe
estimating cost of a clock call...
mean is 666.7750 ns (14 iterations)
found 1 outliers among 14 samples (7.1%)
1 (7.1%) high severe
benchmarking bucketsort 96080
collecting 100 samples, 1 iterations each, in estimated 34.66980 s
mean: 244.3280 ms, lb 238.0601 ms, ub 250.6725 ms, ci 0.950
std dev: 32.37658 ms, lb 28.02356 ms, ub 38.10187 ms, ci 0.950
found 3 outliers among 100 samples (3.0%)
3 (3.0%) low mild
variance introduced by outliers: 87.311%
variance is severely inflated by outliers
benchmarking _bucketsort 96080
collecting 100 samples, 1 iterations each, in estimated 24.65911 s
mean: 244.9425 ms, lb 239.1011 ms, ub 251.0300 ms, ci 0.950
std dev: 30.68877 ms, lb 26.48151 ms, ub 36.20961 ms, ci 0.950
variance introduced by outliers: 86.247%
variance is severely inflated by outliers
Note also that this benchmark does not fully force the list, because whnf on a list evaluates only as far as the outermost constructor (the first cons cell). This explains why both benchmarks now show nearly the same performance. Switching both benchmarks to nf changes the times to 369.3022 ms and 354.3513 ms, respectively, making bucketsort somewhat slower again.
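The difference between stopping at the first constructor and traversing the whole list can be sketched with base alone. Both partlyBroken and forceAll are hypothetical names introduced for illustration. forceAll is a hand-rolled traversal standing in for the full evaluation that criterion's nf performs through the NFData class; it is not how criterion implements it.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- A list whose first cons cell is fine but whose tail fails when touched.
partlyBroken :: [Int]
partlyBroken = 1 : error "tail was evaluated"

-- Hand-rolled full traversal of the spine and elements (illustrative
-- stand-in for the deep evaluation that nf requests).
forceAll :: [Int] -> ()
forceAll = foldr seq ()

main :: IO ()
main = do
  -- WHNF: evaluation stops at the first (:), so the tail is never touched.
  _ <- evaluate partlyBroken
  putStrLn "WHNF: only the outermost constructor was evaluated"
  -- Full evaluation: every cell is visited, so the broken tail is reached.
  r <- try (evaluate (forceAll partlyBroken)) :: IO (Either SomeException ())
  putStrLn (either (const "full evaluation: reached the tail")
                   (const "full evaluation: finished")
                   r)
```

For a sort, much of the work usually has to happen before the first element can be produced, which is consistent with whnf already taking a substantial share of the nf time in the numbers above.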