Haskell to find the distance between the two closest points

Question 1

You are using a quadratic algorithm:

project []  = error "Empty list of points"
project [_] = error "Single point is given"
project ps  = go 10000 ps
  where
    go a [_]    = a
    go a (p:ps) = let a2 = min a $ minimum [distance p q | q<-ps]
                  in a2 `seq` go a2 ps

You should use a better algorithm. Search computational-geometry tag on SO for a better algorithm.

See also http://en.wikipedia.org/wiki/Closest_pair_of_points_problem .

@maxtaldykin proposes a nice, simple and effective change to the algorithm, which should make a real difference for random data -- pre-sort the points by X coordinate, and never try points more than d units away from the current point, in X coordinate (where d is the currently known minimal distance):

import Data.Ord (comparing)
import Data.List (sortBy)

project2 ps@(_:_:_) = go 10000 p1 t 
  where
    (p1:t) = sortBy (comparing fst) ps
    go d _         [] = d
    go d p1@(x1,_) t  = g2 d t
      where
        g2 d []          = go d (head t) (tail t)
        g2 d (p2@(x2,_):r)
           | x2-x1 >= d  = go d (head t) (tail t)
           | d2 >= d     = g2 d  r
           | otherwise   = g2 d2 r   -- change it "mid-flight"
               where
                 d2 = distance p1 p2

On random data, g2 will work in O(1) time so that go will take O(n) and the whole thing will be bounded by a sort, ~ n log n.

Empirical orders of growth show ~ n^2.1 for the first code (on 1k/2k range) and ~n^1.1 for the second, on 10k/20k range, testing it quick'n'dirty compiled-loaded into GHCi (with second code running 50 times faster than first for 2,000 points, and 80 times faster for 3,000 points).

Question 2

It's possible to slightly modify your bruteforce search to get better performance on random data.

Main idea is to sort points by x coordinate and, while comparing distances in loop, consider only points that have horizontal distance not grater than current minimum distance.

This could be order of magnitude faster but in the worst case it is still O(n^2).
Actually, on 2000 points it is 50 times faster on my machine.

project points = loop1 10000 byX
  where
    -- sort points by x coordinate
    --  (you need import Data.Ord to use `comparing`)
    byX = sortBy (comparing fst) points

    -- loop through all points from left to right
    -- threading `d` through iterations as a minimum distance so far
    loop1 d = foldl' loop2 d . tails

    -- `tail` drops leftmost points one by one so `x` is moving from left to right
    -- and `xs` contains all points to the right of `x`
    loop2 d [] = d
    loop2 d (x:xs) = let
        -- we take only those points of `xs` whose horizontal distance
        -- is not greater than current minimum distance
        xs' = takeWhile ((<=d) . distanceX x) xs
        distanceX (a,_) (b,_) = b - a

        -- then just get minimum distance from `x` to those `xs'`
      in minimum $ d : map (distance x) xs'

Btw, please don't use so many parentheses. Haskell does not require to enclose function arguments.