# How to determine path from noisy X, Y data

###### https://stackoverflow.com/questions/240473

### Question

I have an unsorted list of noisy X, Y points. They do, however, form a path through the world. I would like an algorithm to draw an approximation of this data using line segments.

This is similar to how you would use a line-fitting algorithm to pick an approximation of linear data. My problem is just harder because the path bends and winds around the world.

![Example of a noisy path](http://www.praeclarum.org/so/pathfinder.png)

Does anyone know of any standard, robust, or easy-to-comprehend algorithms to accomplish this?

**Q&A**:

**What do you mean by noisy?** If I had an ideal realization of the path, then my set of points would be sampled from that ideal path with Gaussian noise added to the X and Y elements. I do not know the mean or standard deviation of that noise. I may be able to guess at the std dev...

**Do the points lie near, but not on, some ideal but complicated path which you seek to approximate?** Yes.

**Do you have any a priori information about the shape of the path? Any other way to get such information?** Unfortunately not.

### Solution

With an unsorted list, you won't really know which points to include in each segment, so I guess you could just go with the closest point.

One way could be to pick a start point at random, and pick the closest point as the next point in each step. Add the first two points to a set S.

Keep adding the nearest points to S, fitting a line to the points in S, until the RMS error exceeds some threshold; then clear S and start a new line.

The intersection of consecutive lines would be the end-points of the segments.
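A minimal Python sketch of the above, assuming a plain least-squares fit and an arbitrary RMS threshold (both left open by the answer); `order_by_nearest`, `fit_line_rms`, and `segment` are hypothetical names:

```python
import math

def order_by_nearest(points):
    """Greedy ordering: start at the first point, repeatedly take the closest unvisited point."""
    remaining = list(points[1:])
    ordered = [points[0]]
    while remaining:
        last = ordered[-1]
        nxt = min(remaining, key=lambda p: math.dist(last, p))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered

def fit_line_rms(pts):
    """Least-squares fit y = a*x + b; returns (a, b, rms). Assumes the segment is not vertical."""
    n = len(pts)
    sx = sum(p[0] for p in pts); sy = sum(p[1] for p in pts)
    sxx = sum(p[0] ** 2 for p in pts); sxy = sum(p[0] * p[1] for p in pts)
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    rms = math.sqrt(sum((p[1] - (a * p[0] + b)) ** 2 for p in pts) / n)
    return a, b, rms

def segment(points, rms_threshold=0.5):
    """Walk the ordered points, growing the set S until the fit's RMS exceeds the threshold."""
    ordered = order_by_nearest(points)
    segments, s = [], ordered[:2]
    for p in ordered[2:]:
        trial = s + [p]
        if fit_line_rms(trial)[2] > rms_threshold:
            segments.append(s)
            s = [s[-1], p]   # start the new line where the old one ended
        else:
            s = trial
    segments.append(s)
    return segments
```

Each returned segment is a run of points well fit by one line; the fitted lines' intersections then give the segment endpoints.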

### OTHER TIPS

Bézier interpolation may fit your problem.

This does not address the ordering of the points into a path, however; there are a number of approaches to consider:

- Any "optimal" type of path (e.g. smallest direction change at each point, or shortest path through all points) will likely boil down to the NP-complete Travelling Salesman Problem (TSP).
- A "reasonable" path: cluster the nodes, then route between clusters and within each cluster. Of course, the larger the clusters, or the larger the number of clusters, the more this smaller problem looks like a large-n TSP.
- Ordering the points by one axis. If there are many more than 2 axes, some dimensionality-reduction strategy may be useful, e.g. Independent Component Analysis.

If your points are close to each other, you can treat the world as flat and fit ordinary "straight" lines with the usual smoothing algorithms.

If they are far apart, you need to compensate for the curvature of the earth by using great circles to navigate from point to point. Otherwise your straight lines will cover a longer distance than necessary.

It is your call at what distance points become too far apart for straight lines.
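For the far-apart case, the great-circle distance between two latitude/longitude points is usually computed with the haversine formula; this sketch assumes the standard mean Earth radius of 6371 km:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))
```

Comparing `haversine_km` against the straight-line `math.dist` on projected coordinates gives a feel for where the flat-world approximation stops being acceptable.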

Further you have to know if you need to "visit" each point, or just need to go near, and how near that near is.

If you need to send the course(s) to a plane, ship or other traveller, you probably need to visit each point. If you get the GPS data from an object, you probably just want to plot a course on a screen, and remove the noise.

**After seeing your edits:**
If this is an object moving along some trajectory that you want to plot, you might want to smooth the direction and speed instead of the x/y values. (Making your measured values come at a fixed, increasing interval makes smoothing a lot easier.)

Here is a heuristic hack that might address the ordering problem for the data, if

- you have enough points
- the mean distance between points is small compared to the smallest radius of curvature expected of the path
- the mean distance between points is not large compared to the std. dev. of the noise
- the path is not self-crossing (you might get lucky, but no guarantees)

Proceed like this:

1. Pick (hopefully by a meaningful rather than random means) a starting point, **p1**.
2. Find all the points that lie within some clustering distance, **r_c**, of **p1**. Choose **r_c** small compared to the expected turning radius, but large compared to the scatter.
3. Call this cluster **C1**.
4. Find the point **q1**, the mean of the positions in **C1**.
5. Fit a line to the points in **C1**, project it to (or just beyond) the edge of the cluster, and find the nearest point in your original data. Label that point **p2**.
6. Iterate steps 2-5 until you run out of data.

Now you have a new list of points **q1**..**qn** that are ordered.

Off the top of my head, very rough, and only works under pretty good conditions...

Self-crossing behavior can probably be improved by requiring in step (5) that the new projected line lie within some maximum angle of the previous one.
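A rough Python sketch of this heuristic, under the stated conditions. The "fit a line and project" step is simplified here to stepping from the previous cluster mean through the current one, which is an assumption on my part:

```python
import math

def cluster_order(points, start, r_c):
    """Order noisy 2D points along a path: cluster around p, record the cluster
    mean q, then step along the local direction to find the next p."""
    unused = set(range(len(points)))
    p = start
    means = []
    prev_mean = None
    while unused:
        cluster = [i for i in unused if math.dist(points[i], p) <= r_c]
        if not cluster:
            # no neighbours left near the projection: jump to the nearest unused point
            p = points[min(unused, key=lambda i: math.dist(points[i], p))]
            continue
        q = tuple(sum(points[i][k] for i in cluster) / len(cluster) for k in (0, 1))
        means.append(q)
        for i in cluster:
            unused.discard(i)
        if prev_mean is not None:
            # local direction: from the previous mean through this one,
            # projected just beyond the edge of the cluster
            d = (q[0] - prev_mean[0], q[1] - prev_mean[1])
            norm = math.hypot(*d) or 1.0
            p = (q[0] + d[0] / norm * r_c, q[1] + d[1] / norm * r_c)
        elif unused:
            # first step: no direction yet, take the nearest remaining point
            p = points[min(unused, key=lambda i: math.dist(points[i], q))]
        prev_mean = q
    return means
```

The returned means play the role of the ordered **q1**..**qn** above; a curve can then be fit through them.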

The problem with the Bézier curve is that it doesn't actually go through the points you have sampled; even though the sampled points are only distorted a little, the Bézier curve might actually be miles off.

A better approximation, and a solution that seems to resemble the original image much better, is a Catmull-Rom spline, because it does run through all the points of the curve.
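For reference, a point on a uniform Catmull-Rom segment running from p1 (t = 0) to p2 (t = 1), with outer control points p0 and p3, can be evaluated like this:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Point on the uniform Catmull-Rom segment from p1 (t=0) to p2 (t=1)."""
    def blend(a, b, c, d):
        # standard uniform Catmull-Rom basis
        return 0.5 * ((2 * b) + (-a + c) * t
                      + (2 * a - 5 * b + 4 * c - d) * t * t
                      + (-a + 3 * b - 3 * c + d) * t ** 3)
    return (blend(p0[0], p1[0], p2[0], p3[0]),
            blend(p0[1], p1[1], p2[1], p3[1]))
```

Sliding a window of four consecutive (ordered) points along the path and sampling t in [0, 1] for each window traces the full spline through every interior point.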

My approach would be to first sort your list of points, then use a Bézier curve.

The trick is of course the sorting. Start with one random point and find the nearest point. Assume these two are connected. From those two endpoints, find the nearest points to them. Assume that the one with the smaller distance to its endpoint is connected to that point. Repeat until all points are connected.

I assume that there are still some problems with this approach, but maybe you can use it as a starting point (pun intended).

**Edit:** You can do it several times with different starting points, and then see where the results differ. That at least gives you some confidence, which points are connected to each other.
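A sketch of the sorting step; growing the path from both ends and re-running with different random starts, as the edit suggests, might look like:

```python
import math
import random

def greedy_two_ended(points, rng=random):
    """Sort points into a path: start from a random point, then repeatedly
    attach the unused point that is closest to either end of the growing path."""
    remaining = list(points)
    path = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining:
        d_head, head = min((math.dist(path[0], p), p) for p in remaining)
        d_tail, tail = min((math.dist(path[-1], p), p) for p in remaining)
        if d_head < d_tail:
            path.insert(0, head)   # extend at the front
            remaining.remove(head)
        else:
            path.append(tail)      # extend at the back
            remaining.remove(tail)
    return path
```

Running this with several seeds and comparing where the resulting orderings agree gives the confidence measure the edit describes.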

A completely different approach, one that does not require another constraint, though details may depend on your application. It should work best if you have a "dense cloud of points" around the path.

Use a "cost" function that defines the difference between the curve and the cloud of points. Then either use a parametrized curve with a standard optimization algorithm, or start with a straight curve from start to end and use a genetic algorithm to modify it.

The typical cost function would be to take the smallest distance between each point and the curve, and sum the squares.

I have not enough experience to suggest an optimization or genetic algorithm but I am sure it can be done :)

I could imagine a genetic algorithm as follows: the path is built from waypoints. Start by putting N waypoints in a straight line from start to end (N can be chosen depending on the problem). Mutations could be:

- For each segment, if rnd() < x, a new waypoint is introduced in the middle.
- For each waypoint, the X and Y coordinate are varied slightly.

You will need to include the total length in the cost function. Splitting might not be needed, or maybe x (the "split chance") might need to decrease as more waypoints are introduced. You may or may not want to apply (2) to the start and end points.

Would be fun to try that...
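A sketch of that cost function and the two mutations; the point-to-segment helper, `length_weight`, `split_chance`, and `sigma` are all assumed values the answer leaves open:

```python
import math
import random

def point_segment_dist(p, a, b):
    """Distance from point p to the line segment a-b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    L2 = dx * dx + dy * dy
    t = 0.0 if L2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / L2))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def cost(waypoints, cloud, length_weight=0.1):
    """Sum of squared smallest distances from each data point to the polyline,
    plus a penalty on total path length (the weight is an assumption)."""
    fit = sum(min(point_segment_dist(p, a, b)
                  for a, b in zip(waypoints, waypoints[1:])) ** 2
              for p in cloud)
    length = sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))
    return fit + length_weight * length

def mutate(waypoints, split_chance=0.2, sigma=0.1, rng=random):
    """One mutation: (1) maybe split each segment at its midpoint,
    (2) jitter the interior waypoints; start and end stay fixed."""
    out = [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        if rng.random() < split_chance:
            out.append(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2))
        out.append(b)
    return [out[0]] + [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                       for x, y in out[1:-1]] + [out[-1]]
```

A GA loop would then keep a population of waypoint lists, mutate them, and select by `cost`.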

I take it that "unsorted list" means that while your set of points is complete, you don't know what order they were travelled through?

The Gaussian noise basically has to be ignored. We're given absolutely no information that would let us reconstruct the original, un-noisy path, so I think the best we can do is assume the points are correct.

At this point, the task consists of "find the best path through a set of points", with "best" left vague. I whipped up some code that attempts to order a set of points in Euclidean space, preferring orderings that result in straighter lines. While the metric was easy to implement, I couldn't think of a good way to improve the ordering based on it, so I just randomly swap points looking for a better arrangement.

So, here is some PLT Scheme code that does that.

```
#lang scheme
(require (only-in srfi/1 iota))

; a bunch of trig
(define (deg->rad d)
  (* pi (/ d 180)))
(define (rad->deg r)
  (* 180 (/ r pi)))
(define (euclidean-length v)
  (sqrt (apply + (map (lambda (x) (expt x 2)) v))))
(define (dot a b)
  (apply + (map * a b)))
(define (angle-ratio a b)
  (/ (dot a b)
     (* (euclidean-length a) (euclidean-length b))))

; given a list of 3 points, calculate the likelihood of the
; angle they represent. straight is better.
(define (probability-triple a b c)
  (let ([av (map - a b)]
        [bv (map - c b)])
    (cos (/ (- pi (abs (acos (angle-ratio av bv)))) 2))))

; makes a random 2d point. uncomment the bit for a 3d point
(define (random-point . x)
  (list (/ (random 1000) 100)
        (/ (random 1000) 100)
        #;(/ (random 1000) 100)))

; calculate the likelihood of an entire list of points
; (stop when fewer than three points remain, so the last triple is counted)
(define (point-order-likelihood lst)
  (if (null? (cddr lst))
      1
      (* (probability-triple (car lst)
                             (cadr lst)
                             (caddr lst))
         (point-order-likelihood (cdr lst)))))

; just print a list of points
(define (print-points lst)
  (for ([p (in-list lst)])
    (printf "~a~n"
            (string-join (map number->string
                              (map exact->inexact p))
                         " "))))

; attempts to improve upon a list
(define (find-better-arrangement
         start
         ; by default, try only 10 times to find something better
         [tries 10]
         ; if we find an arrangement that is as good as one where
         ; every segment bends by 22.5 degrees (which would be
         ; reasonably gentle) then call it good enough. higher
         ; cut offs are more demanding.
         [cut-off (expt (cos (/ pi 8))
                        (- (length start) 2))])
  (let ([vec (list->vector start)]
        ; evaluate what we've started with
        [eval (point-order-likelihood start)])
    (let/ec done
      ; if the current list exceeds the cut off, we're done
      (when (> eval cut-off)
        (done start))
      ; otherwise, try no more than 'tries' times...
      (for ([x (in-range tries)])
        ; pick two random points in the list
        (let ([ai (random (vector-length vec))]
              [bi (random (vector-length vec))])
          ; if they're the same...
          (when (= ai bi)
            ; increment the second by 1, wrapping around the list if necessary
            (set! bi (modulo (add1 bi) (vector-length vec))))
          ; take the values from the two positions...
          (let ([a (vector-ref vec ai)]
                [b (vector-ref vec bi)])
            ; swap them
            (vector-set! vec bi a)
            (vector-set! vec ai b)
            ; make a list out of the vector
            (let ([new (vector->list vec)])
              ; if it evaluates to better
              (when (> (point-order-likelihood new) eval)
                ; start over with it
                (done (find-better-arrangement new tries cut-off)))))))
      ; we fell out the bottom of the search. just give back what we started with
      start)))

; evaluate, display, and improve a list of points, five times
(define points (map random-point (iota 10)))
(define tmp points)
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 10))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 100))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 1000))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 10000))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
```

From your answers to the questions, it seems that you know the 'golden curve'; I would suggest finding the Bézier curve of that 'golden curve', as suggested by @jamesh, and drawing it.

How many points do you have?

A Bézier curve, as mentioned, is a good idea if you have comparatively few points. If you have many points, build clusters as suggested by dmckee.

However, you **also** need another constraint to define the order of the points. There have been many good suggestions for how to choose the points, but unless you introduce another constraint, any order gives a possible solution.

Possible constraints I can think of:

- shortest path
- most straight segments
- least total absolute rotation
- directional preference (i.e. horizontal / vertical is more likely than crisscrossing)

In all cases, to meet the constraint you probably need to test all permutations of the sequence. If you start with a "good guess", you can terminate the other branches quickly.
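A brute-force illustration for the shortest-path constraint: test all permutations, but terminate a branch as soon as its partial length already exceeds the best found (only feasible for small n):

```python
import math
from itertools import permutations

def shortest_order(points):
    """Find the shortest open path through all points by exhaustive search,
    pruning any permutation whose partial length exceeds the current best."""
    best_len, best = float("inf"), None
    for order in permutations(points):
        total = 0.0
        for a, b in zip(order, order[1:]):
            total += math.dist(a, b)
            if total >= best_len:   # this branch cannot win: stop early
                break
        else:
            best_len, best = total, order
    return list(best), best_len
```

Seeding `best_len` with the length of a "good guess" ordering (e.g. from a greedy pass) makes the pruning kick in from the first permutation.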