Finding largest subset of points forming a convex polygon

Question 1

This problem has been already published in these competitions:

SPOJ problem BOYSCOUT
USACO DEC08 problem "Largest Fence" (last problem on the page)

And its solution (O(N³) algorithm) could be found on this page: "USACO DEC08 Problem 'fence' Analysis" by Bruce Merry and Jacob Steinhardt.

The following is an attempt to explain this algorithm. Also here is my implementation of this algorithm in C++11. This code works for N<=250 and integer coordinates in range 0 .. 1023. No three points should be on the same line. Here is Python script that generates test data satisfying these requirements.

O(N²) algorithm for simplified problem

Simplified problem: find largest subset of points that form a convex polygon and contain the leftmost point of given set (or if there are several leftmost points, this polygon should contain topmost of them).

To solve this problem we could connect each pair of points by directed line segment, then (implicitly) treat each segment as a graph node as shown in diagram:

representing point set as a graph

Here parent node and corresponding segment have blue color. Line segment corresponding to its left child (red color) starts at the end of "parent" segment and it is the line segment making least possible left turn relative to the direction of "parent" segment. Line segment corresponding to its right child (green color) starts at the start of "parent" segment and it is also the line segment making least possible left turn relative to the direction of "parent" segment.

Any segment ending at leftmost point corresponds to a "leaf" node, i.e. it has no child nodes. Such construction of the graph guarantees that there are no cycles, in other words this graph is a DAG.

Now to find largest convex subset of points it is enough to find the longest path in this graph. And the longest path in DAG could be found with dynamic programming algorithm in time O(E), where E is number of edges in the graph. Since each node has only 2 outgoing edges, each corresponding to a pair of points, this problem could be solved in O(N²) time.

All this is possible to implement if we preprocess input data, sorting directed line segments starting at the same point by angle. This allows to perform depth-first traversal in the graph. We should memorize the longest path starting from each processed node, so that each graph edge is visited at most once, and we have O(E) = O(N²) time algorithm (ignoring preprocessing time for now). Space requirements are also O(N²) because we need to store sorted directions for each pair of points and memorization uses one value per node (which is also a pair of points).

Here is C++ implementation of this dynamic programming algorithm:

unsigned dpStep(ind_t i_left, ind_t from, ind_t to_ind)
{
    ind_t child = sorted[from][to_ind];

    while (child < i_left)
        child = sorted[from][++to_ind];

    if (child == i_left)
        return 1;

    if (auto v = memorize[from][child])
        return v;

    const auto x = max(dpStep(i_left, child, lefts[from][child]) + 1,
                       dpStep(i_left, from, static_cast<ind_t>(to_ind + 1)));

    memorize[from][child] = static_cast<ind_t>(x);
    return x;
}

Input parameters are the leftmost point and a pair of points forming current line segment (where starting point of the segment is given directly but ending point is given as an index in sorted by angle array of points). The word "left" in this code fragment is slightly overused: it means both the leftmost point (i_left) and preprocessed array containing left childs for the graph (lefts[][]).

O(N³) algorithm

General problem (where the leftmost point is not fixed) could be solved this way:

Sort the points in left-right direction
Preprocess the points to get two arrays: (a) each point sorted by direction relative to each other point and (b) position in first array of the end of line segment making least possible left turn relative to the direction of "parent" segment.
Use each point as the leftmost point and find the longest convex polygon with "simplified" algorithm.
Periodically prune all points to the left of the "leftmost" point from preprocessed arrays.
Take the longest of paths found on step 3.

Step 4 is optional. It does not improve O(N³) time complexity. But it slightly improves speed of DP algorithm by excluding unneeded points. In my tests this gives 33% speed boost.

There are several alternatives for preprocessing. We could compute every angle (with atan2) and sort them, as it is done in sample code by Bruce Merry and Jacob Steinhardt. This is a simplest way but atan2 may be too expensive, especially for smaller point sets.

Or we could exclude atan2 and sort tangents instead of angles, as it is done in my implementation. This is a little bit more complicated but faster.

Both these alternatives need O(N² log N) time (unless we use some distribution sorting). Third alternative is to use well known computational geometry methods (arrangements and duality) to get O(N²) preprocessing. But I don't think it is useful for our problem: it is probably too slow for smaller point sets because of large constant factor, as for larger point sets, it might give some speed improvement, but too insignificant because step 3 will greatly outweigh step 2. Also it is much more difficult to implement.

Some other results: I tried to implement iterative DP instead of recursion; this did not noticeably change the speed. Also I tried to perform two DP searches at once, interleaving steps of each search (something similar to fibers or HyperThreading implemented in software); this might help because DP consists mostly of memory accesses having high latency and preventing full utilization of CPU throughput; the results are not very impressive, only about 11% speed boost, most likely with real HyperThreading it would be much faster.

Question 2

I think I came up with an algorithm for it by extending Andrew's algorithm for convex hulls.

Originally, I came up with a O(N^4) algorithm, but noticed was way over-complicating it and brought it down to O(N^2) algorithm. It seems like there maybe a bug in the code somewhere that causes issues in at least the case of a vertical line of points. It might be a special case (requiring changing a few lines of code), or a sign of a larger flaw in the algorithm. If it's the latter, then I'm inclined to say it's NP-hard, and offer the algorithm as a heuristic. I've spent all the time I care to coding and debugging it. In any case it seems to work fine in other cases. However, it's difficult to test for correctness when the correct algorithms seem to be O(2^N).

def maximal_convex_hull(points):
    # Expects a list of 2D points in the format (x,y)


    points = sorted(set(points)) # Remove duplicates and sort
    if len(points) <= 1: # End early
        return points

    def cross(o, a, b): # Cross product
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


    # Use a queue to extend Andrew's algorithm in the following ways:
    # 1. Start a new convex hull for each successive point
    # 2. Keep track of the largest convex hull along the way
    Q = []
    size = 0
    maximal_hull = None
    for i,p in enumerate(points):
        remaining = len(points) - i + 1
        new_Q = []
        for lower, upper in Q: # Go through the queue
            # Build upper and lower hull similtanously, slightly less efficent
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) < 0:
                lower.pop()
            lower.append(p)
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) > 0:
                upper.pop()
            upper.append(p)

            new_size = len(lower) + len(upper)
            if new_size > size: # Check for a new highest maximal convex hull
                size = new_size
                maximal_hull = (lower, upper)


        # There is an odd bug? that when one or both if statements are removed
        #  xqg237.tsp (TSPLIB) gives slightly different solutions and is
        #   missing a number of points it should have.
        #  Seems like if the opposite should be true if anything since they
        #   were intended to be easy optimizations not required code.
            if remaining + new_size >= size: # Don't bother if not enough
                new_Q.append((lower, upper)) # Add an updated convex hulls
        if remaining > size: # Don't bother if not enough
            new_Q.append(([p], [p])) # Add a new convex hull
        Q = new_Q # Update to the new queue

    # Extract and merge the lower and upper to a maximium convex hull
    # Only one of the last points is ommited because it is repeated
    #    Asserts and related variables used for testing
    #    Could just have "return lower[:-1] + list(reversed(upper[:-1]))[:-1]"
    lower, upper = maximal_hull
    max_hull_set = set(lower) | set(lower) | set(upper)
    lower = lower
    upper = list(reversed(upper[:-1]))[:-1]
    max_convex_hull = lower + upper
    assert len(lower) + len(upper) == len(max_hull_set)
    assert max_hull_set == set(max_convex_hull)
    return max_convex_hull

Edit: This algorithm isn't correct, but it served as inspiration for the correct algorithm and thus I'm leaving it here.

Question 3

This is my a Dynamic Programming O(N^4) algorithm with idea from Andrew's Algorithm posted by Nuclearman, i'm still looking for a O(N^3) algorithm

upper_hull(most left point, previous point, current point)
{
    ret = 0
    if (current point != most left point)   /* at anytime we can decide to end the upper hull and build lower_hull from current point ending at most left point */
        ret = min(ret,lower_hull(most left point, -1, current point)) 
    for all point on the right of current point /* here we try all possible next point that still satisfy the condition of convex polygon */
        if (cross(previous point,current point,next point) >= 0) max(ret,1+upper_hull(most left point, current point, next point))
    return ret;
}

lower_hull(most left point, previous point, current point)
{
    if (current point == most left point) return 0;
    ret = -INF /* it might be impossible to build a convex hull here, so if it does it will return -infinity */
    for all point on the left of current point and not on the left of most left point
        if (cross(previous point,current point,next point) >= 0) max(ret,1+lower_hull(most left point, current point, next point))
    return ret;
}

First sort the point based on x axis then for tie sort by y axis, then try all point as most left point to run the upper_hull(p,-1,p) , please tell me if there's any flaw in this algorithm

Question 4

You can use a delaunay triangulation and remove the longest edge and also the vertex. I use a similar algorithm to find the concave hull. You can find jan example based on population data @ http://www.phpdevpad.de/geofence. I have also wrote a php class concave-hull @ phpclasses.org.

Finding largest subset of points forming a convex polygon

O(N2) algorithm for simplified problem

O(N3) algorithm

O(N²) algorithm for simplified problem

O(N³) algorithm