Question

I have a large dataset with two columns: timestamp and lat/lon. I want to group the coordinates in someway to determine the number of different places that are recorded, treating everything within a certain distance of each other as all one location. Essentially I want to figure out how many different "places" are in this dataset. A good visual example is this I'd like to wind up here, but I do not know where the clusters are with my dataset.

Was it helpful?

Solution

Detailing more on behzad.nouri's reference

# X= your Geo Array

# Standardize features by removing the mean and scaling to unit variance
X = StandardScaler().fit_transform(X)

# Compute DBSCAN
db = DBSCAN(eps=0.3, min_samples=3).fit(X)

# HERE
# eps -- The maximum distance between two samples 
#  for them to be considered as in the same neighborhood.
# min_samples -- The number of samples in a neighborhood for a point 
#  to be considered as a core point.

core_samples = db.core_sample_indices_
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

OTHER TIPS

This psuedo code demonstrates how to reduce a set of points to a single point per grid partition while tallying the number of points in the grid partition. This can be useful if you have a set of points where some areas are sparse and others are dense, but want an even distribution of displayed points (such as on a map).

To use the function, one passes the set of points and the number of partitions across one of the axis (e.g., X). The same partitioning will be used on the other axis (e.g., Y). So if one specified 3, then 9 (3*3) equal sized partitions would be made. The function first goes through the set of points to find the outermost X and Y (min and max) coordinates that bound the entire set. The distance between the outermost X and Y axis is then divided by the number of partitions to determine the grid size.

The function then steps through each grid partition and checks each point in the set whether it is within the grid partition. If the point is within the grid partition, it checks if this is the first point encountered in the grid partition. If yes, a flag is set to indicate that the first point has been found. Otherwise, not the first point in the grid partition, the point is removed from the set of points.

For each point that is found in the partition, the function increments a tally count. Finally, when the reduction/tallying is completed per grid partition, one can then visualize the tallied point (e.g., show marker on map at the single point with a tally indicator):

function TallyPoints( array points, int npartitions )
{
    array partition = new Array();

    int max_x = 0, max_y = 0;
    int min_x = MAX_INT, min_y = MAX_INT

    // Find the bounding box of the points
    foreach point in points
    {
        if ( point.X > max_x )
            max_x = point.X;
        if ( point.Y < min_x )
            min_x = point.X;
        if ( point.Y > max_y )
            max_y = point.Y;
        if ( point.Y < min_y )
            min_y = point.Y;
    }

    // Get the X and Y axis lengths of the paritions
    float partition_length_x =  ( ( float ) ( max_x - min_x ) ) / npartitions;
    float partition_length_y =  ( ( float ) ( max_y - min_y ) ) / npartitions;

    // Reduce the points to one point in each grid partition
    // grid partition
    for ( int n = 0; n < npartitions; n++ )
    {
        // Get the boundary of this grid paritition
        int min_X = min_x + ( n * partition_length_x );
        int min_Y = min_y + ( n * partition_length_y );
        int max_X = min_x + ( ( n + 1 ) * partition_length_x );
        int max_Y = min_y + ( ( n + 1 ) * partition_length_y );

        // reduce and tally points
        int     tally  = 0;
        boolean reduce = false; // set to true after finding the first point in the paritition
        foreach point in points
        {
            // the point is in the grid parition
            if ( point.X >= min_x && point.X < max_x &&
                 point.Y >= min_y && point.X < max_y )
            {
                // first point found
                if ( false == reduce )
                {
                    reduce = true;
                    partition[ n ].point = point;   // keep this as the single point for the grid
                }
                else
                    points.Remove( point ); // remove the point from the list

                // increment the tally count
                tally++;
            }
        }

        // store the tally for the grid
        partition[ n ].tally = tally;

        // visualize the tallied point here (e.g., marker on Google Map)
    }
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top