Handling many points at one position in R

https://stackoverflow.com/questions/22113853

18-10-2022

Question

I have a question about data handling in R. I have two datasets, both originally .csv files. I've prepared two example datasets:

Table A - Persons
http://pastebin.com/HbaeqACi

Table B - City
http://pastebin.com/Fyj66ahq

To keep the effort as low as possible, here is the corresponding R code for loading and visualizing the data.

# Read the CSV files
# (save the content of the pastebin links above as persons.csv and city.csv)
persons_dataframe <- read.csv("persons.csv", header = TRUE)
city_dataframe <- read.csv("city.csv", header = TRUE)

# Load the packages used to plot the data on a map
library(RgoogleMaps)
library(ggplot2)
library(ggmap)
library(sp)

persons_ggplot2 <- persons_dataframe
city_ggplot2 <- city_dataframe
gc <- geocode('new york, usa')
center <- as.numeric(gc)
G <- ggmap(get_googlemap(center = center, color = 'color', scale = 4, zoom = 10,
                         maptype = "terrain", frame = TRUE),
           extent = "panel")
G1 <- G +
    geom_point(aes(x = POINT_X, y = POINT_Y), data = city_dataframe,
               shape = 22, color = "black", fill = "yellow", size = 4) +
    geom_point(aes(x = POINT_X, y = POINT_Y), data = persons_dataframe,
               shape = 8, color = "red", size = 2.5)
plot(G1)

As a result I have a map that visualizes all cities and persons.
My problem: all persons are located in just these three cities, so many points share the same position.

My questions:

A more general question: Is this a problem for R?
I want to create something like a bubble map that visualizes the number of persons at each position. For example: in City A there are 20 persons, in City B there are 5 persons, so the position of City A should get a bigger bubble than City B.
I want to create a label that states the number of persons at a certain position. I've already tried to do this with the ggplot2 geom_text options, but I can't figure out how to sum up all points at a certain position and write the result to a label.
A more theoretical question (maybe I'll come back to this later): I want to create something like a density map / cluster map that shows the area with the highest number of persons. I've already searched for packages I could use. Suggested ones were SpatialEpi, spatstat and DCluster. My question: do I need the distance from the persons to a certain object (let's say a supermarket) to perform a cluster analysis?

Hopefully, these were not too many questions.
Any help is much appreciated. Thanks in advance!

Btw: Is there a better way to prepare a question containing example datasets? Should I upload a file somewhere, or is the pastebin way okay?

Solution

You can create the bubble chart by counting the number of persons in each city and mapping the size of the points to the counts:

library(plyr)
persons_count <- count(persons_dataframe, vars = c("city", "POINT_X", "POINT_Y"))

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red")
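
To see what the counting step produces without the map, here is a minimal sketch on made-up data. It swaps in base R's aggregate in place of plyr's count, so it is an alternative way to get the same kind of table, not the code used above. The column names city, POINT_X and POINT_Y are taken from the answer; the values and the helper name persons_with_ones are invented for illustration.

```r
# Made-up example data; only the column names follow the answer above.
persons_dataframe <- data.frame(
  city    = c("A", "A", "A", "B", "B", "C"),
  POINT_X = c(-74.0, -74.0, -74.0, -73.9, -73.9, -74.1),
  POINT_Y = c(40.7, 40.7, 40.7, 40.8, 40.8, 40.6)
)

# Add a helper column of ones (hypothetical name) so that summing it
# per group gives the number of persons at each position.
persons_with_ones <- transform(persons_dataframe, freq = 1)

# One row per unique city/coordinate combination, with the count in freq.
persons_count <- aggregate(freq ~ city + POINT_X + POINT_Y,
                           data = persons_with_ones, FUN = sum)
persons_count <- persons_count[order(persons_count$city), ]
print(persons_count)
```

The resulting data frame has the same freq column that the plotting calls in this answer expect, so it should be usable as a drop-in for persons_count if plyr is not installed.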

You can map the counts to the area of the points, which perhaps gives a better sense of the relative sizes:

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red") +
    scale_size_area(breaks = unique(persons_count$freq))

You can add the frequency labels, though this is somewhat redundant with the size scale legend:

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red") +
    geom_text(aes(x = POINT_X, y=POINT_Y, label = freq), data=persons_count) +
    scale_size_area(breaks = unique(persons_count$freq))
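
If you want to try the bubble and label layers without downloading a Google map first, the same layers can be put on a plain ggplot. This is only a sketch under assumptions: the counts below are made up (20, 5 and 1 persons), and coord_quickmap, vjust and max_size are choices of mine that do not appear in the code above.

```r
library(ggplot2)

# Hypothetical counts per city (not the data from the question).
persons_count <- data.frame(
  city    = c("A", "B", "C"),
  POINT_X = c(-74.0, -73.9, -74.1),
  POINT_Y = c(40.7, 40.8, 40.6),
  freq    = c(20, 5, 1)
)

p <- ggplot(persons_count, aes(x = POINT_X, y = POINT_Y)) +
  # Bubble size is driven by the count; scale_size_area makes the
  # point area (not the radius) proportional to freq.
  geom_point(aes(size = freq), color = "red") +
  # Print the count next to each bubble, shifted upwards a little.
  geom_text(aes(label = freq), vjust = -1.5) +
  scale_size_area(breaks = sort(unique(persons_count$freq)), max_size = 12) +
  # Approximate map aspect ratio for longitude/latitude data.
  coord_quickmap()
print(p)
```

Replacing the ggplot(...) call with the G object from the question, and passing data and aes to each layer as in the code above, should give the same picture on top of the map.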

You can't really plot densities with your example data because you only have three points. But if you had more fine-grained location information you could calculate and plot the densities using the stat_density2d function in ggplot2.
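
As a rough illustration of what that could look like, here is a sketch with simulated locations. Everything about the data is an assumption: the points are random draws around two invented centres and have nothing to do with the example datasets.

```r
library(ggplot2)

set.seed(42)  # make the simulated data reproducible

# Simulated person locations scattered around two hypothetical centres.
persons_sim <- data.frame(
  POINT_X = c(rnorm(200, mean = -74.00, sd = 0.05),
              rnorm(100, mean = -73.85, sd = 0.03)),
  POINT_Y = c(rnorm(200, mean = 40.70, sd = 0.04),
              rnorm(100, mean = 40.80, sd = 0.02))
)

p_density <- ggplot(persons_sim, aes(x = POINT_X, y = POINT_Y)) +
  # Raw points, semi-transparent so overlapping points stay visible.
  geom_point(alpha = 0.3, size = 1) +
  # Contour lines of a 2D kernel density estimate of the points.
  stat_density2d(color = "red")
print(p_density)
```

On the map, the same layer would be added as G + stat_density2d(aes(x = POINT_X, y = POINT_Y), data = persons_sim). Note that this estimates density from the coordinates alone, with no distance to a reference object such as a supermarket involved.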

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow