Domanda

I have a question regarding data handling in R. I have two datasets. Both are originally .csv files. I've prepared two example Datasets:

Table A - Persons
http://pastebin.com/HbaeqACi

Table B - City
http://pastebin.com/Fyj66ahq

To make it as less work as possible the corresponding R Code for loading and visualizing.

# Read csv files
# check pastebin links and save content to persons.csv and city.csv.
persons_dataframe = read.csv("persons.csv", header = TRUE)
city_dataframe = read.csv("city.csv", header = TRUE)
# plot them on a map
# load used packages
library(RgoogleMaps)
library(ggplot2)
library(ggmap)
library(sp)

persons_ggplot2 <- persons_dataframe
city_ggplot2 <- city_dataframe
gc <- geocode('new york, usa')
center <- as.numeric(gc)  
G <- ggmap(get_googlemap(center = center, color = 'color', scale = 4, zoom = 10, maptype = "terrain", frame=T), extent="panel")
G1 <- G + geom_point(aes(x=POINT_X, y=POINT_Y ),data=city_dataframe, shape = 22, color="black", fill = "yellow", size = 4) + geom_point(aes(x=POINT_X, y=POINT_Y ),data=persons_dataframe, shape = 8, color="red", size=2.5)
plot(G1)

As a result I have a map, which visulaizes all cities and persons.
My problem: All persons are distributed only on these three cities.

My questions:

  1. A more general questions: Is this a problem for R?
  2. I want to create something like a bubble map, which visualized the amount of persons at one position. Like: In City A there are 20 persons, in City B are 5 persons. The position at city A should get a bigger bubble than City B.
  3. I want to create a label, which states the amount of persons at a certain position. I've already tried to realize this with the ggplo2 geom_text options, but I can't figure out how to sum up all points at a certain position and write this to a label.
  4. A more theoretical approach (maybe I come back to this later on): I want to create something like a density map / cluster map, which shows the area, with the highest amount of persons. I've already search for some packages, which I could use. Suggested ones were SpatialEpi, spatstat and DCluster. My question: Do I need the distance from the persons to a certain object (let's say supermarket) to perform a cluster analyses?

Hopefully, these were not too many questions.
Any help is much appreciated. Thanks in advance!

Btw: Is there any better help to prepare a question containing example datasets? Should I upload a file somewhere or is the pastebin way okay?

È stato utile?

Soluzione

You can create the bubble chart by counting the number in each city and mapping the size of the points to the counts:

library(plyr)
persons_count <- count(persons_dataframe, vars = c("city", "POINT_X", "POINT_Y"))

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red")

You can map the counts to the area of the points, which perhaps gives a better sense of the relative sizes:

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red") +
    scale_size_area(breaks = unique(persons_count$freq))

You can add the frequency labels, though this is somewhat redundant with the size scale legend:

G + geom_point(aes(x=POINT_X, y=POINT_Y, size=freq),data=persons_count, color="red") +
    geom_text(aes(x = POINT_X, y=POINT_Y, label = freq), data=persons_count) +
    scale_size_area(breaks = unique(persons_count$freq))

You can't really plot densities with your example data because you only have three points. But if you had more fine-grained location information you could calculate and plot the densities using the stat_density2d function in ggplot2.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top