I have commuter flow data from the 2001 Census. I converted it from "flat" to "long" form using melt from the reshape2 R package and placed origins and destinations in the same row because ggplot2 accepts only data-frame inputs.
My issue is that I've ended up doubling up the data for each line, so that each row (line) holds both an origin and a destination. I'm sure there is a more concise solution, probably involving an even longer form of my data.
To make the problem specific, I've produced a small worked example from Hereford:
# prepare data + packages
library(ggplot2)
library(ggmap)
flows.mini <- flows.ft[1:100,]
save(flows.mini, file="flows.mini.RData")
load("flows.mini.RData")
head(flows.mini)
variable X.1 value X1.x X2.x X1.y X2.y n nr
1 00GANY 00GANY 605 -2.699389 52.06554 -2.699389 52.06554 605 1
2 00GANY 00GAPA 135 -2.742064 52.04099 -2.699389 52.06554 135 2
3 00GANY 00GAQD 25 -2.733890 51.93402 -2.699389 52.06554 25 3
fcols
1 500+
2 100-500
3 10-100
To reproduce the last two steps above (load and head), please download the RData file (2 kB): http://dl.dropbox.com/u/15008199/flows.mini.RData
You can then reproduce the plot. This is how I've plotted it:
# plot flows by doubling-up
hford <- qmap("hereford", source = "stamen", maptype = "toner", extent = "normal", maprange=FALSE)
hford + geom_path(data= flows.mini, aes(x=c(X1.x,X1.y), y=c(X2.x, X2.y),
group = c(nr, nr), color = c(fcols,fcols), size= c(n,n)),
lineend = "round") +
scale_size_continuous(range = c(0.05,5)) +
scale_color_discrete(breaks = c("0-10", "10-100", "100-500", "500+")) +
coord_map()
I think you'll agree the doubled-up attributes are inefficient, so, to rephrase my question: how can I remove them?
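One direction that might avoid the doubling, given here only as a minimal sketch under assumptions: geom_segment reads the start and end of each line from separate columns, so each flow can stay on a single row. The data frame flows.demo below is a hypothetical stand-in built from the three rows printed by head(flows.mini) above, and the plot is drawn without the ggmap base layer so that it runs offline.

```r
library(ggplot2)

# Hypothetical stand-in: the three rows shown by head(flows.mini) above.
# The real flows.mini has 100 rows and more columns.
flows.demo <- data.frame(
  X1.x  = c(-2.699389, -2.742064, -2.733890),
  X2.x  = c(52.06554, 52.04099, 51.93402),
  X1.y  = c(-2.699389, -2.699389, -2.699389),
  X2.y  = c(52.06554, 52.06554, 52.06554),
  n     = c(605, 135, 25),
  fcols = factor(c("500+", "100-500", "10-100"),
                 levels = c("0-10", "10-100", "100-500", "500+"))
)

# One row per flow: start point in (x, y), end point in (xend, yend).
# No c(...) concatenation and no group aesthetic are needed.
# Note: ggplot2 3.4.0 and later call the line-width aesthetic `linewidth`;
# `size` is kept here to match the code in the question.
p <- ggplot(flows.demo) +
  geom_segment(aes(x = X1.x, y = X2.x, xend = X1.y, yend = X2.y,
                   color = fcols, size = n),
               lineend = "round")
```

If this approach carries over, the same layer could presumably be added to the map with hford + geom_segment(data = flows.mini, aes(...)) in place of the geom_path call, though that has not been checked against the full data or with coord_map().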