Question

I've been struggling with an attempt to join a data frame with a shapefile and plotting the results. I'm attempting to follow the method used proposed in @jlhoward's answer to this question.

I have a national dataset of vaccination rates by post code. I'm trying to merge it with a ESRI shapefile of post codes from the Australian Bureau of Statistics and plot the results by postcode as per the other question.

This is where my current attempt sits:

library(rgdal)
library(maptools) 
library(ggplot2)
library(plyr)
setwd("~/Google Drive/R/PC_Shapes")
vac.data <- read.csv(file = "Postcode2013.csv", header=TRUE, sep=" ", na.string="NA", dec=".", strip.white=TRUE)
postcode <- readOGR("POA06aAUST_region.shp", layer="POA06aAUST_region")
postcode@data$id <- rownames(postcode@data)
postcode.df <- fortify(postcode)
postcode.df <- join(postcode.df, postcode@data, by="id")
postcode.df <- merge(postcode.df, vac.data, all=TRUE)
ggp <- ggplot(data=postcode.df, aes(x=long, y=lat, group=group)) 
ggp <- ggp + geom_polygon(aes(fill=LEVEL))         
ggp <- ggp + geom_path(color="grey", linestyle=2) 
ggp <- ggp + coord_equal() 
ggp <- ggp + scale_fill_gradient(low = "#ffffcc", high = "#ff4444", space = "Lab", na.value = "grey50", guide = "colourbar")
ggp <- ggp + labs(title="Vaccination Rates: Australia")
print(ggp)

I think my problem lies within the following two lines, I know I need to assign by.x= and/or by.y=: but I keep getting errors that I'm unclear where they originate. I'm not sure what I'm trying to achieve here...

postcode.df <- join(postcode.df, postcode@data, by="id")
postcode.df <- merge(postcode.df, vac.data, all=TRUE)

My shapefile ends up with over 5,500,000 observations at this point and R starts to struggle.

Its also worth noting that there are some postcodes in the ABS shapefile for which I have no data. I'm not sure how to exclude them. They may be an issue. In a previous attempt I tried this approach:

library("sp","rgdal","plyr")
setwd("~/Google Drive/R/PC_Shapes")
ogrListLayers("POA06aAUST_region.shp")
postcode <- readOGR("POA06aAUST_region.shp", layer="POA06aAUST_region")
vacs <- read.csv("~/Google Drive/R/PC_Shapes/Postcode2013.csv")
PNI <- melt(vacs, id=c("Postcode","Percent.not.fully.immunised"))
postcode$POA_2006 %in% PNI$Postcode
postcode$POA_2006[which(!postcode$POA_2006 %in% PNI$Postcode)] 
levels(postcode$POA_2006[which(!postcode$POA_2006 %in% PNI$Postcode)] )

If anyone has any idea where I'm falling over, I'd much appreciate any tips. I'm new to R so apologies if this is an obvious question.

Was it helpful?

Solution

Lots of stuff wrong here. The read.csv line... sep=",", not " ". Gotta make sure you are merging on the right columns. Use head(df) to see the first couple rows of the df you are trying to merge, or str(df) to view a bunch of info about it.

Good luck man.

library(rgdal)
library(maptools) 
library(ggplot2)
library(plyr)
gpclibPermit()

vac.data <- read.csv(file = "Postcode2013.csv", header=TRUE, sep=",", na.string="NA", dec=".", strip.white=TRUE)
postcode <- readOGR("POA06aAUST_region.shp", layer="POA06aAUST_region")
# took too long to fortify on whole data set
postcode <- postcode[1:50,]
postcode@data$id <- rownames(postcode@data)
pts <- fortify(postcode,region="id")
postcode.df <- merge(pts,postcode,by="id", stringsAsFactors=F)
postcode.df$id <- as.numeric(postcode.df$id)
postcode.df2 <- merge(postcode.df, vac.data, by.x="POA_2006", by.y="PC_2006")
postcode.df2 <- postcode.df2[order(postcode.df2$id,postcode.df2$order),]

ggplot() + geom_polygon(aes(x=long,y=lat, group=group, 
                            fill=Percent.not.fully.immunised),
                        data=postcode.df2)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top