Question

I can't think of a way around this geographic mapping problem in ggplot2: ggplot leaves some states blank instead of filling them. This makes sense, as those states have no value for my fill variable.

Map

I know I could add rows for those states and fill them with 0s, but which states have no value will change over time. I am trying to automate this so that whoever runs it each month only has to save the file and hit run, so the map needs to update on its own.

In a perfect world, states with no values would be labeled separately in the legend as “no penetration”.

ggplot code:

map <- ggplot(penetration_levels, aes(long, lat, group=region, fill=Penetration)) +
  geom_polygon() +
  coord_equal() +
  scale_fill_gradient2(low="red", mid="white", high="green", midpoint=.25)
map
map <- map +
  geom_point(data=mydata,
             aes(x=long, y=lat, group=1, fill=0, size=Annualized.Opportunity),
             color="gray6") +
  scale_size(name="Total Annual Opportunity-Millions", range=c(2,4))
map <- map + theme(plot.title = element_text(size = 12, face="bold"))
map

Head of my data and penetration

head(mydata)
Sold.To.Customer            City State Annualized.Opportunity           location          lat      long
21          10000110        NEW YORK    NY              12.142579        NEW YORK,NY     40.71435 -74.00597
262         10016487 FORT LAUDERDALE    FL              12.087310 FORT LAUDERDALE,FL 26.12244 -80.13732
349         11001422      ALLEN PARK    MI              10.910575      ALLEN PARK,MI 42.25754 -83.21104
19          10000096           ALTON    IL              10.040067           ALTON,IL 38.89060 -90.18428
477         11067228        BAY CITY    TX              10.030829        BAY CITY,TX 28.98276 -95.96940
230         10014909        BETHPAGE    NY               9.320271        BETHPAGE,NY 40.74427 -73.48207
head(penetration_levels)
State  region      long      lat group order subregion state       To     From    Total    Penetration
17    AL alabama -87.46201 30.38968     1     1      <NA>    AL 10794947 12537359 23332307    0.462661
18    AL alabama -87.48493 30.37249     1     2      <NA>    AL 10794947 12537359 23332307    0.462661
22    AL alabama -87.52503 30.37249     1     3      <NA>    AL 10794947 12537359 23332307    0.462661
36    AL alabama -87.53076 30.33239     1     4      <NA>    AL 10794947 12537359 23332307    0.462661
37    AL alabama -87.57087 30.32665     1     5      <NA>    AL 10794947 12537359 23332307    0.462661
65    AL alabama -87.58806 30.32665     1     6      <NA>    AL 10794947 12537359 23332307    0.462661

merge:

#geocode
geocode<-geocode(mydata$location)
mydata$lat<-geocode$lat
mydata$long<-geocode$lon
#create us map and graph
states<-map_data("state")
#merge states
states<-merge(states,statelookup,by="region")
penetration_levels<-merge(states,penetration_levels,by="State")
penetration_levels<- penetration_levels[order(penetration_levels$order), ]

Then it goes directly into map plot


Solution

So this turns out to be a common problem. Generally choropleth maps require some sort of merge of the map data with the dataset containing the information used to set the polygon fill colors. In OP's case this is done as follows:

states <- map_data("state")
states <- merge(states,statelookup,by="region")
penetration_levels <- merge(states,penetration_levels,by="State")

The problem is that if penetration_levels is missing any states, the map rows for those states are dropped by the merge (in database terminology, this is an inner join). So when the map is rendered, those polygons are missing. The solution is to use:

penetration_levels <- merge(states,penetration_levels,by="State",all.x=TRUE)

This returns all rows of the first argument (the "x" argument), merged with any data from matching states in the second argument (a left join). Missing values are set to NA.
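
The difference between the two joins can be seen on a toy example. This is only a minimal sketch: the data frames `shapes` and `values` are made-up stand-ins for `states` and `penetration_levels`, not the OP's data.

```r
# Toy stand-ins (hypothetical data): three states in the map data,
# but values for only two of them
shapes <- data.frame(State = c("AL", "AK", "AZ"), order = 1:3)
values <- data.frame(State = c("AL", "AZ"), Penetration = c(0.46, 0.31))

inner <- merge(shapes, values, by = "State")                # default: inner join
left  <- merge(shapes, values, by = "State", all.x = TRUE)  # left join

nrow(inner)  # 2: the AK row is dropped
nrow(left)   # 3: AK is kept, with Penetration = NA

# merge() sorts by the key, so restore the polygon drawing order afterwards
left <- left[order(left$order), ]
```

The final reordering mirrors the `order(penetration_levels$order)` step in the question, which is still needed after a left join.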

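The fill color of polygons (states) with NA values defaults to grey50. It can be changed through the na.value argument of the fill scale, for example:

scale_fill_gradient(na.value="red")

Since the plot in the question already uses scale_fill_gradient2, set na.value inside that call rather than adding a second fill scale, which would replace the first.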
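
As a rough illustration of how `na.value` behaves with the diverging scale used in the question, here is a small sketch. The `squares` data frame is a hypothetical toy (three unit squares standing in for state polygons), and the choice of "grey80" is arbitrary.

```r
library(ggplot2)

# Hypothetical toy data: three unit squares, the last one with no value
squares <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2,   4, 5, 5, 4),
  lat   = rep(c(0, 0, 1, 1), times = 3),
  group = rep(1:3, each = 4),
  Penetration = rep(c(0.10, 0.46, NA), each = 4)
)

p <- ggplot(squares, aes(long, lat, group = group, fill = Penetration)) +
  geom_polygon() +
  coord_equal() +
  # na.value is set on the same scale that defines the gradient
  scale_fill_gradient2(low = "red", mid = "white", high = "green",
                       midpoint = 0.25, na.value = "grey80")
p
```

The third square is drawn in the `na.value` color, while the other two follow the red-white-green gradient.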

OTHER TIPS

Couldn't you add a check for missing states and add rows (with zero for penetration) for them to your data frame? A simple example:

# Create a generic data frame with zeros for penetration
zeros.data = data.frame(State=as.character(state.abb), penetration=0)

# Create a simplified analogue of your data
penetration_levels = data.frame(State=as.character(state.abb[1:30]), 
                                penetration=runif(30,0.1,1))

# Get values for missing states
missing.states = setdiff(state.abb, unique(penetration_levels$State))

# Get required data for missing states.
penetration_levels = rbind(penetration_levels,
                           zeros.data[zeros.data$State %in% missing.states,])

You could run a check like this before your plotting code to automatically fill out your data frame with zero penetration for all missing states. Of course, your "zeros.data" data frame would have to include the other columns in your original data frame, filled with NAs or with whatever data you need for plotting.
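
One way to handle the extra columns is to wrap the check in a function. The sketch below is an assumption-laden illustration, not part of the original answer: `add_missing_states` is a hypothetical helper name, and it assumes the data has one row per state with a `State` column of postal abbreviations.

```r
# Hypothetical helper: append one row per missing state, with the value
# column set to `fill` and every other column left as NA
add_missing_states <- function(df, state_col = "State",
                               value_col = "penetration", fill = 0) {
  missing <- setdiff(state.abb, unique(as.character(df[[state_col]])))
  if (length(missing) == 0) return(df)

  # Indexing with NA yields all-NA rows that keep df's columns and types
  extra <- df[rep(NA_integer_, length(missing)), , drop = FALSE]
  extra[[state_col]] <- missing
  extra[[value_col]] <- fill

  out <- rbind(df, extra)
  rownames(out) <- NULL
  out
}

# Simplified analogue of the data, with one extra column
penetration_levels <- data.frame(State = state.abb[1:30],
                                 penetration = runif(30, 0.1, 1),
                                 Total = 1:30)
filled <- add_missing_states(penetration_levels)
```

Because the function recomputes the missing states on every run, it fits the month-to-month workflow described in the question. Note that it fills with 0, which the map cannot distinguish from a true zero penetration.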

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow