R ggplot2 mapping issue, automate missing State info

Question 1

This is a common problem. Choropleth maps generally require merging the map data with the dataset that holds the values used to set the polygon fill colors. In OP's case this is done as follows:

states <- map_data("state")
states <- merge(states,statelookup,by="region")
penetration_levels <- merge(states,penetration_levels,by="State")

The problem is that, if any States are missing from penetration_levels, the map rows for those states are dropped by the merge (in database terminology, this is an inner join). So when the map is rendered, those polygons are missing. The solution is to use:

penetration_levels <- merge(states,penetration_levels,by="State",all.x=TRUE)

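This returns all rows of the first argument (the "x" argument), merged with any data from matching states in the second argument (a left join). Values with no match are set to NA. Note that merge() does not keep the rows in their original order, so sort the result by the map data's order column before plotting; otherwise the polygon outlines are drawn scrambled.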
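
The difference between the two joins is easy to see on toy data. This is only a minimal sketch: `polys` and `vals` are made-up stand-ins for the map data and the penetration data, not OP's objects.

```r
# Toy stand-ins for the map data and the penetration data (hypothetical values)
polys <- data.frame(State = c("AL", "AK", "AZ"), order = 1:3,
                    stringsAsFactors = FALSE)
vals  <- data.frame(State = c("AL", "AZ"), penetration = c(0.4, 0.7),
                    stringsAsFactors = FALSE)

# Inner join: AK has no match in vals, so its row is dropped
inner <- merge(polys, vals, by = "State")

# Left join: AK is kept and its penetration is NA
left  <- merge(polys, vals, by = "State", all.x = TRUE)

nrow(inner)
nrow(left)
left[left$State == "AK", ]
```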

Polygons (states) with NA values are filled with grey50 by default; this can be changed by adding the following call to the plot definition:

scale_fill_gradient(na.value="red")
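
Putting the pieces together might look roughly like the sketch below. It assumes the ggplot2 and maps packages are installed. OP's statelookup and penetration_levels are not shown in the question, so both are replaced here by assumed stand-ins: a lookup built from R's built-in state.name and state.abb, and random penetration values for only 30 states.

```r
library(ggplot2)  # map_data() also needs the 'maps' package to be installed

# Assumed lookup between lower-case state names and abbreviations
statelookup <- data.frame(region = tolower(state.name), State = state.abb,
                          stringsAsFactors = FALSE)

# Made-up penetration values for only 30 states (illustration only)
set.seed(42)
penetration_levels <- data.frame(State = state.abb[1:30],
                                 penetration = runif(30, 0.1, 1),
                                 stringsAsFactors = FALSE)

states <- map_data("state")
states <- merge(states, statelookup, by = "region")

# Left join: states without data are kept, with penetration = NA
map_df <- merge(states, penetration_levels, by = "State", all.x = TRUE)

# merge() reorders rows; restore the drawing order of the polygon vertices
map_df <- map_df[order(map_df$order), ]

p <- ggplot(map_df, aes(x = long, y = lat, group = group, fill = penetration)) +
  geom_polygon(colour = "white") +
  scale_fill_gradient(na.value = "red")
p
```

States with no penetration data should then show up in red instead of disappearing from the map.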

Question 2

Couldn't you add a check for missing states and add rows (with zero for penetration) for them to your data frame? A simple example:

# Create a generic data frame with zeros for penetration
# (stringsAsFactors=FALSE keeps State as character in R versions before 4.0)
zeros.data = data.frame(State=state.abb, penetration=0,
                        stringsAsFactors=FALSE)

# Create a simplified analogue of your data
penetration_levels = data.frame(State=state.abb[1:30], 
                                penetration=runif(30,0.1,1),
                                stringsAsFactors=FALSE)

# Get values for missing states
missing.states = setdiff(state.abb, unique(penetration_levels$State))

# Get required data for missing states.
penetration_levels = rbind(penetration_levels,
                           zeros.data[zeros.data$State %in% missing.states,])

You could run a check like this before your plotting code to automatically fill out your data frame with zero penetration for all missing states. Of course, your "zeros.data" data frame would also have to include the other columns in your original data frame, filled with NAs or with whatever data you need for plotting.
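
One way to automate this check is to wrap it in a small function. The sketch below is only an illustration: fill_missing_states is a hypothetical helper name, and the extra year column is made up to show that other columns are filled with NA.

```r
# Hypothetical helper: append one row per missing state, with `penetration`
# set to `fill` and every other column set to NA.
fill_missing_states <- function(df, all_states = state.abb, fill = 0) {
  df$State <- as.character(df$State)
  missing.states <- setdiff(all_states, unique(df$State))
  if (length(missing.states) == 0) return(df)

  # Indexing rows with NA gives an all-NA block with the same columns as df
  extra <- df[rep(NA_integer_, length(missing.states)), , drop = FALSE]
  extra$State <- missing.states
  extra$penetration <- fill

  out <- rbind(df, extra)
  rownames(out) <- NULL
  out
}

# Simplified analogue of the data, with one extra (made-up) column
set.seed(1)
penetration_levels <- data.frame(State = state.abb[1:30],
                                 penetration = runif(30, 0.1, 1),
                                 year = 2014,
                                 stringsAsFactors = FALSE)

filled <- fill_missing_states(penetration_levels)
nrow(filled)
tail(filled)
```

Passing fill = NA instead of zero would leave the added states as missing values, so they would be colored by na.value rather than by the low end of the gradient.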