Question

My goal is to compare the distribution of various socioeconomic factors, such as income, over multiple years to see how the population of a particular region has evolved over, say, 5 years. The primary data for this comes from the Public Use Microdata Sample (PUMS). I am using R + ggplot2 as my preferred tool.

When comparing two years' worth of data (2005 and 2010), I have two data frames, hh2005 and hh2010, with the household data for the two years. The income data for both years are stored in the variable hincp in each data frame. Using ggplot2, I create the density plot for an individual year as follows (example for 2010):

    p1 <- ggplot(data = hh2010, aes(x=hincp))+
      geom_density()+
      labs(title = "Distribution of income for 2010")+
      labs(y="Density")+
      labs(x="Household Income")
    p1 

How do I overlay the 2005 density on this plot? Because I read the 2010 data in as its own data frame, hh2010, I am not sure how to proceed. Should I be processing the data in a fundamentally different way from the very beginning?


Solution

You can pass a data argument to individual geoms, so you can add the 2005 density as a second geom like this:

    p1 <- ggplot(data = hh2010, aes(x=hincp)) +
      geom_density() +
      # Give the 2005 layer its own data and a semi-transparent fill
      # so it is distinguishable without hiding the 2010 curve
      geom_density(data=hh2005, fill="purple", alpha=0.5) +
      labs(title = "Distribution of income for 2005 and 2010") +
      labs(y="Density") +
      labs(x="Household Income")
    p1
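A drawback of the per-geom approach is that it produces no legend. A minimal sketch of one common workaround follows: map fill to a constant string inside each layer's aes(), then pick the colours with scale_fill_manual(). The data here are simulated log-normal incomes standing in for the real PUMS frames, so the numbers mean nothing; only the structure (two frames, each with an hincp column) mirrors the question.

```r
library(ggplot2)

# Simulated stand-ins for the real PUMS frames (made-up data, illustration only)
set.seed(1)
hh2005 <- data.frame(hincp = rlnorm(500, meanlog = 10.8, sdlog = 0.7))
hh2010 <- data.frame(hincp = rlnorm(500, meanlog = 10.9, sdlog = 0.8))

# No global data: each geom brings its own data frame.
# Mapping fill to a constant label inside aes() makes ggplot2 build a legend.
p1 <- ggplot(mapping = aes(x = hincp)) +
  geom_density(data = hh2010, aes(fill = "2010"), alpha = 0.5) +
  geom_density(data = hh2005, aes(fill = "2005"), alpha = 0.5) +
  scale_fill_manual(values = c("2005" = "purple", "2010" = "grey60")) +
  labs(title = "Distribution of income for 2005 and 2010",
       x = "Household Income", y = "Density", fill = "Year")
p1
```

This keeps the two data frames separate, which is convenient when they cannot easily be combined, at the cost of writing one layer per year.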

OTHER TIPS

This is how I would approach the problem:

  1. Tag each data frame with the variable of interest (in this case, the year)
  2. Stack the two data sets into one (row-bind them)
  3. Map the 'fill' aesthetic in the ggplot call to the year

For example:

    # tag each data frame with the year^
    hh2005$year <- as.factor(2005)
    hh2010$year <- as.factor(2010)

    # stack the two data sets (rbind requires both to have the same columns)
    d <- rbind(hh2005, hh2010)
    # rbind already combines the factor levels; this just guarantees year is a factor
    d$year <- as.factor(d$year)

    # map fill to year so that each year gets its own density
    p1 <- ggplot(data = d, aes(x=hincp, fill=year)) +
      geom_density(alpha=.5) +
      labs(title = "Distribution of income for 2005 and 2010") +
      labs(y="Density") +
      labs(x="Household Income")
    p1
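One caveat with rbind() is that it stops with an error when the two frames do not have matching column names, which can happen if the variable set differs between survey years. A minimal sketch, assuming you only need hincp for this plot: build a small long-format frame holding just the plotted variable plus a year tag. The data are simulated, and the extra columns (only_in_2005, only_in_2010) are hypothetical, added only to reproduce the mismatch.

```r
library(ggplot2)

# Simulated frames whose column sets differ (made-up data and column names)
set.seed(42)
hh2005 <- data.frame(hincp = rlnorm(300, 10.8, 0.7), only_in_2005 = 1)
hh2010 <- data.frame(hincp = rlnorm(300, 10.9, 0.8), only_in_2010 = 2)

# rbind(hh2005, hh2010) would fail here, so keep only the plotted
# variable and tag every row with its year before stacking.
d <- rbind(
  data.frame(hincp = hh2005$hincp, year = "2005"),
  data.frame(hincp = hh2010$hincp, year = "2010")
)
d$year <- factor(d$year)

p1 <- ggplot(data = d, aes(x = hincp, fill = year)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of income for 2005 and 2010",
       x = "Household Income", y = "Density")
p1
```

The same long format extends naturally to five or more years: add one data.frame(...) per year to the rbind() call and the legend and colours follow automatically.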

^ Note: ggplot2 draws one density per group, and it only splits the data into groups automatically when the variable mapped to 'fill' is discrete, which is why I stored the years as a factor. I also made the overlapping densities semi-transparent with the 'alpha' parameter.
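The effect of the factor conversion can be checked directly. The sketch below, on simulated data, counts how many groups ggplot2 computed a density for by inspecting layer_data(); n_groups is a hypothetical helper defined here, not part of ggplot2. With a numeric year the fill scale is continuous, no grouping occurs, and both years are pooled into a single curve (recent ggplot2 versions emit a warning that fill was dropped).

```r
library(ggplot2)

# Simulated incomes for two years, with year stored as a plain number
set.seed(7)
d <- data.frame(
  hincp = c(rlnorm(200, 10.8, 0.7), rlnorm(200, 10.9, 0.8)),
  year  = rep(c(2005, 2010), each = 200)
)

# Discrete fill: ggplot2 groups by year, giving one density per year
p_factor <- ggplot(d, aes(x = hincp, fill = factor(year))) +
  geom_density(alpha = 0.5)

# Continuous fill: no grouping, so both years are pooled into one density
p_numeric <- ggplot(d, aes(x = hincp, fill = year)) +
  geom_density(alpha = 0.5)

# Hypothetical helper: number of distinct groups in the computed layer data
n_groups <- function(p) length(unique(layer_data(p)$group))

n_groups(p_factor)   # 2
n_groups(p_numeric)  # 1
```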

Licensed under: CC-BY-SA with attribution