Question

I am facing a problem with a dataset which has overlapping factor levels.

I would like to produce timelines, barplots and statistics by factor level. However, I want the factor levels to be non-exclusive: an observation that belongs to more than one level should appear once for each of its levels in a plot.

Here is an example of what my data structure looks like:

head <- c("ID","YEAR","BRAZIL","GERMANY","US","FRANCE")
data <- data.frame(matrix(c(1,2000,1,0,0,0,
                            2,2010,0,1,1,0,
                            3,2011,0,1,0,0,
                            4,2012,1,0,0,1,
                            5,2012,0,1,0,0,
                            6,2013,0,0,0,1), 
                         nrow=6, ncol=6, byrow=T))
names(data) <- head

Obviously, a factor variable "COUNTRY" cannot be created the usual way, because that would force the factor levels to be mutually exclusive (in our case there would be 4 levels: Brazil, Germany, US and France):

data$COUNTRY[data$BRAZIL==1 & 
             data$GERMANY==0 & 
             data$US==0 & 
             data$FRANCE==0]  <- "Brazil"
data$COUNTRY[data$BRAZIL==0 & 
             data$GERMANY==1 & 
             data$US==0 & 
             data$FRANCE==0]  <- "Germany"

etc...

factor(data$COUNTRY)

But this is not what I want.


My problem is that plotting by factor only works if factor levels are properly unambiguous. I would like to produce something like this:

require(ggplot2)
MYPLOT <- qplot(data$YEAR, data$COUNTRY)
MYPLOT + geom_point(aes(size=..count..), stat="bin") + scale_size(range=c(0, 15)) 

with an observation that belongs to i factor levels appearing i times in the plot.

  • How should I transform my data.frame in order to get what I desire?
  • Should I simply duplicate those observations belonging to i factor levels i times? If yes, how should I do that?
  • Is there a workaround that does not require duplicating cases?

Ideas anyone?

Solution

I think you have to duplicate the rows so that each observation appears once per country, and then remove the rows whose value is 0.

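library(reshape2)
library(ggplot2)

# Long format: one row per (ID, YEAR, country) combination
d2 <- melt(data, id.vars = c("ID", "YEAR"))
# Keep only the countries each observation actually belongs to
d3 <- d2[d2$value != 0, ]
qplot(d3$YEAR, d3$variable)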
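
For reference, the same reshaping can probably be done without extra packages. The following is only a minimal sketch in base R, assuming the example data from the question; the names `countries`, `long`, `per_country`, `totals` and `counts` are made up for illustration and are not part of the original answer.

```r
# Rebuild the example data from the question (same values, built column-wise).
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)

# Hypothetical helper vector naming the indicator columns.
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")

# Long format: every observation is repeated once per country column.
long <- data.frame(
  ID      = rep(data$ID, times = length(countries)),
  YEAR    = rep(data$YEAR, times = length(countries)),
  COUNTRY = rep(countries, each = nrow(data)),
  value   = unlist(data[countries], use.names = FALSE)
)

# Keep only the memberships that actually apply (indicator equal to 1).
long <- long[long$value == 1, c("ID", "YEAR", "COUNTRY")]

# Observations per country; IDs 2 and 4 are each counted twice.
per_country <- table(long$COUNTRY)

# The same totals without duplicating any rows.
totals <- colSums(data[countries])

# Observations per YEAR x COUNTRY cell, e.g. to drive point sizes in a plot.
counts <- aggregate(ID ~ YEAR + COUNTRY, data = long, FUN = length)
names(counts)[3] <- "n"
```

If ggplot2 is available, `counts` could then be plotted with something like `ggplot(counts, aes(YEAR, COUNTRY)) + geom_point(aes(size = n))`. For simple per-country totals, the `colSums()` line suggests that duplication is not strictly needed; the long format mainly helps when plotting by country.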
Licensed under: CC-BY-SA with attribution