Create stacked area graph from time data

Question

EDIT, to address the comments:

`rownames<-`(
  as.data.frame(lapply(df[-1], function(x) as.numeric(table(x)))), 
  paste("Clust ", 0:3)
)

Produces:

         T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
Clust  0  4  3  5  8 11  6  2  4  5   7
Clust  1  5  9  8  6  3  7  7  8  7   4
Clust  2  5  6  2  3  3  3  2  3  4   4
Clust  3  6  2  5  3  3  4  9  5  4   5

This uses table to count the number of occurrences of each cluster type (0:3) in each time period. The key piece of code is the lapply(...) call; the code around it only converts the result to a data frame and adds row labels so that it displays nicely.
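To make the counting step concrete, here is a minimal sketch on a made-up data frame (toy and its values are hypothetical, not the data from this answer). It also shows why the columns should be factors with all cluster levels declared: table then reports a zero for a cluster that does not occur in a period, so every column has the same length.

```r
# Hypothetical toy data: 5 subjects, 2 time periods, clusters 0 to 3
toy <- data.frame(
  ID = 1:5,
  T1 = factor(c(0, 0, 1, 3, 3), levels = 0:3),
  T2 = factor(c(2, 2, 2, 1, 0), levels = 0:3)
)

# table() counts each level per column; sapply() binds the counts into a matrix
counts <- sapply(toy[-1], function(x) as.numeric(table(x)))
rownames(counts) <- paste("Clust", 0:3)
counts
```

Cluster 2 never occurs in T1 and cluster 3 never occurs in T2, yet both still get a row because the levels were declared up front.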

With data:

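set.seed(1)
labels <- paste("Clust ", 0:3)
# levels=0:3 keeps zero counts in table() if a cluster is absent from a period
clusters <- replicate(10, factor(sample(0:3, 20, replace=TRUE), levels=0:3), simplify=FALSE)
df <- as.data.frame(c(list(ID=1:20), setNames(clusters, paste0("T", 1:10))))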
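The count table can also be turned into a stacked area plot directly. The following is only a sketch under a few assumptions: it rebuilds the data so that it runs on its own, and the names counts, counts.long, Cluster, Time and Count are introduced here for illustration. Note that the exact counts depend on your R version's random number generator, so they may differ from the table shown above.

```r
library(ggplot2)

set.seed(1)
clusters <- replicate(10, factor(sample(0:3, 20, replace = TRUE), levels = 0:3),
                      simplify = FALSE)
df <- as.data.frame(c(list(ID = 1:20), setNames(clusters, paste0("T", 1:10))))

# Count cluster occurrences per time period (rows: clusters, columns: periods)
counts <- sapply(df[-1], function(x) as.numeric(table(x)))
rownames(counts) <- paste("Clust", 0:3)

# Convert the matrix to long format: one row per cluster/period combination
counts.long <- as.data.frame(as.table(counts))
names(counts.long) <- c("Cluster", "Time", "Count")

# Stacked areas of raw counts; group is needed because Time is categorical
p <- ggplot(counts.long) +
  geom_area(aes(x = Time, y = Count, fill = Cluster, group = Cluster),
            position = "stack")
p
```

Since every subject belongs to exactly one cluster per period, the stack has the same total height (20) at every time point.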

Here is a ggplot solution. First, get the data into long format with melt from the reshape2 package. You can then aggregate it (and optionally re-cast it to inspect the result), and finally plot it:

library(reshape2)
library(ggplot2)
df.mlt <- melt(df, id.vars="ID")
df.agg <- aggregate(. ~ ID + variable, df.mlt, sum)
dcast(df.agg, ID ~ variable)  # just for show, we don't use the result anyplace

#   ID T1 T2 T3 T4 T5 T6 T7 T8 T9 T10
# 1  0 25 18 29 23 16 15 14 22 29  19
# 2  1  7  7 14 18 19 11 21 17 15  22
# 3  2 16 15 16 20 23 20 16 13 15  12
# 4  3 14 13 20 17 25 14 13  7 21  24

ggplot(df.agg) +
  geom_area(aes(x=variable, y=value, fill=ID, group=ID), position="fill")


ggplot takes a little getting used to, but once you are used to it, it is mostly intuitive. Look at the result of melt(df, id.vars="ID") first to see what I mean by "long format". In this case we use geom_area and specify the "aesthetics" (values that change with the data) in aes: the x value (variable is a column created by melt, and here it contains the time periods), the y value (value is also created by melt), and the fill color of the areas, which is derived from ID. Because the time used here is categorical (T1, T2, etc., instead of actual dates), we must use group in addition to fill so that ggplot knows to connect the points across time periods. position="fill" rescales the stack at each time period to a total of 1, so the plot shows proportions; use position="stack" to show the raw sums instead.
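As a small illustration of "long format", here is a sketch on a made-up data frame (toy, long and agg are hypothetical names and values, not the data from this answer). It shows the three columns melt produces and what the aggregation step does with them.

```r
library(reshape2)

# Hypothetical wide data: ID 0 appears twice, ID 1 once
toy <- data.frame(ID = factor(c(0, 1, 0)), T1 = c(1, 2, 3), T2 = c(4, 5, 6))

# Long format: one row per (ID, time period) observation,
# with the old column names in "variable" and the cell contents in "value"
long <- melt(toy, id.vars = "ID")
long

# Sum the values per ID and time period, as in the aggregation step
agg <- aggregate(value ~ ID + variable, data = long, FUN = sum)
agg
```

Each of the 3 rows and 2 time columns of toy becomes one row of long, giving 6 rows, and agg collapses the two rows for ID 0 in each period into a single sum.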

Note you do not need to do the aggregation step ahead of plotting. ggplot can handle it internally. The following command is equivalent (note how we're using df.mlt):

# fun= replaces fun.y=, which is deprecated since ggplot2 3.3.0
ggplot(df.mlt) +
  stat_summary(aes(x=variable, y=value, fill=ID, group=ID), fun=sum, position="fill", geom="area")

This is the data I used:

df <- as.data.frame(c(list(ID=rep(factor(0:3), 3)), setNames(replicate(10, sample(1:10, 12, replace=TRUE), simplify=FALSE), paste0("T", 1:10))))