
I thought it would be simple, but I got stuck and I would appreciate your help.

In my data there are 4 questions with Yes/No answers, recorded within weekly periods. The week periods are numbered in dtf$Week. I need to create a plot with the weekly count of "Yes" answers on the y axis and the week number on the x axis. The Yes counts for the 4 questions should be drawn as 4 lines in different colours. Formatting is easy, but I don't understand how to summarize the data properly. I am still learning R.

> str(dtf)
'data.frame':   55 obs. of  7 variables:
 $ id       : num  7 8 9 10 11 12 13 16 17 18 ...
 $ q_0001   : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 2 1 1 1 ...
 $ q_0002   : Factor w/ 2 levels "Yes","No": 2 1 1 1 2 2 2 2 2 2 ...
 $ q_0003   : Factor w/ 2 levels "Yes","No": 2 2 2 1 2 2 2 2 2 2 ...
 $ q_0004   : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 2 2 2 2 ...
 $ Assm_Date: Date, format: "2014-01-04" "2014-01-08" ...
 $ Week     : num  1 1 1 1 1 1 1 2 2 2 ...

The dataset can be reproduced with:

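# sample dataset
Start <- as.Date("2014-01-03")
Dates <- as.Date(character(0))  # empty Date vector to append to
for (i in 1:8) {
     End <- Start + 6
     # 7 random dates per week, sorted, drawn with replacement
     Samp <- Start + sort(sample.int(as.numeric(End - Start), 7, replace = TRUE))
     Dates <- append(Dates, Samp)
     Start <- End + 1
}
# Assumption: the data.frame construction is missing from the post; the columns
# below follow the str() output above, filled with random Yes/No answers.
yn <- function(n) factor(sample(c("Yes", "No"), n, replace = TRUE), levels = c("Yes", "No"))
n <- length(Dates)
dtf <- data.frame(id = seq_len(n), Date = Dates,
                  q_0001 = yn(n), q_0002 = yn(n), q_0003 = yn(n), q_0004 = yn(n))
dtf$Week <- 1 + as.numeric(dtf$Date - as.Date("2014-01-03")) %/% 7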

If needed, a link to the test dataset is also here. Sorry I did not make the test dataset right away, I am a bit new to this.

Here is a ggplot solution. You can use ggplot's built-in statistics capabilities, which work great but in this case require a little bit of messing around to get what you're looking for (the xlim and coord_cartesian business):

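library(reshape2)  # melt()
library(ggplot2)

# data in long format; assumes the first two columns are id and Date, as in the sample dataset
dtf.mlt <- melt(dtf[-(1:2)], id.vars="Week")
# keep only the "Yes" rows and let stat_bin count them per week
ggplot(subset(dtf.mlt, value=="Yes")) +
  stat_bin(aes(x=Week, color=variable), geom="line", position="identity", binwidth=1) +
  ylim(c(0, 7)) + xlim(1, 9) + coord_cartesian(xlim=c(0, 9))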

If you're willing to calculate stats yourself, then it becomes easier on the plot side. Here we use aggregate to count the number of yeses for each week/variable combo:

dtf.agg <- aggregate(dtf.mlt$value, dtf.mlt[c("Week", "variable")], FUN=function(x) sum(x == "Yes"))
ggplot(dtf.agg) + geom_line(aes(x=Week, y=x, color=variable))

Notice that by pre-calculating the data we didn't have to mess around as much to get the plot to look like what we want it to (including the x scales).
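To see what the pre-calculated table looks like, here is a minimal sketch on a tiny hand-made data set. The toy data frame and its values are invented for illustration and are not taken from the question's data; only base R is used.

```r
# Toy long-format data: invented values, for illustration only
toy <- data.frame(
  Week     = c(1, 1, 1, 2, 2, 2),
  variable = c("q_0001", "q_0001", "q_0002", "q_0001", "q_0002", "q_0002"),
  value    = c("Yes", "Yes", "No", "No", "Yes", "Yes")
)

# Count the "Yes" answers for each Week/variable combination.
# The count column is named "x" because a bare vector was passed to aggregate().
toy.agg <- aggregate(toy$value, toy[c("Week", "variable")],
                     FUN = function(x) sum(x == "Yes"))
print(toy.agg)
```

A week/question combination that has answers but no "Yes" still gets a row with a count of 0 here, whereas subsetting to value == "Yes" before plotting drops those rows entirely.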

@R0berts: Is it the structure of dtf2 (str(dtf2)) or the way it is calculated (dtf2 <- ddply(....))?

Here is another way to look at it:


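library(data.table)  # data.table() and a melt() method for data.tables
df = data.table(dtf) # convert to data.table; dtf is the data frame from the question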

# convert to long format
df2 = melt(df,id.vars = c("Week"), measure.vars = c("q_0001","q_0002","q_0003","q_0004"))

# get counts (the unnamed count column ends up being called V1)
df2 = df2[,sum(value == "Yes"),by = list(Week,variable)]
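The counts can then be plotted with one line per question. This is only a sketch under some assumptions: it runs on a small invented data set rather than the question's data, it assumes ggplot2 is installed alongside data.table, and it names the count column yes (my own choice) instead of leaving it as V1.

```r
library(data.table)
library(ggplot2)

# Invented wide-format data for illustration only (not the question's data)
df <- data.table(
  Week   = c(1, 1, 2, 2),
  q_0001 = c("Yes", "Yes", "No", "Yes"),
  q_0002 = c("No", "Yes", "No", "No")
)

# Long format, then count "Yes" per Week/question with a named count column
df2 <- melt(df, id.vars = "Week", measure.vars = c("q_0001", "q_0002"))
df2 <- df2[, .(yes = sum(value == "Yes")), by = .(Week, variable)]

# One coloured line per question
p <- ggplot(df2, aes(x = Week, y = yes, color = variable)) + geom_line()
print(p)
```

For the real data, the same two data.table steps should apply with all four question columns listed in measure.vars.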
