How to plot summary values against week numbers

https://stackoverflow.com/questions/22689361

22-06-2023

Question

I thought this would be simple, but I got stuck and would appreciate your help.

My data contain 4 questions with Yes/No answers, recorded over weekly periods. The week periods are numbered in dtf$Week. I need to create a plot with the weekly count of "Yes" answers on the y axis and the week number on the x axis. The Yes counts should be shown as 4 lines in different colours, one per question. Formatting is easy, but I don't understand how to summarize the data properly. I am still learning R.

> str(dtf)
'data.frame':   55 obs. of  7 variables:
 $ id       : num  7 8 9 10 11 12 13 16 17 18 ...
 $ q_0001   : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 2 1 1 1 ...
 $ q_0002   : Factor w/ 2 levels "Yes","No": 2 1 1 1 2 2 2 2 2 2 ...
 $ q_0003   : Factor w/ 2 levels "Yes","No": 2 2 2 1 2 2 2 2 2 2 ...
 $ q_0004   : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 2 2 2 2 ...
 $ Assm_Date: Date, format: "2014-01-04" "2014-01-08" ...
 $ Week     : num  1 1 1 1 1 1 1 2 2 2 ...

The dataset can be reproduced with:

# sample dataset
Start <- as.Date("2014-01-03")
Dates <- as.Date(character(0))   # empty Date vector to collect the samples
for (i in 1:8) {
     End <- Start + 6
     # 7 random dates per week, 1 to 6 days after the week's start
     Samp <- Start + sort(sample.int(as.integer(End - Start), 7, replace=TRUE))
     Dates <- c(Dates, Samp)
     Start <- End + 1
}
id<-sort(sample(1:56))
q_0001<-sample(c("Yes","No"),56,replace=TRUE)
q_0002<-sample(c("Yes","No"),56,replace=TRUE)
q_0003<-sample(c("Yes","No"),56,replace=TRUE)
q_0004<-sample(c("Yes","No"),56,replace=TRUE)
dtf<-data.frame(id,Dates,q_0001,q_0002,q_0003,q_0004)
dtf$Week = 1 + as.numeric(dtf$Dates - as.Date("2014-01-03")) %/% 7
rm(i,Dates,Start,End,Samp,id,q_0001,q_0002,q_0003,q_0004)

If need be, a link to the test dataset is also here. Sorry I did not provide the test dataset right away; I am a bit new to this.
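The Week column in the sample code is derived by integer division: the number of days elapsed since the first day, divided by 7 with the remainder discarded, plus one. A minimal sketch of that arithmetic follows; the four dates are made-up illustrations, not rows from the real data.

```r
# Week number = whole weeks elapsed since the first day, plus one
first_day <- as.Date("2014-01-03")

# Hypothetical example dates (chosen only to illustrate the week boundaries)
d <- as.Date(c("2014-01-04", "2014-01-09", "2014-01-10", "2014-02-27"))

days_elapsed <- as.numeric(d - first_day)   # 1, 6, 7, 55
week <- 1 + days_elapsed %/% 7              # %/% is integer division
week
```

Days 1 to 6 after the start fall in week 1, day 7 starts week 2, and day 55 falls in week 8.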

Solution

Here is a ggplot solution. You can use ggplot's built-in statistics capabilities, which work well but in this case need a little adjustment to get what you're looking for (the xlim and coord_cartesian business):

library(ggplot2)
library(reshape2)
dtf.mlt <- melt(dtf[-(1:2)], id.vars="Week")  # data in long format    
ggplot(subset(dtf.mlt, value=="Yes")) + 
  stat_bin(aes(x=Week, color=variable), geom="line", position="identity", binwidth=1) +
  ylim(c(0, 7)) + xlim(1, 9) + coord_cartesian(xlim=c(0, 9))


If you're willing to calculate stats yourself, then it becomes easier on the plot side. Here we use aggregate to count the number of yeses for each week/variable combo:

dtf.agg <- aggregate(dtf.mlt$value, dtf.mlt[c("Week", "variable")], FUN=function(x) sum(x == "Yes"))
ggplot(dtf.agg) + geom_line(aes(x=Week, y=x, color=variable))


Notice that by pre-calculating the data we didn't have to mess around as much to get the plot to look like what we want it to (including the x scales).
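For comparison, the same pre-calculation idea can be sketched in base R alone, without reshape2 or ggplot2. This is only an illustration under stated assumptions: the toy data frame below is made up (same column layout as dtf, but only two questions and four weeks), and rowsum plus matplot are swapped in for aggregate and geom_line.

```r
# Hypothetical toy data in the same shape as dtf (values are made up)
set.seed(1)
toy <- data.frame(
  Week   = rep(1:4, each = 5),
  q_0001 = sample(c("Yes", "No"), 20, replace = TRUE),
  q_0002 = sample(c("Yes", "No"), 20, replace = TRUE)
)

# One row per week, one column per question: number of "Yes" answers
q_cols <- c("q_0001", "q_0002")
yes_counts <- rowsum((toy[q_cols] == "Yes") + 0L, group = toy$Week)

# One line per question, week number on the x axis
matplot(as.numeric(rownames(yes_counts)), yes_counts, type = "l", lty = 1,
        col = seq_along(q_cols), xlab = "Week", ylab = "Count of Yes")
legend("topright", legend = q_cols, col = seq_along(q_cols), lty = 1)
```

Because the counts are computed before plotting, the x axis here also needs no extra limits: it simply spans the weeks present in the summary table.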

OTHER TIPS

@R0berts: Is it the structure of dtf2 (str(dtf2)) or the way it is calculated (dtf2 <- ddply(....))?

Here is another way to look at it:

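library(data.table)   # data.table provides its own melt method

df = data.table(dtf) # convert the question's data frame to a data.table

# convert to long format: one row per (Week, question, answer)
df2 = melt(df,id.vars = c("Week"), measure.vars = c("q_0001","q_0002","q_0003","q_0004"))

# count "Yes" answers per week and question (the count lands in column V1)
df2 = df2[,sum(value == "Yes"),by = list(Week,variable)]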

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow