Plot frequencies of factor variable

https://stackoverflow.com/questions/23668759

23-07-2023

Question

I'm trying to get a handle on all of the various tools for manipulating data structures - I've looked into apply, sapply, tapply, reshape, etc. and I still feel very unsure about which to use in each situation.

For my current problem, I have data that looks like:

ID    T1Measure    T2Measure    ...
1     1            1
2     1            2
...

where T1Measure represents the measure of a factor/categorical variable at time 1, T2Measure is the measure of the same variable for the same user at time 2, etc.

My goal is to produce graphs of how the distribution of this measure changes over time (both the frequency of each factor and the proportion of each factor).

I know this is simple, but I'm having a hard time wrapping my head around how I can get what I want.

I believe that for ggplot, I want something like:

FactorID     variable     value
1             T1           2
2             T1           0
1             T2           1
2             T2           1
...

I want to know which package I should be looking at to do this, but more generally, a good way of thinking about data structures, and how to recognize the best way to manipulate them.

Solution

I'm not sure I would use any apply statements here, but the reshape2 package would help.

#sample data
dd<-data.frame(
    ID=c(1,2,3,4,5,6),
    T1=c(1,2,2,1,1,2),
    T2=c(1,1,2,1,1,2),
    T3=c(2,1,1,2,1,1)
)

library(reshape2)
#reshape from wide to long: one row per ID/Measure pair
mm<-melt(dd,id.vars="ID", variable.name="Measure", value.name="FactorID")

#option 1 (useful for counts of discrete values)
as.data.frame(with(mm, table(FactorID, Measure)))

#option 2 (useful for collapsing data)
aggregate(ID~FactorID+Measure, mm, FUN=length)
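
The question also asks for the proportion of each factor level, not just the counts. One possible way to get that, sketched here with base R's prop.table() on the same sample data (the names counts, props and prop_df are just illustrative choices, not part of the original answer):

```r
library(reshape2)

# same sample data as above
dd <- data.frame(
    ID = c(1, 2, 3, 4, 5, 6),
    T1 = c(1, 2, 2, 1, 1, 2),
    T2 = c(1, 1, 2, 1, 1, 2),
    T3 = c(2, 1, 1, 2, 1, 1)
)
mm <- melt(dd, id.vars = "ID", variable.name = "Measure", value.name = "FactorID")

# counts: rows are factor levels, columns are time points
counts <- with(mm, table(FactorID, Measure))

# margin = 2 divides each column by its column total,
# so the proportions within each time point sum to 1
props <- prop.table(counts, margin = 2)

# back to long format, one row per FactorID/Measure pair
prop_df <- as.data.frame(props, responseName = "Proportion")
prop_df
```

Using margin = 2 assumes you want proportions within each time point; margin = 1 would instead give the share of each level across time points.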

I used standard base functions for collapsing the data and making counts. I tend to prefer the syntax of reshape2 to the base reshape() function, but reshape() should work here as well.
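
For the plotting part of the question, here is a minimal sketch of how the long-format data could be passed to ggplot2. It assumes ggplot2 is installed and that bar charts are the kind of graph you are after; p_count and p_prop are just illustrative names. geom_bar() counts the rows itself, so the melted data can be used directly without collapsing it first.

```r
library(reshape2)
library(ggplot2)

# same sample data as above
dd <- data.frame(
    ID = c(1, 2, 3, 4, 5, 6),
    T1 = c(1, 2, 2, 1, 1, 2),
    T2 = c(1, 1, 2, 1, 1, 2),
    T3 = c(2, 1, 1, 2, 1, 1)
)
mm <- melt(dd, id.vars = "ID", variable.name = "Measure", value.name = "FactorID")

# treat the measure as categorical so each level gets its own fill colour
mm$FactorID <- factor(mm$FactorID)

# frequency: geom_bar() counts rows per Measure/FactorID group by default
p_count <- ggplot(mm, aes(x = Measure, fill = FactorID)) +
    geom_bar(position = "dodge") +
    labs(y = "Count")

# proportion: position = "fill" stacks the levels and rescales each bar to height 1
p_prop <- ggplot(mm, aes(x = Measure, fill = FactorID)) +
    geom_bar(position = "fill") +
    labs(y = "Proportion")

print(p_count)
print(p_prop)
```

If you would rather plot a pre-computed table (for example the output of option 1), geom_col() with y mapped to the count column should give the same picture.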

Licensed under: CC-BY-SA with attribution
