How to use "with" and "tapply" to calculate a new variable based on multiple factors

StackOverflow https://stackoverflow.com/questions/23524722

  •  17-07-2023

Question

I'm trying to obtain the mean "ctrlmeans" of the telephone handle time "Handle" for a single group "ACTRL", broken down by another variable, "Period". I then want to create a new variable "Difference" by subtracting that mean from the "Handle" of each person in the dataframe.

Here's what I did:

> ttp1<-read.csv("ttp1.csv")

> dput(head(ttp1,12))

structure(list(NUID = structure(c(4L, 6L, 7L, 8L, 11L, 12L, 9L, 
10L, 1L, 2L, 3L, 5L), .Label = c("A000904", "A024324", "A047744", 
"A063828", "A071164", "C833344", "C833345", "C833346", "E254607", 
"Y950092", "Z952754", "Z993876"), class = "factor"), Period = c(201415L, 
201415L, 201415L, 201415L, 201415L, 201415L, 201416L, 201416L, 
201416L, 201416L, 201416L, 201416L), Queue = c(1L, 2L, 1L, 1L, 
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L), Group = structure(c(2L, 4L, 
3L, 3L, 3L, 3L, 1L, 4L, 3L, 3L, 3L, 3L), .Label = c("A", "A ", 
"ACTRL", "B"), class = "factor"), Handle = c(1013L, 699L, 425L, 
450L, 444L, 681L, 532L, 716L, 388L, 307L, 430L, 380L)), .Names = c("NUID", 
"Period", "Queue", "Group", "Handle"), row.names = c(NA, 12L), class = "data.frame")

My commands:

> ctrlmeans <- with(subset(ttp1, Group=="ACTRL"), tapply(Handle, Period, mean))

> ctrlmeans


201415 201416 
500.00 376.25 

> Difference <- ttp1$Handle-ctrlmeans[ttp1$Period]

> Difference


<NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 
  NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA 

Why would I get NA?

If I wanted to add a second grouping variable, "Queue", to the tapply command, how would I do this?


Solution

To give you an example of how this would work with the dplyr package if you want to calculate the means of Handle by groups of Period AND Queue:

library(dplyr)

ctrlmeans <-                               # data frame to store the results
ttp1 %>%                                   # data frame to use for the analysis
  group_by(Period, Queue) %>%              # grouping variables (add or remove Queue as needed)
  filter(Group == "ACTRL") %>%             # use only rows where Group == "ACTRL"
  summarize(mean.Handle = mean(Handle))    # one row per group with the mean of Handle

ttp1 <- inner_join(ttp1, ctrlmeans, by = c("Period", "Queue"))  # join the group means back onto ttp1
ttp1["Diff"] <- with(ttp1, Handle - mean.Handle)                # add a column for the differences

#>ttp1
#      NUID Period Queue Group Handle mean.Handle   Diff
#1  A063828 201415     1    A    1013       437.5  575.5
#2  C833345 201415     1 ACTRL    425       437.5  -12.5
#3  C833346 201415     1 ACTRL    450       437.5   12.5
#4  C833344 201415     2     B    699       562.5  136.5
#5  Z952754 201415     2 ACTRL    444       562.5 -118.5
#6  Z993876 201415     2 ACTRL    681       562.5  118.5
#7  E254607 201416     1     A    532       347.5  184.5
#8  A000904 201416     1 ACTRL    388       347.5   40.5
#9  A024324 201416     1 ACTRL    307       347.5  -40.5
#10 Y950092 201416     2     B    716       405.0  311.0
#11 A047744 201416     2 ACTRL    430       405.0   25.0
#12 A071164 201416     2 ACTRL    380       405.0  -25.0 

If you want to calculate only by groups of Period, just remove Queue from the group_by statement and from the by argument of the inner_join statement.
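As a rough sketch of that Period-only variant, the block below rebuilds three columns of the question's sample data by hand (the values are copied from the dput output) and assumes a dplyr version that provides the %>% pipe and mutate. The names ctrl_by_period and result are made up for illustration.

```r
library(dplyr)

# Columns copied from the question's dput() output
ttp1 <- data.frame(
  Period = rep(c(201415L, 201416L), each = 6),
  Group  = c("A ", "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL",
             "A",  "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL"),
  Handle = c(1013L, 699L, 425L, 450L, 444L, 681L,
             532L, 716L, 388L, 307L, 430L, 380L)
)

# Control-group mean of Handle, one row per Period
ctrl_by_period <- ttp1 %>%
  filter(Group == "ACTRL") %>%
  group_by(Period) %>%
  summarise(mean.Handle = mean(Handle))

# Attach the period mean to every row and subtract it
result <- ttp1 %>%
  inner_join(ctrl_by_period, by = "Period") %>%
  mutate(Diff = Handle - mean.Handle)

result
```

The period means here should agree with the ctrlmeans values shown in the question (500.00 and 376.25).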

Other tips

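This method only works if Period is a character or a factor. Right now it's numeric, so ctrlmeans[ttp1$Period] indexes by position (it asks for element number 201415, which does not exist) rather than by name, and every lookup returns NA. Convert Period to character so the lookup goes by name: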

Difference <- ttp1$Handle-ctrlmeans[as.character(ttp1$Period)]
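To see the difference between the two kinds of lookup, here is a small self-contained sketch. It rebuilds the relevant columns from the question's dput output by hand. Wrapping the lookup in as.vector is an addition of this sketch, not part of the line above: it just drops the period labels so the new column is a plain numeric vector.

```r
# Columns copied from the question's dput() output
ttp1 <- data.frame(
  Period = rep(c(201415L, 201416L), each = 6),
  Group  = c("A ", "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL",
             "A",  "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL"),
  Handle = c(1013L, 699L, 425L, 450L, 444L, 681L,
             532L, 716L, 388L, 307L, 430L, 380L)
)

ctrlmeans <- with(subset(ttp1, Group == "ACTRL"), tapply(Handle, Period, mean))

# Numeric index: asks for the 201415th element, which is out of range
by_position <- ctrlmeans[201415]

# Character index: looks up the element whose name is "201415"
by_name <- ctrlmeans["201415"]

print(by_position)
print(by_name)

# Name-based lookup for every row, then subtract from Handle
ttp1$Difference <- ttp1$Handle - as.vector(ctrlmeans[as.character(ttp1$Period)])
ttp1
```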

Also, this method only works with one grouping variable. With more than one, you'd probably want to aggregate the group summaries into a new dataset, merge that back into the larger data.frame, and then do whatever transformation you need. Or you can look at more advanced data.frame manipulation packages such as plyr. But that is really a different question.
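One possible base R version of that aggregate-then-merge idea, grouping by both Period and Queue, might look like the sketch below. The data are retyped from the question's dput output, and the names ctrl, ctrl.mean and merged are invented for this example.

```r
# Columns copied from the question's dput() output
ttp1 <- data.frame(
  Period = rep(c(201415L, 201416L), each = 6),
  Queue  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L),
  Group  = c("A ", "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL",
             "A",  "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL"),
  Handle = c(1013L, 699L, 425L, 450L, 444L, 681L,
             532L, 716L, 388L, 307L, 430L, 380L)
)

# Control-group means, one row per Period/Queue combination
ctrl <- aggregate(Handle ~ Period + Queue,
                  data = subset(ttp1, Group == "ACTRL"),
                  FUN = mean)
names(ctrl)[names(ctrl) == "Handle"] <- "ctrl.mean"

# Merge the summaries back onto every row, then compute the difference
merged <- merge(ttp1, ctrl, by = c("Period", "Queue"))
merged$Difference <- merged$Handle - merged$ctrl.mean

merged
```

Note that merge sorts the result by the key columns by default, so the rows may come back in a different order than in ttp1.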

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow