Question

I need to create a dichotomized variable based on two factors (hopefully this is possible).

Let's say I have the data:

    d <- data.frame ( 
    agegroup = c(2,1,1,2,3,2,1,3,3,3,3,3,1,1,2,3,2,1,1,2,1,2,2,3) ,
    gender = c(2,2,2,2,2,2,1,2,1,1,1,2,1,1,2,2,1,1,1,1,2,1,1,1) , 
    hourwalking = c(0.3,0.5,1.1,1.1,1.1,1.2,1.2,1.2,1.3,1.5,1.7,1.8,2.1,2.1,2.2,2.2,2.3,2.4,2.4,3,3.1,3.1,4.3,5)        
    )

I would like to create a binary variable (LowWalkHrs) that is 1 when hourwalking is below the gender- and agegroup-specific median and 0 otherwise (e.g., when agegroup = 1 and gender = 1, the median is 2.1, which I found using Excel). LowWalkHrs would be added to the dataset as a new variable, so the output would be:

     agegroup gender hourwalking LowWalkHrs
        2       2       0.3       1
        1       2       0.5       1
        1       2       1.1       0
        2       2       1.1       1
        3       2       1.1       1
        2       2       1.2       0
        1       1       1.2       1
          ....
        3       1       5         0

I have a rather large dataset (~10k observations), so Excel is out of the question.

In R I've tried cut and cut2, which don't seem to take factor variables, as well as ddply, which gave me this error message: Error in $<-.data.frame(*tmp*, "lowWalkHrs", value = list(hourwalking = c(0.63, : replacement has 949 rows, data has 11303.


Solution

I suspect this might be slow, but I think it works:

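    # For each row, compare hourwalking with the median of its agegroup/gender group
    z <- mapply(d$agegroup, d$gender, d$hourwalking, FUN = function(a, g, h)
        as.numeric(h < median(d$hourwalking[d$agegroup == a & d$gender == g])))
    d$LowWalkHrs <- z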
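
If the row-by-row version turns out to be too slow on the full data, one possible alternative is to compute each group's median once with tapply and then look it up per row. This is only a sketch on the sample data from the question and has not been benchmarked; the names med and row_med are illustrative.

```r
d <- data.frame(
    agegroup = c(2,1,1,2,3,2,1,3,3,3,3,3,1,1,2,3,2,1,1,2,1,2,2,3),
    gender = c(2,2,2,2,2,2,1,2,1,1,1,2,1,1,2,2,1,1,1,1,2,1,1,1),
    hourwalking = c(0.3,0.5,1.1,1.1,1.1,1.2,1.2,1.2,1.3,1.5,1.7,1.8,2.1,2.1,2.2,2.2,2.3,2.4,2.4,3,3.1,3.1,4.3,5)
)

# One median per agegroup x gender cell, computed a single time
med <- with(d, tapply(hourwalking, list(agegroup, gender), median))

# Look up each row's cell median via the matrix dimnames
row_med <- med[cbind(as.character(d$agegroup), as.character(d$gender))]

# 1 if the row is below its group median, 0 otherwise
d$LowWalkHrs <- as.numeric(d$hourwalking < row_med)
```

Rows with an NA in agegroup or gender would get an NA median here, so they would need separate handling.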

OTHER TIPS

    d <- data.frame(
        agegroup = c(2,1,1,2,3,2,1,3,3,3,3,3,1,1,2,3,2,1,1,2,1,2,2,3),
        gender = c(2,2,2,2,2,2,1,2,1,1,1,2,1,1,2,2,1,1,1,1,2,1,1,1),
        hourwalking = c(0.3,0.5,1.1,1.1,1.1,1.2,1.2,1.2,1.3,1.5,1.7,1.8,2.1,2.1,2.2,2.2,2.3,2.4,2.4,3,3.1,3.1,4.3,5)
    )

    # 1 if hourwalking is below the median of its agegroup/gender group, else 0
    d$LowWalkHrs <- 1 * with(d, hourwalking < ave(hourwalking,
        list(factor(agegroup, exclude = NULL), factor(gender, exclude = NULL)),
        FUN = median))

factor(..., exclude=NULL) is used so that NA values in agegroup or gender are treated as a separate group.
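
To illustrate what exclude=NULL changes, here is a small sketch on made-up toy data (x and g are hypothetical, not from the question). As far as I can tell, without exclude=NULL the rows with a missing group are left untouched by ave, so they end up compared with their own value and are never flagged; with it, the NA rows form their own group.

```r
# Hypothetical toy data: two rows have a missing group label
x <- c(1, 2, 3, 10)
g <- c("a", "a", NA, NA)

# Default factor(): NA rows belong to no group, so ave() leaves them as-is
plain <- ave(x, factor(g), FUN = median)

# exclude = NULL keeps NA as a level, so the NA rows form their own group
with_na <- ave(x, factor(g, exclude = NULL), FUN = median)

# Flag values below the (group) median in each variant
low_plain <- 1 * (x < plain)
low_with_na <- 1 * (x < with_na)
```

Which behaviour is preferable depends on whether rows with a missing agegroup or gender should be classified at all.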

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow