Pregunta

How do I apply a function that can "see" the preceding result when operating by rows?

This comes up a lot, but my current problem requires a running total by student that resets if the total doesn't get to 5.

Example Data:

> df

row   Student Absent Consecutive.Absences                             
1        A       0                    0                              
2        A       1                    1                              
3        A       1                    2                              
4        A       0                    0 <- resets to zero if under 5
5        A       0                    0                              
6        A       1                    1                              
7        A       1                    2                              
8        A       1                    3                              
9        B       1                    1 <- starts over for new factor (Student)
10       B       1                    2                              
11       B       0                    0                              
12       B       1                    1                              
13       B       1                    2                              
14       B       1                    3                              
15       B       1                    4                              
16       B       0                    0                              
17       B       1                    1                              
18       B       1                    2                              
19       B       1                    3                              
20       B       1                    4                              
21       B       1                    5                              
22       B       0                    5 <- gets locked at 5
23       B       0                    5                              
24       B       1                    6                              
25       B       1                    7             

I've tried doing this with a huge matrix of shifted vectors.

I've tried doing this with the apply family of functions and half of them do nothing, the other half hit 16GB of RAM and crash my computer.

I've tried straight looping and it takes 4+ hours (it's a big data set)

What bothers me is how easy this is in Excel. Usually R runs circles around Excel both in speed and writability, which leads me to believe I'm missing something elementary here.

Forgetting even the more challenging ("lock at 5") feature of this, I can't even get a cumsum that resets. There is no combination of factors I can think of to group for ave like this:

Consecutive.Absences = ave(Absent, ..., cumsum)

Obviously, grouping on Student will just give the Total Cumulative Absences -- it "remembers" the kid's absence over the gaps because of the split and recombine in ave.

So as I said, the core of what I don't know how to do in R is this:
How do I apply a function that can "see" the preceding result when operating by rows?

In Excel it would be easy:

C3 = IF($A3=$A2,$B3+$C2,$B3)*$B3

This excel function is displayed without the 5-absence lock for easy readability.

Once I figure out how to apply a function that looks at previous results of the same function in R, I'll be able to figure out the rest.

Thank you in advance for your help--this will be very useful in a lot of my applications!

Genuinely, Sam


UPDATE:
Thank you everyone for the ideas on how to identify if a student has 5 consecutive absences!

However, that's easy enough to do in the database at the STUDENTS table. What I need to know is the number of consecutive absences by student in the attendance record itself for things like, "Do we count this particular attendance record when calculating other summary statistics?"

¿Fue útil?

Solución

If you're looking to apply a function to every element in a vector while making use the previous element's value, you might want to check out "Reduce", with the accumulate parameter set to True

Here's an example:

##define your function that takes two parameters
##these are the 'previous' and the 'current' elements
runSum <- function(sum, x){
    res = 0
    if (x == 1){
        res = sum + 1
    }
    else if (x == 0 & sum < 5){
        res = 0
    }
    else{
        res = sum
    }
    res
}

#lets look at the absent values from subject B
x = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1)

Reduce(x=x, f=runSum, accumulate=T) 
# [1] 1 2 0 1 2 3 4 0 1 2 3 4 5 5 5 6 7

Otros consejos

It's fairly easy to identify the students with one or more runs of 5:

 tapply(dfrm$Absent, dfrm$Student, function(x) rle(x)$value[rle(x)$length >=5] )
$A
integer(0)

$B
[1] 1

Look for any values of "1" in the result:

tapply(dfrm$Absent, dfrm$Student, function(x) 1 %in% rle(x)$value[rle(x)$length >=5] )
    A     B 
FALSE  TRUE 

I also struggled through to a Reduce solution (but am second in priority to @kithpradhan):

ave(dfrm$Absent, dfrm$Student, 
           FUN= function(XX) 
              Reduce(function(x,y) if( x[1] >= 5){  y+x[1]
                                           } else{ x[1]*y+y } ,  #Resets to 0 if y=0
                 XX, accumulate=TRUE) 
    )
#[1] 0 1 2 0 0 1 2 3 1 2 0 1 2 3 4 0 1 2 3 4 5 5 5 6 7

For the record, you can also create your own Reduce-derivative which receives f and x, and applies f(x) on its output until x == f(x) or maxiter is reached:

ireduce = function(f, x, maxiter = 50){
  i = 1
  while(!identical(f(x), x) & i <= maxiter) {x = f(x); i = i+1}; x
}
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top