I have been searching Stack Overflow for hours for something I assumed was self-evident, but nobody seems to have asked it (which might mean it is indeed self-evident).
I want to use tapply or by to find the first time a specific event occurs in a data frame (first non-zero value). The way I did this before was via
max.col(df, ties.method = c("first"))
But somehow this does not work when used in conjunction with either tapply or by. Here is some example data:
FIRM<-as.vector(sample(c("a","b","c","d"),100,replace=T))
MOMENT<-as.vector(sample((1990:1995),100,replace=T))
EVENT<-as.vector(sample(c("x12","x43","x35","y71","y81","xy1","xy67","yy123","xx901"),100,replace=T))
OCCURENCE<-as.vector(sample(c(0,1),100,replace=T))
m<-as.data.frame(cbind(FIRM,MOMENT,EVENT,OCCURENCE))
Here is what I tried, which did not work:
tapply(m[,4],m[,3],max.col)
# This gives just 1s for every EVENT with the length of the resulting vector equal to number of EVENTs mentioned in the dataset
tapply(m[,4],m[,3],max.col(m, ties.method=c("first")))
# Error in match.fun(FUN) :
'max.col(m, ties.method = c("first"))' is not a function, character or symbol
In addition: Warning message: In max.col(m, ties.method = c("first")) : NAs introduced by coercion
The second attempt is really the crux of the problem. For reasons unclear to me, max.col is not recognised as a function once you change the default tie-breaking method (i.e. "random") to the one I need (i.e. "first").
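For what it's worth, a minimal sketch of what may be going on, using made-up toy vectors (occ and event are hypothetical, not the data above). tapply expects FUN to be a function, whereas max.col(m, ties.method = "first") is a call that R evaluates first, passing the result instead of a function. Wrapping the call in an anonymous function avoids that. Each group that tapply hands to FUN is also a plain vector, which max.col treats as a one-column matrix, so column 1 is always the answer; transposing with t() is one way around that.

```r
# Toy data, made up for illustration (not the data from the question)
occ   <- c(0, 0, 1, 0, 1, 1, 0, 0, 0)
event <- c("x12", "x12", "x12", "y71", "y71", "y71", "xy1", "xy1", "xy1")

# FUN must be a function object; an anonymous function fixes the extra
# argument without evaluating the call up front.
# t(x) turns each group's vector into a 1-row matrix, so max.col()
# scans across the group's values instead of down a single column.
first_max <- tapply(occ, event, function(x) max.col(t(x), ties.method = "first"))

# Alternative without max.col: position of the first non-zero value
# within each group, NA when the group has no non-zero value.
first_nonzero <- tapply(occ, event, function(x) which(x != 0)[1])
```

Note that the max.col version reports position 1 for a group that is all zeros (here "xy1"), because the first of the tied maxima is the first element; the which() version returns NA in that case, which seems closer to "the event never occurred".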
Additionally, I'd want to be able to find the year in which the first non-zero value occurs.
I think a sensible alternative would be to multiply the MOMENT column by the OCCURENCE column (call the product ID), look for the first non-zero value in ID for each EVENT, keep that ID value, and turn the other values into zero:
m$MOMENT<-as.numeric(as.character(m$MOMENT))
m$OCCURENCE<-as.numeric(as.character(m$OCCURENCE))
m[,"ID"]<-m$MOMENT * m$OCCURENCE
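A rough sketch of how the ID idea could be finished with tapply, on a small made-up data frame (toy, first_in_rows, hits and earliest_year are hypothetical names). It shows two readings of "first": first in row order, and earliest year. Which one is wanted is an assumption that depends on whether the rows are sorted by MOMENT.

```r
# Toy data, made up for illustration (not the data from the question)
toy <- data.frame(
  EVENT     = c("x12", "x12", "x12", "y71", "y71", "xy1"),
  MOMENT    = c(1993, 1991, 1990, 1995, 1992, 1994),
  OCCURENCE = c(0, 1, 1, 1, 0, 0)
)
toy$ID <- toy$MOMENT * toy$OCCURENCE

# Reading 1: first non-zero ID in row order within each EVENT
# (NA when the event never occurs).
first_in_rows <- tapply(toy$ID, toy$EVENT, function(x) x[x > 0][1])

# Reading 2: earliest year with an occurrence, regardless of row order.
# Rows without an occurrence are dropped before grouping.
hits <- toy[toy$OCCURENCE > 0, ]
earliest_year <- tapply(hits$MOMENT, hits$EVENT, min)
```

With these toy rows the two readings differ for "x12" (1991 in row order versus 1990 as the earliest year), which is why sorting by MOMENT first may matter.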
I have tried to code this with a function containing a while loop and an if statement, using break, but it does not work:
tapply(m$ID,m$EVENT, function(x) m$ID[i]<- while (m$ID[i] == 0) {m$ID[i]
if (m$ID[i]>0) {m$YEAR[i] && break }})
The idea here was to iterate the function over EVENT while m$ID == 0 and then to change the value and break once m$ID > 0. Didn't work...
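For reference, a sketch of how the loop-and-break idea might look if written as a function of the group's values x rather than of m$ID[i] (i is never defined in the attempt above). keep_first_nonzero is a hypothetical helper and the vectors are made up; the sketch assumes there are no NAs in the input. ave() is used instead of tapply so that the result has the same length as the input.

```r
# Hypothetical helper: keep the first non-zero value of x in place
# and set every other element to zero.
keep_first_nonzero <- function(x) {
  out <- numeric(length(x))      # start with all zeros
  for (i in seq_along(x)) {
    if (x[i] > 0) {
      out[i] <- x[i]             # keep the first non-zero value
      break                      # stop scanning this group
    }
  }
  out
}

# Toy data, made up for illustration
id    <- c(0, 1991, 1990, 1995, 0, 0)
event <- c("x12", "x12", "x12", "y71", "y71", "xy1")

# ave() applies the helper within each event and returns the values
# in the original row order.
first_only <- ave(id, event, FUN = keep_first_nonzero)
```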
Any ideas on how to fix this (or much simpler solutions)?