Question

I have a data frame with a KEY/ID column, a YEAR column, and two variables, V1 and V2.

KEY V1  V2  YEAR
1   10   5  1990
1   20  10  1991
1   30  15  1992   
2   40  20  1990
2   50  25  1991
2   60  30  1992

I would like to compute the percent change in V1 from one year to the next. That is, I would like to compute (V1[i+1]-V1[i])/V1[i], but only when the value in KEY[i+1] is equal to the value in KEY[i]. When they differ, I would like to get NA.

KEY V1  V2  YEAR  CHANGE
1   10   5  1990    1
1   20  10  1991    0.5
1   30  15  1992   NA   
2   40  20  1990    0.25
2   50  25  1991    0.2
2   60  30  1992   NA

This is my attempt, using the Delt function from the quantmod package and ddply from plyr.

data$change <- ddply(data, "data$KEY", transform,  DeltaCol=Delt(data$V1) )

Unfortunately, it doesn't do the trick.

Any help would be appreciated.


Solution

I don't know how to do it with ddply but it's pretty easy with ave:

> dat$pctchg <- ave(dat$V1, dat$KEY, FUN=function(x) c( NA, diff(x)/x[-length(x)])  )
> dat
  KEY V1 V2 YEAR pctchg
1   1 10  5 1990     NA
2   1 20 10 1991   1.00
3   1 30 15 1992   0.50
4   2 40 20 1990     NA
5   2 50 25 1991   0.25
6   2 60 30 1992   0.20
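
Note that the output above puts the NA on the first row of each KEY (change relative to the previous year), whereas the question asks for the NA on the last row (change to the following year). A minimal sketch of the forward-looking variant, assuming the rows are already sorted by YEAR within each KEY; the data frame is rebuilt here from the question's table and the column name CHANGE is taken from the desired output:

```r
# Rebuild the example data from the question.
dat <- data.frame(
  KEY  = c(1, 1, 1, 2, 2, 2),
  V1   = c(10, 20, 30, 40, 50, 60),
  V2   = c(5, 10, 15, 20, 25, 30),
  YEAR = rep(1990:1992, times = 2)
)

# Forward-looking change: (V1[i+1] - V1[i]) / V1[i] within each KEY.
# The NA goes last because the final row of a group has no following row.
dat$CHANGE <- ave(dat$V1, dat$KEY,
                  FUN = function(x) c(diff(x) / x[-length(x)], NA))
dat
```

Only the position of the NA inside c() differs from the answer's call; the per-group values are the same, shifted up by one row.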

ave works when you want a result that depends on only one vector within any number of categories. As far as I know, you cannot do calculations on multiple vectors with ave, nor do you have access to the factor levels within the function. If you want the same calculation(s) applied separately to each of a group of vectors, then aggregate is the best choice. Finally, if you want calculations that each depend on multiple vectors, use either do.call(rbind, by(dat, cats, function)) or lapply(split(dat, cats), function).
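
The alternatives mentioned above might look roughly like this. It is only a sketch: pct_change and add_changes are hypothetical helper names, the sum V1 + V2 is just an illustration of a calculation that needs two columns at once, and the data frame is rebuilt from the question's table.

```r
# Rebuild the example data from the question.
dat <- data.frame(
  KEY  = c(1, 1, 1, 2, 2, 2),
  V1   = c(10, 20, 30, 40, 50, 60),
  V2   = c(5, 10, 15, 20, 25, 30),
  YEAR = rep(1990:1992, times = 2)
)

# aggregate: the same summary applied to several columns separately, per KEY.
means <- aggregate(cbind(V1, V2) ~ KEY, data = dat, FUN = mean)

# Hypothetical helper: change relative to the previous element, NA first.
pct_change <- function(x) c(NA, diff(x) / x[-length(x)])

# Hypothetical per-group function. It receives the whole sub-data-frame,
# so it can use several columns in one calculation.
add_changes <- function(d) {
  d$V1_chg  <- pct_change(d$V1)
  d$sum_chg <- pct_change(d$V1 + d$V2)  # depends on two columns at once
  d
}

# split/lapply: process each KEY separately, then stack the pieces again.
res <- do.call(rbind, lapply(split(dat, dat$KEY), add_changes))
rownames(res) <- NULL  # drop the "1.1", "1.2", ... names that rbind creates
res
```

Using do.call(rbind, by(dat, dat$KEY, add_changes)) in place of the lapply/split line should give the same rows.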

Licensed under: CC-BY-SA with attribution