How do I Difference Panel Data in R

Question 1

ave

transform(MyData, P1d = ave(Price, Good, FUN = function(x) c(NA, diff(x))))

ave/gsubfn

The last solution can be shorteneed slightly using fn$ in the gsubfn package:

library(gsubfn)
transform(MyData, P1d = fn$ave(Price, Good, FUN = ~ c(NA, diff(x))))

dplyr

library(dplyr)

MyData %>% 
  group_by(Good) %>% 
  mutate(P1d = Price - lag(Price)) %>% 
  ungroup

data.table

library(data.table)

dt <- data.table(MyData)
dt[, P1d := c(NA, diff(Price)), by = Good]

Update

dplyr now uses %>% instead of %.% .

Question 2

One can easily do it like this:

library(reshape2)
library(dplyr)

MyNewData <- 
 MyData %.%
 melt(id = c("Good", "Day")) %.%
 dcast(Day ~ Good) %.%
 mutate(apples  = apples - lag(apples),
     oranges = oranges - lag(oranges)) %.%
 melt(id = "Day", variable.name = "Good", value.name = "P1d") %.%
 merge(MyData) %.%
 arrange(Good, Day)

Regards

Question 3

In my case, I had to generate first differences for a panel. In order to have the differenced vector with the same length I used diff with a NA.

library(dplyr)
mydata %>%
group_by(id) %>%
mutate(price_diff = c(NA, diff(price)))%>%
ungroup()

Question 4

This is what I came up with. But it does not seem efficient in the least:

MyData$P1d <- c(NA, MyData$Price[-1]-MyData$Price[-nrow(MyData)])
MyData$P1d[c(F,MyData$Good[-1]!=MyData$Good[-nrow(MyData)])] <- NA

MyData

   Day Price    Good P1d
1    1     8  apples  NA
2    2    10  apples   2
3    3     7  apples  -3
4    4    11  apples   4
5    5    14  apples   3
6    1    12 oranges  NA
7    2    11 oranges  -1
8    3     9 oranges  -2
9    4    14 oranges   5
10   5    11 oranges  -3

Question 5

I came up with this code following some dpylr tutorials online: My goal was to add new column that is 5 year growth rate of R & D (variable xrd) of firms (identifer GVKEY). RandD2015 is raw data file.

By using the piping function (%>%) you can combine multiple calls in dplyr. Hope this is useful (my 1st code contribution in Stack Overflow)

library(dplyr)

RandDtest<- RandDec2015 %>% 
    group_by(GVKEY) %>%
    mutate(xrd5yr=xrd/lag(xrd,4)-1)

Question 6

collapse::fdiff is the function you are looking for:

library(collapse)
# This means compute difference of Price lagged once, iterated once, by Good, ordered by Day
settransform(MyData, P1d = fdiff(Price, 1, 1, Good, Day))

Question 7

If the data is unbalanced, examples such as askesis_rea's answer and G. Grothendieck's answer do not apply directly (Note: I did not test the other answers). But they apply after expanding the dataframe to a balanced panel with NA values.

Example

In this example, individuals are not observed across all time periods.

Let's extend the example to the unbalanced case: (I remove day 2 for apples)

set.seed(1)
mydata <- data.frame(Day=0:9 %% 5+1, 
                 Price=rpois(10,10),
                 Good=rep(c("apples","oranges"), each=5))
mydata <- mydata[!(mydata$Good=="apples" & mydata$Day==2), ]# removing apples in day 2
mydata

   Day Price    Good
1    1     8  apples
3    3     7  apples
4    4    11  apples
5    5    14  apples
6    1    12 oranges
7    2    11 oranges
8    3     9 oranges
9    4    14 oranges
10   5    11 oranges

Running G. Grothendieck's dplyr answer yields the wrong values:

mydata %>% 
  group_by(Good) %>% 
  mutate(P1d = Price - lag(Price)) %>% 
  ungroup

Indeed, for Day 3 and apples, the value should be 2 however it is -1. This is because the difference between prices in Day 3 and 1 was computed rather than the difference between Day 3 and 2.

# A tibble: 9 × 4
    Day Price Good      P1d
  <dbl> <int> <chr>   <int>
1     1     8 apples     NA
2     3     7 apples     -1
3     4    11 apples      4
4     5    14 apples      3
5     1    12 oranges    NA
6     2    11 oranges    -1
7     3     9 oranges    -2
8     4    14 oranges     5
9     5    11 oranges    -3

But if we expand first, and then apply first differencing, we get the right results:

library(tidyr)
expanded <- mydata %>%  complete(nesting(Good), Day=full_seq(Day, 1))

expanded %>% 
  group_by(Good) %>% 
  mutate(P1d = Price - lag(Price)) %>% 
  ungroup

# A tibble: 10 × 4
   Good      Day Price   P1d
   <chr>   <dbl> <int> <int>
 1 apples      1     8    NA
 2 apples      2    NA    NA
 3 apples      3     7    NA
 4 apples      4    11     4
 5 apples      5    14     3
 6 oranges     1    12    NA
 7 oranges     2    11    -1
 8 oranges     3     9    -2
 9 oranges     4    14     5
10 oranges     5    11    -3

Edit

In some other scenario of varying composition of individuals across time this method might not be appropriate. One good tool is fixest::d operator. It can be used in fixest model formula or with data.table.

Example with data.table

Unbalanced

library(data.table)
# creating the data
set.seed(1)
mydata <- data.frame(Day=0:9 %% 5+1, 
                 Price=rpois(10,10),
                 Good=rep(c("apples","oranges"), each=5))
mydata <- mydata[!(mydata$Good=="apples" & mydata$Day==2), ]# removing apples in day 2

mydata <- fixest::panel(as.data.table(mydata), panel.id=~Good + Day)
mydata[, P1D:=fixest::d(Price)] # Adding inplace first difference 

as.data.frame(fixest::unpanel(mydata)) # viewing

  Day Price    Good P1D
1   1     8  apples  NA
2   3     7  apples  NA
3   4    11  apples   4
4   5    14  apples   3
5   1    12 oranges  NA
6   2    11 oranges  -1
7   3     9 oranges  -2
8   4    14 oranges   5
9   5    11 oranges  -3

Balanced

set.seed(1)
MyData <- data.frame(Day=0:9 %% 5+1, 
                 Price=rpois(10,10),
                 Good=rep(c("apples","oranges"), each=5))

MyData <- fixest::panel(as.data.table(MyData), panel.id=~Good + Day)
MyData[, P1D:=fixest::d(Price)]
as.data.frame(fixest::unpanel(MyData))

[1] TRUE
   Day Price    Good P1D
1    1     8  apples  NA
2    2    10  apples   2
3    3     7  apples  -3
4    4    11  apples   4
5    5    14  apples   3
6    1    12 oranges  NA
7    2    11 oranges  -1
8    3     9 oranges  -2
9    4    14 oranges   5
10   5    11 oranges  -3