If the data is unbalanced, answers such as askesis_rea's and G. Grothendieck's do not apply directly (note: I did not test the other answers). They do work, however, once the data frame has been expanded to a balanced panel with NA values.
Example
Here, not every individual is observed in every time period.
Let's extend the example to the unbalanced case by removing Day 2 for apples:
set.seed(1)
mydata <- data.frame(Day=0:9 %% 5+1,
Price=rpois(10,10),
Good=rep(c("apples","oranges"), each=5))
mydata <- mydata[!(mydata$Good=="apples" & mydata$Day==2), ]# removing apples in day 2
mydata
Day Price Good
1 1 8 apples
3 3 7 apples
4 4 11 apples
5 5 14 apples
6 1 12 oranges
7 2 11 oranges
8 3 9 oranges
9 4 14 oranges
10 5 11 oranges
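Before differencing, it can help to confirm that the panel really is unbalanced. The following is a minimal base R sketch, not part of the original answers; the names obs and is_balanced are only illustrative. It counts the observations per Good and Day and flags any empty cell.

```r
# Rebuild the unbalanced example data (apples on Day 2 removed)
set.seed(1)
mydata <- data.frame(Day = 0:9 %% 5 + 1,
                     Price = rpois(10, 10),
                     Good = rep(c("apples", "oranges"), each = 5))
mydata <- mydata[!(mydata$Good == "apples" & mydata$Day == 2), ]

# Cross-tabulate: one row per Good, one column per Day
obs <- table(mydata$Good, mydata$Day)
obs

# The panel is balanced only if every Good/Day cell holds exactly one row
is_balanced <- all(obs == 1)
is_balanced
```

A zero in the table marks a Good/Day combination that is missing, which is exactly where a plain lag() would silently skip a period.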
Running G. Grothendieck's dplyr answer yields the wrong values:
library(dplyr)
mydata %>%
  group_by(Good) %>%
  mutate(P1d = Price - lag(Price)) %>%
  ungroup()
Indeed, for apples on Day 3 the value should be NA, because the Day 2 price is not observed, yet the result is -1.
This is because the difference between the prices on Day 3 and Day 1 was computed, rather than the difference between Day 3 and Day 2.
# A tibble: 9 × 4
Day Price Good P1d
<dbl> <int> <chr> <int>
1 1 8 apples NA
2 3 7 apples -1
3 4 11 apples 4
4 5 14 apples 3
5 1 12 oranges NA
6 2 11 oranges -1
7 3 9 oranges -2
8 4 14 oranges 5
9 5 11 oranges -3
But if we expand first, and then apply first differencing, we get the right results:
library(tidyr)
expanded <- mydata %>% complete(nesting(Good), Day=full_seq(Day, 1))
expanded %>%
group_by(Good) %>%
mutate(P1d = Price - lag(Price)) %>%
ungroup
# A tibble: 10 × 4
Good Day Price P1d
<chr> <dbl> <int> <int>
1 apples 1 8 NA
2 apples 2 NA NA
3 apples 3 7 NA
4 apples 4 11 4
5 apples 5 14 3
6 oranges 1 12 NA
7 oranges 2 11 -1
8 oranges 3 9 -2
9 oranges 4 14 5
10 oranges 5 11 -3
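If adding filler rows is undesirable, the same result can probably be obtained by checking the time gap directly. This is a sketch in base R under the assumption that Day is numeric and one period equals a step of 1; gap_aware_diff is a hypothetical helper written for this illustration, not a function from any package.

```r
# Rebuild the unbalanced example data (apples on Day 2 removed)
set.seed(1)
mydata <- data.frame(Day = 0:9 %% 5 + 1,
                     Price = rpois(10, 10),
                     Good = rep(c("apples", "oranges"), each = 5))
mydata <- mydata[!(mydata$Good == "apples" & mydata$Day == 2), ]

# Differences are only meaningful on data sorted by time within each group
mydata <- mydata[order(mydata$Good, mydata$Day), ]

# Hypothetical helper: first difference that is NA whenever the previous
# observation is not exactly `step` periods earlier
gap_aware_diff <- function(value, time, step = 1) {
  change  <- c(NA, diff(value))
  elapsed <- c(NA, diff(time))
  change[is.na(elapsed) | elapsed != step] <- NA
  change
}

# Apply the helper separately to the rows of each Good
mydata$P1d <- NA_integer_
for (rows in split(seq_len(nrow(mydata)), mydata$Good)) {
  mydata$P1d[rows] <- gap_aware_diff(mydata$Price[rows], mydata$Day[rows])
}
mydata
```

The helper takes plain vectors, so it should also work inside a grouped mutate(), e.g. mutate(P1d = gap_aware_diff(Price, Day)) after group_by(Good) and sorting by Day.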
Edit
In other scenarios, where the composition of individuals varies across time, expanding to a balanced panel might not be appropriate.
One good tool for that case is the fixest::d operator.
It can be used in a fixest model formula or with data.table.
Example with data.table
- Unbalanced
library(data.table)
# creating the data
set.seed(1)
mydata <- data.frame(Day=0:9 %% 5+1,
Price=rpois(10,10),
Good=rep(c("apples","oranges"), each=5))
mydata <- mydata[!(mydata$Good=="apples" & mydata$Day==2), ]# removing apples in day 2
mydata <- fixest::panel(as.data.table(mydata), panel.id=~Good + Day)
mydata[, P1D:=fixest::d(Price)] # Adding inplace first difference
as.data.frame(fixest::unpanel(mydata)) # viewing
Day Price Good P1D
1 1 8 apples NA
2 3 7 apples NA
3 4 11 apples 4
4 5 14 apples 3
5 1 12 oranges NA
6 2 11 oranges -1
7 3 9 oranges -2
8 4 14 oranges 5
9 5 11 oranges -3
- Balanced
set.seed(1)
MyData <- data.frame(Day=0:9 %% 5+1,
Price=rpois(10,10),
Good=rep(c("apples","oranges"), each=5))
MyData <- fixest::panel(as.data.table(MyData), panel.id=~Good + Day)
MyData[, P1D:=fixest::d(Price)]
as.data.frame(fixest::unpanel(MyData))
Day Price Good P1D
1 1 8 apples NA
2 2 10 apples 2
3 3 7 apples -3
4 4 11 apples 4
5 5 14 apples 3
6 1 12 oranges NA
7 2 11 oranges -1
8 3 9 oranges -2
9 4 14 oranges 5
10 5 11 oranges -3
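For the model formula usage mentioned above, a minimal sketch follows. It assumes the base_did example data set shipped with fixest, with columns y, x1, id and period; the regression itself is only an illustration and has nothing to do with the apples and oranges data.

```r
library(fixest)

# Example panel data shipped with fixest (assumed columns: y, x1, id, period)
data(base_did)

# d() takes first differences within each id, along period;
# panel.id tells feols which variables define the panel
est <- feols(d(y) ~ d(x1), data = base_did, panel.id = ~id + period)
summary(est)
```

Because the panel is declared through panel.id, the differences are taken with respect to the time variable rather than row order, so gaps in the panel should produce NA instead of a difference across the gap.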