Question

I am wondering if there is any easy R commands or packages that will all allow me to easily add variables to data.frames which are the "difference" or change of over time of those variables.

If my data looked like this:

set.seed(1)
MyData <- data.frame(Day=0:9 %% 5+1, 
                 Price=rpois(10,10),
                 Good=rep(c("apples","oranges"), each=5))
MyData

   Day Price    Good
1    1     8  apples
2    2    10  apples
3    3     7  apples
4    4    11  apples
5    5    14  apples
6    1    12 oranges
7    2    11 oranges
8    3     9 oranges
9    4    14 oranges
10   5    11 oranges

Then after "first differencing" the price variable, my data would look like this.

   Day Price    Good P1d
1    1     8  apples  NA
2    2    10  apples   2
3    3     7  apples  -3
4    4    11  apples   4
5    5    14  apples   3
6    1    12 oranges  NA
7    2    11 oranges  -1
8    3     9 oranges  -2
9    4    14 oranges   5
10   5    11 oranges  -3
Était-ce utile?

La solution

ave

transform(MyData, P1d = ave(Price, Good, FUN = function(x) c(NA, diff(x))))

ave/gsubfn

The last solution can be shorteneed slightly using fn$ in the gsubfn package:

library(gsubfn)
transform(MyData, P1d = fn$ave(Price, Good, FUN = ~ c(NA, diff(x))))

dplyr

library(dplyr)

MyData %>% 
  group_by(Good) %>% 
  mutate(P1d = Price - lag(Price)) %>% 
  ungroup

data.table

library(data.table)

dt <- data.table(MyData)
dt[, P1d := c(NA, diff(Price)), by = Good]

Update

dplyr now uses %>% instead of %.% .

Autres conseils

One can easily do it like this:

library(reshape2)
library(dplyr)

MyNewData <- 
 MyData %.%
 melt(id = c("Good", "Day")) %.%
 dcast(Day ~ Good) %.%
 mutate(apples  = apples - lag(apples),
     oranges = oranges - lag(oranges)) %.%
 melt(id = "Day", variable.name = "Good", value.name = "P1d") %.%
 merge(MyData) %.%
 arrange(Good, Day)

Regards

In my case, I had to generate first differences for a panel. In order to have the differenced vector with the same length I used diff with a NA.

library(dplyr)
mydata %>%
group_by(id) %>%
mutate(price_diff = c(NA, diff(price)))%>%
ungroup()

This is what I came up with. But it does not seem efficient in the least:

MyData$P1d <- c(NA, MyData$Price[-1]-MyData$Price[-nrow(MyData)])
MyData$P1d[c(F,MyData$Good[-1]!=MyData$Good[-nrow(MyData)])] <- NA

MyData

   Day Price    Good P1d
1    1     8  apples  NA
2    2    10  apples   2
3    3     7  apples  -3
4    4    11  apples   4
5    5    14  apples   3
6    1    12 oranges  NA
7    2    11 oranges  -1
8    3     9 oranges  -2
9    4    14 oranges   5
10   5    11 oranges  -3

I came up with this code following some dpylr tutorials online: My goal was to add new column that is 5 year growth rate of R & D (variable xrd) of firms (identifer GVKEY). RandD2015 is raw data file.

By using the piping function (%>%) you can combine multiple calls in dplyr. Hope this is useful (my 1st code contribution in Stack Overflow)

library(dplyr)

RandDtest<- RandDec2015 %>% 
    group_by(GVKEY) %>%
    mutate(xrd5yr=xrd/lag(xrd,4)-1)

collapse::fdiff is the function you are looking for:

library(collapse)
# This means compute difference of Price lagged once, iterated once, by Good, ordered by Day
settransform(MyData, P1d = fdiff(Price, 1, 1, Good, Day))

If the data is unbalanced, examples such as askesis_rea's answer and G. Grothendieck's answer do not apply directly (Note: I did not test the other answers). But they apply after expanding the dataframe to a balanced panel with NA values.

Example

In this example, individuals are not observed across all time periods.

Let's extend the example to the unbalanced case: (I remove day 2 for apples)

set.seed(1)
mydata <- data.frame(Day=0:9 %% 5+1, 
                 Price=rpois(10,10),
                 Good=rep(c("apples","oranges"), each=5))
mydata <- mydata[!(mydata$Good=="apples" & mydata$Day==2), ]# removing apples in day 2
mydata
   Day Price    Good
1    1     8  apples
3    3     7  apples
4    4    11  apples
5    5    14  apples
6    1    12 oranges
7    2    11 oranges
8    3     9 oranges
9    4    14 oranges
10   5    11 oranges

Running G. Grothendieck's dplyr answer yields the wrong values:

mydata %>% 
  group_by(Good) %>% 
  mutate(P1d = Price - lag(Price)) %>% 
  ungroup

Indeed, for Day 3 and apples, the value should be 2 however it is -1. This is because the difference between prices in Day 3 and 1 was computed rather than the difference between Day 3 and 2.

# A tibble: 9 × 4
    Day Price Good      P1d
  <dbl> <int> <chr>   <int>
1     1     8 apples     NA
2     3     7 apples     -1
3     4    11 apples      4
4     5    14 apples      3
5     1    12 oranges    NA
6     2    11 oranges    -1
7     3     9 oranges    -2
8     4    14 oranges     5
9     5    11 oranges    -3

But if we expand first, and then apply first differencing, we get the right results:

library(tidyr)
expanded <- mydata %>%  complete(nesting(Good), Day=full_seq(Day, 1))

expanded %>% 
  group_by(Good) %>% 
  mutate(P1d = Price - lag(Price)) %>% 
  ungroup
# A tibble: 10 × 4
   Good      Day Price   P1d
   <chr>   <dbl> <int> <int>
 1 apples      1     8    NA
 2 apples      2    NA    NA
 3 apples      3     7    NA
 4 apples      4    11     4
 5 apples      5    14     3
 6 oranges     1    12    NA
 7 oranges     2    11    -1
 8 oranges     3     9    -2
 9 oranges     4    14     5
10 oranges     5    11    -3

Edit

In some other scenario of varying composition of individuals across time this method might not be appropriate. One good tool is fixest::d operator. It can be used in fixest model formula or with data.table.

Example with data.table

  1. Unbalanced
library(data.table)
# creating the data
set.seed(1)
mydata <- data.frame(Day=0:9 %% 5+1, 
                 Price=rpois(10,10),
                 Good=rep(c("apples","oranges"), each=5))
mydata <- mydata[!(mydata$Good=="apples" & mydata$Day==2), ]# removing apples in day 2

mydata <- fixest::panel(as.data.table(mydata), panel.id=~Good + Day)
mydata[, P1D:=fixest::d(Price)] # Adding inplace first difference 

as.data.frame(fixest::unpanel(mydata)) # viewing
  Day Price    Good P1D
1   1     8  apples  NA
2   3     7  apples  NA
3   4    11  apples   4
4   5    14  apples   3
5   1    12 oranges  NA
6   2    11 oranges  -1
7   3     9 oranges  -2
8   4    14 oranges   5
9   5    11 oranges  -3
  1. Balanced
set.seed(1)
MyData <- data.frame(Day=0:9 %% 5+1, 
                 Price=rpois(10,10),
                 Good=rep(c("apples","oranges"), each=5))

MyData <- fixest::panel(as.data.table(MyData), panel.id=~Good + Day)
MyData[, P1D:=fixest::d(Price)]
as.data.frame(fixest::unpanel(MyData))
[1] TRUE
   Day Price    Good P1D
1    1     8  apples  NA
2    2    10  apples   2
3    3     7  apples  -3
4    4    11  apples   4
5    5    14  apples   3
6    1    12 oranges  NA
7    2    11 oranges  -1
8    3     9 oranges  -2
9    4    14 oranges   5
10   5    11 oranges  -3
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top