Basic lag in R vector/dataframe

https://stackoverflow.com/questions/3558988

01-10-2019
Question

This will most likely expose that I am new to R, but in SPSS, running lags is very easy. Obviously this is user error, but what am I missing?

x <- sample(c(1:9), 10, replace = T)
y <- lag(x, 1)
ds <- cbind(x, y)
ds

Results in:

      x y
 [1,] 4 4
 [2,] 6 6
 [3,] 3 3
 [4,] 4 4
 [5,] 3 3
 [6,] 5 5
 [7,] 8 8
 [8,] 9 9
 [9,] 3 3
[10,] 7 7

I figured I would see:

     x y
 [1,] 4 
 [2,] 6 4
 [3,] 3 6
 [4,] 4 3
 [5,] 3 4
 [6,] 5 3
 [7,] 8 5
 [8,] 9 8
 [9,] 3 9
[10,] 7 3

Any guidance will be much appreciated.

Solution

Another way to deal with this is using the zoo package, which has a lag method that will pad the result with NA:

> require(zoo)
> set.seed(123)
> x <- zoo(sample(c(1:9), 10, replace = T))
> y <- lag(x, -1, na.pad = TRUE)
> cbind(x, y)
   x  y
1  3 NA
2  8  3
3  4  8
4  8  4
5  9  8
6  1  9
7  5  1
8  9  5
9  5  9
10 5  5

The result is a multivariate zoo object (which is an enhanced matrix), but easily converted to a data.frame via

> data.frame(cbind(x, y))

OTHER TIPS

I had the same problem, but I didn't want to use zoo or xts, so I wrote a simple lag function for data frames:

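lagpad <- function(x, k) {
  if (k > 0) {
    # Positive k: pad k NAs at the front and truncate to the original length
    return(c(rep(NA, k), x)[1:length(x)])
  }
  else {
    # Zero or negative k: drop the first -k values and pad NAs at the end
    return(c(x[(-k + 1):length(x)], rep(NA, -k)))
  }
}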

This can lag forward or backwards:

x <- 1:3
cbind(x, lagpad(x, 1), lagpad(x, -1))
     x      
[1,] 1 NA  2
[2,] 2  1  3
[3,] 3  2 NA

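lag does not shift the data; it only shifts the "time base". x has no time base, so cbind does not work as you expected. Try cbind(as.ts(x), lag(x)) and notice that a "lag" of 1 shifts the periods forward: the lagged series starts one period earlier, so each row pairs a value with the one that follows it. Use a negative k, as in lag(as.ts(x), -1), to pair each value with the previous one instead.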
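To make the time-base point concrete, here is a minimal sketch using only base R. The short vector is made up for illustration (it reuses the first five values from the question), and the variable names are my own.

```r
x <- c(4, 6, 3, 4, 3)

# stats::lag leaves the values alone and only moves the tsp (time base) attribute
y <- stats::lag(x, 1)
all(as.vector(y) == x)  # TRUE: same values, in the same positions
tsp(y)                  # start 0, end 4, frequency 1 (one period earlier)

# A plain cbind(x, y) ignores tsp, which is why the question saw two identical columns.
# Binding ts objects aligns on time, so a negative k gives the "previous value" column.
aligned <- cbind(as.ts(x), stats::lag(as.ts(x), -1))
aligned
```

The aligned result has one extra row, because the union of the two time ranges is kept and the gaps are filled with NA.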

I would suggest using zoo / xts for time series. The zoo vignettes are particularly helpful.

lag() works with time series, whereas you are trying to use bare matrices. This old question suggests using embed instead, like so:

lagmatrix <- function(x,max.lag) embed(c(rep(NA,max.lag), x), max.lag+1)

for instance

> x
[1] 8 2 3 9 8 5 6 8 5 8
> lagmatrix(x, 1)
      [,1] [,2]
 [1,]    8   NA
 [2,]    2    8
 [3,]    3    2
 [4,]    9    3
 [5,]    8    9
 [6,]    5    8
 [7,]    6    5
 [8,]    8    6
 [9,]    5    8
[10,]    8    5
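The same helper also covers several lags at once, which may be the main reason to prefer embed. A small sketch, with the lagmatrix definition repeated so that it runs on its own and a made-up input of 1:4:

```r
lagmatrix <- function(x, max.lag) embed(c(rep(NA, max.lag), x), max.lag + 1)

# Column 1 is the original series, column 2 is lag 1, column 3 is lag 2
m <- lagmatrix(1:4, 2)
m
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    2    1   NA
# [3,]    3    2    1
# [4,]    4    3    2
```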

Using just standard R functions this can be achieved in a much simpler way:

x <- sample(c(1:9), 10, replace = T)
y <- c(NA, head(x, -1))
ds <- cbind(x, y)
ds
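If more than one period is needed, the same idea can be generalized. The helper below is a hypothetical sketch (the name lag_k is mine, not from any package). It assumes k is at least 1, because head(x, -0) returns an empty vector rather than x.

```r
# Hypothetical helper: shift x down by k positions, padding the front with NA
lag_k <- function(x, k = 1) {
  stopifnot(k >= 1, k <= length(x))
  c(rep(NA, k), head(x, -k))
}

x <- c(4, 6, 3, 4, 3)
lag1 <- lag_k(x)     # NA 4 6 3 4
lag2 <- lag_k(x, 2)  # NA NA 4 6 3
cbind(x, lag1, lag2)
```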

The easiest way to me now appears to be the following:

require(dplyr)
df <- data.frame(x = sample(c(1:9), 10, replace = T))
df <- df %>% mutate(y = lag(x))
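One thing to be aware of: attaching dplyr masks stats::lag, so a bare lag() means different things depending on what is loaded. A short sketch that spells out the namespaces (it assumes dplyr is installed; the vector is made up):

```r
x <- c(4, 6, 3, 4, 3)

# dplyr::lag shifts the values themselves and pads with NA
prev_value <- dplyr::lag(x)   # NA 4 6 3 4
next_value <- dplyr::lead(x)  # 6 3 4 3 NA

# stats::lag only changes the time base; the values stay where they were
shifted_ts <- stats::lag(x)
as.vector(shifted_ts)         # 4 6 3 4 3
```

Writing dplyr::lag explicitly inside mutate() avoids surprises if the package load order changes.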

tmp <- rnorm(10)
# Parenthesize the range: 1:length(tmp)-1 parses as (1:length(tmp)) - 1
tmp2 <- c(NA, tmp[1:(length(tmp) - 1)])
tmp
tmp2

This should accommodate vectors or matrices as well as negative lags:

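lagpad <- function(x, k=1) {
  # Remember whether the input was a plain vector so it can be returned as one
  i <- is.vector(x)
  if (i) x <- matrix(x) else x <- matrix(x, nrow(x))
  if (k > 0) {
    # Positive k: prepend k rows of NA and drop the last k rows
    x <- rbind(matrix(rep(NA, k*ncol(x)), ncol=ncol(x)), matrix(x[1:(nrow(x)-k),], ncol=ncol(x)))
  }
  else {
    # Zero or negative k: drop the first -k rows and append -k rows of NA
    x <- rbind(matrix(x[(-k+1):(nrow(x)),], ncol=ncol(x)), matrix(rep(NA, -k*ncol(x)), ncol=ncol(x)))
  }
  if (i) x[1:length(x)] else x
}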

A simple way to do the same may be to copy the data to a new data frame and shift its row names by one. Make sure the original table is indexed sequentially with no gaps.

e.g.

tempData <- originalData
rownames(tempData) <- 2:(nrow(tempData)+1)

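If you want it in the same data frame as the original, join the two on their row names, for example with merge(originalData, tempData, by = "row.names", all.x = TRUE). Note that cbind binds data frames by position and ignores row names, so on its own it will not apply the shift.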

Two options, in base R and with data.table:

baseShiftBy1 <- function(x) c(NA, x[-length(x)])
baseShiftBy1(x)
[1] NA  3  8  4  8  9  1  5  9  5

data.table::shift(x)
[1] NA  3  8  4  8  9  1  5  9  5
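data.table::shift can also lead instead of lag, and it accepts several offsets at once. A brief sketch, assuming data.table is installed and using a shortened made-up vector:

```r
x <- c(3, 8, 4, 8, 9)

# type = "lead" shifts the other way and pads the end with NA
lead1 <- data.table::shift(x, n = 1, type = "lead")  # 8 4 8 9 NA

# A vector of offsets returns a list with one shifted vector per offset
lags <- data.table::shift(x, n = 1:2)
lags[[1]]  # NA 3 8 4 8
lags[[2]]  # NA NA 3 8 4
```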

Data:

set.seed(123)
(x <- sample(c(1:9), 10, replace = T))
[1] 3 8 4 8 9 1 5 9 5 5

Just get rid of lag. Change your line for y to:

y <- c(NA, x[-length(x)])

Licensed under: CC-BY-SA with attribution