Question

This is how my time-series, cross-sectional data is structured:

country     year group  change
Afghanistan 1980   1      0 
Afghanistan 1981   1      0 
Afghanistan 1982   1      1 
Afghanistan 1983   1      0 
Afghanistan 1984   1      0 
Afghanistan 1985   1      1 
Afghanistan 1986   1      0 
Afghanistan 1987   1      2 
Afghanistan 1988   1      0 
Bhutan      1980   2      0 
Bhutan      1981   2      0 
Bhutan      1982   2      0 
Bhutan      1983   2      0 
Bhutan      1984   2      1 
Bhutan      1985   2      0 
Bhutan      1986   2      0 
Bhutan      1987   2      0 
Bhutan      1988   2      2 
Chile       1980   3      0
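To make the example reproducible, the table can be entered as an R data frame. This is a sketch: the name `mydata` is an assumption, and the values are copied from the table.

```r
# Example panel from the question: one row per country-year.
# The data frame name `mydata` is an assumption used for illustration.
mydata <- data.frame(
  country = c(rep("Afghanistan", 9), rep("Bhutan", 9), "Chile"),
  year    = c(1980:1988, 1980:1988, 1980),
  group   = c(rep(1, 9), rep(2, 9), 3),
  change  = c(0, 0, 1, 0, 0, 1, 0, 2, 0,   # Afghanistan
              0, 0, 0, 0, 1, 0, 0, 0, 2,   # Bhutan
              0)                            # Chile
)
```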

The variable change is "1" if there was a positive change, "2" if there was a negative change, and "0" otherwise.

PROBLEM

I am struggling with creating two new variables:

(1) A variable called "trend"

In lay terms: within each group (i.e. each country), trend = 1 from the first year with change = 1 up to, but not including, the first year with change = 2; otherwise trend = 0.

(2) A variable called "time"

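This variable should count the years before and after the start of a positive trend (change = 1): ..., -2, -1 for the years before the start, 1, 2, ... from the start year on, and 0 from the first change = 2 onward (and for countries with no positive change at all).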

In the end, the data set should look like this:

country     year group  change  trend  time
Afghanistan 1980   1      0      0      -2
Afghanistan 1981   1      0      0      -1
Afghanistan 1982   1      1      1       1
Afghanistan 1983   1      0      1       2
Afghanistan 1984   1      0      1       3
Afghanistan 1985   1      1      1       4
Afghanistan 1986   1      0      1       5
Afghanistan 1987   1      2      0       0
Afghanistan 1988   1      0      0       0
Bhutan      1980   2      0      0      -4
Bhutan      1981   2      0      0      -3
Bhutan      1982   2      0      0      -2
Bhutan      1983   2      0      0      -1
Bhutan      1984   2      1      1       1
Bhutan      1985   2      0      1       2
Bhutan      1986   2      0      1       3
Bhutan      1987   2      0      1       4
Bhutan      1988   2      2      0       0
Chile       1980   3      0      0       0
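The two rules can be sketched in R as per-group functions of the change vector. This is a minimal sketch, not one of the answers' methods; the names `trend_rule` and `time_rule` are hypothetical, and it assumes (as the target table suggests) that the trend ends at the first change = 2 after the start and does not restart afterwards.

```r
# Hypothetical helpers; each takes one group's `change` vector
# (0 = no change, 1 = positive change, 2 = negative change).
trend_rule <- function(change) {
  started <- cumsum(change == 1) > 0            # TRUE from the first positive change on
  stopped <- cumsum(change == 2 & started) > 0  # TRUE from the first negative change after that
  as.numeric(started & !stopped)
}

time_rule <- function(change) {
  n <- length(change)
  start <- match(1, change)                     # position of the first positive change
  if (is.na(start)) return(rep(0, n))           # no positive change: 0 throughout
  pos <- seq_len(n) - start                     # 0 in the start year, negative before it
  stopped <- cumsum(change == 2 & pos > 0) > 0  # from the first negative change after the start
  # the target table has no 0 around the start: ..., -2, -1, then 1, 2, ... from the start year
  ifelse(pos < 0, pos, ifelse(stopped, 0, pos + 1))
}

afg <- c(0, 0, 1, 0, 0, 1, 0, 2, 0)
trend_rule(afg)  # 0 0 1 1 1 1 1 0 0
time_rule(afg)   # -2 -1 1 2 3 4 5 0 0
```

Applied per country (for example with split/lapply/unsplit or ave), these reproduce the trend and time columns of the table above.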

I think one could use "split" to separate the groups, e.g.

data$trend <- split(data$group, data$group)  # separate by unique values
[...]
data$trend <- unsplit(data$trend, data$group)  # make back into a vector

BUT: What would be the command between these two lines?

This line would generate a sequence for each group:

data$trend <- lapply(data$trend, seq)

BUT: How to limit it to the positive trend, i.e. data$trend==1?

Any ideas more than welcome! Many thanks.
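Regarding the "split" idea: the step between split() and unsplit() is typically an lapply() that runs a per-group function over each piece of the change column, and having that function return 0 outside the positive trend is what limits the result to the trend years. A sketch, assuming a small stand-in data frame called `data` and a hypothetical per-group function `trend_in_group`; base R's ave() does the same split/apply/unsplit in a single call.

```r
# Small stand-in for the question's `data` (two groups, assumed values).
data <- data.frame(group  = c(1, 1, 1, 1, 2, 2, 2),
                   change = c(0, 1, 0, 2, 0, 1, 0))

# Hypothetical per-group rule: 1 from the first change == 1
# until (not including) the next change == 2, else 0.
trend_in_group <- function(change) {
  started <- cumsum(change == 1) > 0
  as.numeric(started & cumsum(change == 2 & started) == 0)
}

parts <- split(data$change, data$group)   # list: one change vector per group
parts <- lapply(parts, trend_in_group)    # the step between split() and unsplit()
data$trend <- unsplit(parts, data$group)  # back to one vector in the original row order

# ave() performs the same split / apply / unsplit in one call:
data$trend2 <- ave(data$change, data$group, FUN = trend_in_group)
```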


Solution 2

Here is an alternative solution using ddply from the plyr package (assuming your data frame is named mydata):

changeTime <- function(x) {        # time function

    if (!any(x == 1)) return(0)    # no positive change: time is 0 throughout

    y <- (1:length(x)-match(1,x))  # pre-constructing time
    y[y>=0] <- y[y>=0]+1           # adding extra 1

    if (!is.na(match(2,x))) {
      y[match(2,x):length(x)] <- 0 # setting 0 after 2
    }
    return(y)
}

changeTrend <- function(x) {       # trend function

    y <- cummax(x)   # using cumulative maximum function
    y[y>=2] <- 0     # remove trailing 2's
    return(y)

}

library(plyr)   # provides ddply() and mutate()
ddply(mydata, .(country), mutate, trend = changeTrend(change), time = changeTime(change))
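To see why changeTrend works: cummax() carries the largest value seen so far forward, so (given that change only takes the values 0, 1 and 2) every year from the first positive change on is at least 1, and every year from the first negative change on is 2; those 2s are then set to 0. Worked through on Afghanistan's change column:

```r
change <- c(0, 0, 1, 0, 0, 1, 0, 2, 0)  # Afghanistan, 1980-1988
y <- cummax(change)
y                # 0 0 1 1 1 1 1 2 2
y[y >= 2] <- 0   # zero the negative change and every year after it
y                # 0 0 1 1 1 1 1 0 0
```

One consequence of this design: a change of 2 that comes before the first change of 1 keeps the trend at 0 for the rest of that country's series.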

P.S. I would imagine that time at the event itself should be 0, not 1. If so, remove the line that adds the extra 1 (y[y>=0] <- y[y>=0]+1) from the first function.
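A sketch of that variant on an assumed toy series for one country, using the same match()-based steps as changeTime but without the extra 1:

```r
x <- c(0, 0, 1, 0, 2, 0)            # assumed toy series for one country
y <- seq_along(x) - match(1, x)     # event year = 0: -2 -1 0 1 2 3
if (!is.na(match(2, x))) {
  y[match(2, x):length(x)] <- 0     # zero from the negative change on, as in changeTime
}
y                                   # -2 -1 0 1 0 0
```

With this convention, 0 marks both the event year and the years from the negative change on, so the trend column is needed to tell them apart.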

OTHER TIPS

Something like the following will do. The key, obviously, is writing a proper myFunc.

DF
##        country year group change
## 1  Afghanistan 1980     1      0
## 2  Afghanistan 1981     1      0
## 3  Afghanistan 1982     1      1
## 4  Afghanistan 1983     1      0
## 5  Afghanistan 1984     1      0
## 6  Afghanistan 1985     1      1
## 7  Afghanistan 1986     1      0
## 8  Afghanistan 1987     1      2
## 9  Afghanistan 1988     1      0
## 10      Bhutan 1980     2      0
## 11      Bhutan 1981     2      0
## 12      Bhutan 1982     2      0
## 13      Bhutan 1983     2      0
## 14      Bhutan 1984     2      1
## 15      Bhutan 1985     2      0
## 16      Bhutan 1986     2      0
## 17      Bhutan 1987     2      0
## 18      Bhutan 1988     2      2


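myFunc <- function(x) {
    n <- nrow(x)
    trend <- rep(0, n)
    time <- rep(0, n)

    trendStart <- which(x$change == 1)[1]    # first positive change (NA if none, e.g. Chile)

    if (!is.na(trendStart)) {
        # first negative change after the start; without one the trend runs to the last year
        trendStop <- which(x$change == 2 & seq_len(n) > trendStart)[1]
        trendEnd <- if (is.na(trendStop)) n else trendStop - 1

        trend[trendStart:trendEnd] <- 1
        time[seq_len(trendStart - 1)] <- seq_len(trendStart - 1) - trendStart  # -k, ..., -1
        time[trendStart:trendEnd] <- seq_len(trendEnd - trendStart + 1)         # 1, 2, ...
    }

    return(cbind(x, trend, time))
}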

LL <- split(DF, DF$group)

do.call(rbind, lapply(LL, myFunc))
##          country year group change trend time
## 1.1  Afghanistan 1980     1      0     0   -2
## 1.2  Afghanistan 1981     1      0     0   -1
## 1.3  Afghanistan 1982     1      1     1    1
## 1.4  Afghanistan 1983     1      0     1    2
## 1.5  Afghanistan 1984     1      0     1    3
## 1.6  Afghanistan 1985     1      1     1    4
## 1.7  Afghanistan 1986     1      0     1    5
## 1.8  Afghanistan 1987     1      2     0    0
## 1.9  Afghanistan 1988     1      0     0    0
## 2.10      Bhutan 1980     2      0     0   -4
## 2.11      Bhutan 1981     2      0     0   -3
## 2.12      Bhutan 1982     2      0     0   -2
## 2.13      Bhutan 1983     2      0     0   -1
## 2.14      Bhutan 1984     2      1     1    1
## 2.15      Bhutan 1985     2      0     1    2
## 2.16      Bhutan 1986     2      0     1    3
## 2.17      Bhutan 1987     2      0     1    4
## 2.18      Bhutan 1988     2      2     0    0
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow