I was wondering if you could help me devise an effortless way to code this country-year event data that I'm using.

In the example below, each row corresponds with an ongoing event (that I will eventually fold into a broader panel data set, which is why it looks bare now). So, for example, country 29 had the onset of an event in 1920, which continued (and ended) in 1921. Country 23 had the onset of the event in 1921, which lasted until 1923. Country 35 had the onset of an event that occurred in 1921 and only in 1921, et cetera.

country     year
  29        1920
  29        1921
  23        1921
  23        1922
  23        1923
  35        1921
  64        1926
  135       1928
  135       1929
  135       1930
  135       1931
  135       1932
  135       1933
  135       1934
  120       1930
  70        1932

What I want to do is create "onset" and "ongoing" variables. The "ongoing" variable in this sample data frame would be easy. Basically: Data$ongoing <- 1

I'm more interested in creating the "onset" variable. It would be coded as 1 if it marks the onset of the event for the given country. Basically, I want to create a variable that looks like this, given this example data.

country     year     onset
  29        1920       1
  29        1921       0  
  23        1921       1
  23        1922       0
  23        1923       0
  35        1921       1
  64        1926       1
  135       1928       1
  135       1929       0
  135       1930       0
  135       1931       0
  135       1932       0
  135       1933       0
  135       1934       0
  120       1930       1
  70        1932       1

If you can think of an effortless way to do this in R (one that minimizes the chance of human error compared with coding it by hand in a spreadsheet program like Excel), I'd appreciate it. I did see this related question, but that person's data set doesn't look like mine, so it may require a different approach.

Thanks. Reproducible code for this example data is below.

country <- c(29,29,23,23,23,35,64,135,135,135,135,135,135,135,120,70)
year <- c(1920,1921,1921,1922,1923,1921,1926,1928,1929,1930,1931,1932,1933,1934,1930,1932)

Data=data.frame(country=country,year=year)
summary(Data)
Data

Solution

This should work, even with multiple onsets per country, as long as the rows are sorted by year within each country:

Data$onset <- with(Data, ave(year, country, FUN = function(x)
                 as.integer(c(TRUE, tail(x, -1L) != head(x, -1L) + 1L))))
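As a quick check of the multiple-onset case, here is a minimal sketch on a made-up data frame in which one country has two separate spells. The `Check` object and its values are hypothetical and not part of the question. It uses `diff()` in place of the `tail()`/`head()` comparison, which should be equivalent, and it sorts first because the comparison assumes ascending years within each country.

```r
# Hypothetical data: country 29 has two spells (1920-1921 and 1925-1926)
Check <- data.frame(country = c(29, 29, 29, 29, 23),
                    year    = c(1920, 1921, 1925, 1926, 1921))

# The comparison with the previous row assumes ascending years per country
Check <- Check[order(Check$country, Check$year), ]

# A row is an onset if it is the first for its country or follows a gap in years
Check$onset <- with(Check, ave(year, country, FUN = function(x)
                 as.integer(c(TRUE, diff(x) != 1))))
Check
```

In this sketch both 1920 and 1925 are flagged for country 29, since 1925 does not directly follow 1921.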

Other tips

You could also do this:

library(data.table)  
setDT(Data)[, onset := as.integer(year == min(year)), by = country]

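This can be very fast on larger data sets. Note that it flags only the earliest year for each country, so unlike the solution above it does not detect a second onset after a gap.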
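If a country can have more than one spell, the gap test can probably be combined with data.table grouping. The following is only a sketch on the question's example data, assuming that consecutive years belong to the same spell; `DT` is a name introduced here for illustration.

```r
library(data.table)

# Example data from the question
DT <- data.table(
  country = c(29, 29, 23, 23, 23, 35, 64, 135, 135, 135, 135, 135, 135, 135, 120, 70),
  year    = c(1920, 1921, 1921, 1922, 1923, 1921, 1926, 1928, 1929, 1930,
              1931, 1932, 1933, 1934, 1930, 1932)
)

# Sort so each row can be compared with the previous year of the same country
setorder(DT, country, year)

# Onset = first row of a country, or a year that does not directly follow the previous one
DT[, onset := as.integer(c(TRUE, diff(year) != 1)), by = country]
DT
```

Note that `setorder()` reorders the rows by reference, so the result is sorted by country and year rather than kept in the original row order.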

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow