Question
I was wondering if you could help me devise an effortless way to code this country-year event data that I'm using.
In the example below, each row corresponds to a year in which an event was ongoing (I will eventually fold this into a broader panel data set, which is why it looks bare now). So, for example, country 29 had the onset of an event in 1920, which continued (and ended) in 1921. Country 23 had the onset of the event in 1921, which lasted until 1923. Country 35 had an event that began in 1921 and occurred only in 1921, et cetera.
country year
29 1920
29 1921
23 1921
23 1922
23 1923
35 1921
64 1926
135 1928
135 1929
135 1930
135 1931
135 1932
135 1933
135 1934
120 1930
70 1932
What I want to do is create "onset" and "ongoing" variables. The "ongoing" variable in this sample data frame would be easy. Basically: Data$ongoing <- 1
I'm more interested in creating the "onset" variable. It would be coded as 1 if it marks the onset of the event for the given country. Basically, I want to create a variable that looks like this, given this example data.
country year onset
29 1920 1
29 1921 0
23 1921 1
23 1922 0
23 1923 0
35 1921 1
64 1926 1
135 1928 1
135 1929 0
135 1930 0
135 1931 0
135 1932 0
135 1933 0
135 1934 0
120 1930 1
70 1932 1
If you can think of an effortless way to do this in R (one that minimizes the chance of human error compared with working on it in a spreadsheet program like Excel), I'd appreciate it. I did see this related question, but that person's data set doesn't look like mine and it may require a different approach.
Thanks. Reproducible code for this example data is below.
country <- c(29,29,23,23,23,35,64,135,135,135,135,135,135,135,120,70)
year <- c(1920,1921,1921,1922,1923,1921,1926,1928,1929,1930,1931,1932,1933,1934,1930,1932)
Data=data.frame(country=country,year=year)
summary(Data)
Data
Solution
This should work, even with multiple onsets per country, provided the rows are sorted by year within each country:
Data$onset <- with(Data, ave(year, country, FUN = function(x)
as.integer(c(TRUE, tail(x, -1L) != head(x, -1L) + 1L))))
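To see how the comparison behaves when a country has more than one episode, here is a minimal sketch on made-up data. The `Demo` data frame and its values are hypothetical and not part of the question. It sorts the rows first, since the approach compares each year with the previous row for the same country, and it uses `diff()` as an equivalent way of writing the `tail()`/`head()` comparison.

```r
# Hypothetical data: country 29 has two separate episodes (1920-1921 and 1925).
Demo <- data.frame(country = c(29, 29, 29, 23, 23),
                   year    = c(1920, 1921, 1925, 1921, 1922))

# Sort by country and year so consecutive rows within a country are comparable.
Demo <- Demo[order(Demo$country, Demo$year), ]

# A row is an onset if it is the first row for its country, or if the
# previous year for that country is not exactly one year earlier.
Demo$onset <- with(Demo, ave(year, country, FUN = function(x)
  as.integer(c(TRUE, diff(x) != 1))))

print(Demo)
```

In this sketch, 1925 for country 29 is flagged as a new onset because it does not directly follow 1921.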
Other tips
You could also do this:
library(data.table)
setDT(Data)[, onset := as.integer(year == min(year)), by = country]
This can be very fast on larger datasets. Note that it flags only the earliest year for each country, so it assumes each country has a single event.
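If a country can have several separate episodes, the grouped data.table call can use the same consecutive-year check instead of the group minimum. This is a rough sketch on hypothetical data (`DT` and its values are made up for illustration), assuming the data.table package is installed.

```r
library(data.table)

# Hypothetical data: country 29 has two separate episodes (1920-1921 and 1925).
DT <- data.table(country = c(29, 29, 29, 23, 23),
                 year    = c(1920, 1921, 1925, 1921, 1922))

# Sort by reference so years are increasing within each country.
setorder(DT, country, year)

# Within each country, flag the first row and any row whose year does not
# directly follow the previous row's year.
DT[, onset := as.integer(c(TRUE, diff(year) != 1)), by = country]

print(DT)
```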