new overlapping variable
02-06-2021
Question
I wasn't sure what to title this.
I have a dataset of people, years, and activities
df <- data.frame("id" = c("1", "1", "1", "2", "2","3"), "years" = rep(1971, 6),
"activity" = c("a","b","c","d","e","e"))
id years activity
1 1 1971 a
2 1 1971 b
3 1 1971 c
4 2 1971 d
5 2 1971 e
6 3 1971 e
I want to combine the years and activity columns. For each year in the original years column, I also want to generate +/- 3 years while retaining the association with the id.
If I did this in 2 steps: For id "1" the original year is 1971, so +/-3 years for ID 1 would result in:
id all_years
1 1968
1 1969
1 1970
1 1971
1 1972
1 1973
1 1974
In step 2, I want to combine this all_years column with the activity column from the original df, keeping the ids. So id "1" has 3 activities (a, b, c) and 7 years (1968:1974), so id "1" would appear 10 times in the new combined column.
So ultimately, I would end up with something like this:
id year_and_activities
1 a
1 b
1 c
1 1968
1 1969
1 1970
1 1971
1 1972
1 1973
1 1974
2 d
2 e
2 1968
...
2 1974
...
3 e
...
As always, Thank you!
Solution
I couldn't really follow your question, but given the initial data frame, you can get your final data frame using melt from the reshape2 package:
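library(reshape2)
## To get your +/- 3: repeat every row once per offset
offsets = -3:3
dd = data.frame(id = rep(df$id, each = length(offsets)),
                activity = rep(df$activity, each = length(offsets)),
                years = rep(df$years, each = length(offsets)) + offsets)
## Pretty much gives you what you want
df_melt = melt(dd, id.vars = "id")
## Remove the unnecessary column
df_melt = df_melt[, c(1, 3)]
## Rename
colnames(df_melt) = c("id", "year_and_activities")
## Drop repeated id/value pairs, then order the rows
df_melt = unique(df_melt)
df_melt[with(df_melt, order(id, year_and_activities)), ]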
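For comparison, here is a minimal base R sketch of the same idea that does not need reshape2. It assumes the goal is one row per distinct id/value pair, so that id "1" appears 10 times as described in the question. The names offsets, year_rows, activity_rows and combined are illustrative only and not part of the original answer.

```r
# Example data from the question (strings kept as character, not factor)
df <- data.frame(id = c("1", "1", "1", "2", "2", "3"),
                 years = rep(1971, 6),
                 activity = c("a", "b", "c", "d", "e", "e"),
                 stringsAsFactors = FALSE)

offsets <- -3:3  # assumed window of +/- 3 years

# One row per original row and offset; years are stored as text so they
# can share a column with the activity labels
year_rows <- data.frame(
  id = rep(df$id, each = length(offsets)),
  year_and_activities = as.character(rep(df$years, each = length(offsets)) + offsets),
  stringsAsFactors = FALSE
)

# One row per original id/activity pair
activity_rows <- data.frame(
  id = df$id,
  year_and_activities = df$activity,
  stringsAsFactors = FALSE
)

# Stack activities above years, drop repeated id/value pairs,
# then group the rows by id (ties keep their stacked order)
combined <- unique(rbind(activity_rows, year_rows))
combined <- combined[order(combined$id), ]
rownames(combined) <- NULL
```

With the example data, combined lists the activities of each id first, followed by that id's seven years, which matches the layout sketched in the question.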
As an aside, I would suggest that having a column as a mixture of "characters" and "years" is probably a bad idea - but you may have a good reason.
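If the mixed column is not strictly required, one possible alternative (a sketch only, under the assumption that two tables linked by id would suit your later analysis) is to keep the years numeric in their own data frame and the activities in another. The names id_years, years_long, activities_long and both are hypothetical.

```r
# Example data from the question
df <- data.frame(id = c("1", "1", "1", "2", "2", "3"),
                 years = rep(1971, 6),
                 activity = c("a", "b", "c", "d", "e", "e"),
                 stringsAsFactors = FALSE)

offsets <- -3:3  # assumed window of +/- 3 years

# One row per distinct id and original year
id_years <- unique(df[, c("id", "years")])

# Years stay numeric, so arithmetic and comparisons keep working
years_long <- data.frame(
  id = rep(id_years$id, each = length(offsets)),
  year = rep(id_years$years, each = length(offsets)) + offsets,
  stringsAsFactors = FALSE
)

# Activities stay as text in their own table
activities_long <- unique(df[, c("id", "activity")])

# If every year/activity combination per id is needed, join on id
both <- merge(years_long, activities_long, by = "id")
```

Here years_long$year can still be filtered or summarised numerically (for example with range()), which is no longer possible once the years are coerced to text in a shared column.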