Add new factor across multiple groups
16-10-2019
Question
I am trying to add multiple new rows for 3 new factor levels in an existing data frame. Please refer to the sample data for an example. My starting data frame has 18 levels for col1, all 12 months for column mon, and the past 20 years for year. I then impute values and add new columns, but I need new factor levels to be added for further analysis.
For each mon and year combination, a row with the new level should exist.
Sample df:
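# cbind() coerces everything to character, and the shorter vectors are
# recycled to the length of the longest one (16 rows here).
col1 <- c(rep("a",4),rep("b",4))
col2 <- c(1:4)
mon <- c(rep(c("Jan","Feb", "Mar","Apr"), 4))
year <- c(rep("2016",8), rep("2015",8))
df <- as.data.frame(cbind(col1,col2,mon,year))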
head(df,8) # edited to make it readable
col1 col2 mon year
1 a 1 Jan 2016
2 a 2 Feb 2016
3 a 3 Mar 2016
4 a 4 Apr 2016
5 b 1 Jan 2016
6 b 2 Feb 2016
7 b 3 Mar 2016
8 b 4 Apr 2016
Expected Output
col1 col2 mon year
1 a 1 Jan 2016
2 a 2 Feb 2016
3 a 3 Mar 2016
4 a 4 Apr 2016
5 b 1 Jan 2016
6 b 2 Feb 2016
7 b 3 Mar 2016
8 b 4 Apr 2016
9 c NA Jan 2016 # New level c for each mon and year
10 c NA Feb 2016 # New level c for each mon and year
11 c NA Mar 2016 # New level c for each mon and year
12 c NA Apr 2016 # New level c for each mon and year
How do I go about reaching the expected df?
Solution
Several possibilities. For example, to add c for the mon-year combinations that already exist in your data frame:
rbind(df, transform(df[!duplicated(df[, 3:4]), ], col1="c", col2=NA))
# col1 col2 mon year
# 1 a 1 Jan 2016
# 2 a 2 Feb 2016
# 3 a 3 Mar 2016
# 4 a 4 Apr 2016
# 5 b 1 Jan 2016
# 6 b 2 Feb 2016
# 7 b 3 Mar 2016
# 8 b 4 Apr 2016
# 9 a 1 Jan 2015
# 10 a 2 Feb 2015
# 11 a 3 Mar 2015
# 12 a 4 Apr 2015
# 13 b 1 Jan 2015
# 14 b 2 Feb 2015
# 15 b 3 Mar 2015
# 16 b 4 Apr 2015
# 17 c <NA> Jan 2016
# 21 c <NA> Feb 2016
# 31 c <NA> Mar 2016
# 41 c <NA> Apr 2016
# 91 c <NA> Jan 2015
# 101 c <NA> Feb 2015
# 111 c <NA> Mar 2015
# 121 c <NA> Apr 2015
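The one-liner above can be read as three steps: keep one row per distinct mon-year pair, overwrite col1 and col2 on those rows, and append them to the original data. A minimal sketch of that breakdown follows. It assumes the sample data is rebuilt with explicit column types rather than through cbind(), and the intermediate names combos and new_rows are only illustrative.

```r
# Sample data rebuilt with explicit types (an assumption; the question
# built it with cbind(), which turns every column into character/factor).
df <- data.frame(
  col1 = rep(rep(c("a", "b"), each = 4), times = 2),
  col2 = rep(1:4, times = 4),
  mon  = rep(c("Jan", "Feb", "Mar", "Apr"), times = 4),
  year = rep(c("2016", "2015"), each = 8),
  stringsAsFactors = FALSE
)

# Step 1: keep the first row of every distinct mon-year pair.
combos <- df[!duplicated(df[, c("mon", "year")]), ]

# Step 2: overwrite the identifying columns for the new level.
new_rows <- transform(combos, col1 = "c", col2 = NA)

# Step 3: append the new rows to the original data.
result <- rbind(df, new_rows)
```

Because col2 is an integer column here, the missing values print as NA rather than <NA>.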
To add c for all possible combinations of existing mon values and existing year values:
rbind(df, data.frame(col1="c", col2=NA, expand.grid(mon=levels(df$mon), year=levels(df$year))))
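Note that levels() only returns something useful when mon and year are factors. Since R 4.0.0, data.frame() and as.data.frame() no longer convert strings to factors by default, so on a recent R version levels(df$mon) would be NULL for the sample data. A hedged variant using unique(), which works for both character and factor columns, might look like this. The three-row df below is a made-up example, chosen so that one mon-year pair (Feb 2015) is missing from the data.

```r
# Made-up data with character columns; Feb 2015 does not occur.
df <- data.frame(
  col1 = c("a", "a", "a"),
  col2 = 1:3,
  mon  = c("Jan", "Feb", "Jan"),
  year = c("2016", "2016", "2015"),
  stringsAsFactors = FALSE
)

# levels() is NULL for character vectors, so use unique() instead.
grid <- expand.grid(
  mon  = unique(df$mon),
  year = unique(df$year),
  stringsAsFactors = FALSE
)

# One row of level "c" for every mon x year pair, observed or not.
new_rows <- data.frame(col1 = "c", col2 = NA, grid, stringsAsFactors = FALSE)
result <- rbind(df, new_rows)
```

Unlike the duplicated() approach, this also creates a c row for Feb 2015, even though that pair never appears in the original data.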
To add c for all possible combinations of all month names and existing year values:
rbind(df, data.frame(col1="c", col2=NA, expand.grid(mon=month.abb, year=levels(df$year))))
and so on.
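Since the question mentions three new levels, the same idea extends by putting the new level names into expand.grid() as well. This is only a sketch: the names c, d and e are hypothetical, and the sample data is again rebuilt with explicit column types.

```r
# Sample data with explicit types (an assumption, see the question's df).
df <- data.frame(
  col1 = rep(rep(c("a", "b"), each = 4), times = 2),
  col2 = rep(1:4, times = 4),
  mon  = rep(c("Jan", "Feb", "Mar", "Apr"), times = 4),
  year = rep(c("2016", "2015"), each = 8),
  stringsAsFactors = FALSE
)

# Hypothetical names for the three new levels.
new_levels <- c("c", "d", "e")

# Every new level crossed with every existing month and year.
grid <- expand.grid(
  col1 = new_levels,
  mon  = unique(df$mon),
  year = unique(df$year),
  stringsAsFactors = FALSE
)
grid$col2 <- NA

# Put the columns in the same order as df before appending.
result <- rbind(df, grid[, names(df)])
```

With 3 new levels, 4 months and 2 years, this appends 24 rows to the original 16.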