Question

I would like to create a table from another table (a data.table) that has additional rows based on a condition. Let's say that, in the following table, I want to create an additional row wherever the indicator has more than two characters (nchar(indicator) > 2). The result should be the table below.

The source table looks like this:

    id  indicator
1   123 abc
2   456 NA
3   456 NA
4   456 NA
5   123 abcd
6   789 abc
dt1 <- data.table(id=c(123, 456, 456, 456, 123, 789), indicator = c("abc", NA, NA, NA, "abcd", "abc"))

Resulting table should look like this:

    id  indicator
1   123 abc
2   123 abc2
3   456 NA
4   456 NA
5   456 NA
6   123 abcd
7   123 abcd2
8   789 abc
9   789 abc2
dt2 <- data.table(id=c(123,123, 456, 456, 456, 123,123,789, 789), indicator = c("abc", "abc2", NA, NA, NA, "abcd", "abcd2", "abc", "abc2"))

Solution

EDIT: cleaner version courtesy of Arun (note the key argument added to the data.table creation, which sorts the table by id and indicator):

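library(data.table)

dt1 <- data.table(
  id = c(123, 456, 456, 456, 123, 789),
  indicator = c("abc", NA, NA, NA, "abcd", "abc"),
  key = c("id", "indicator")  # sorts the table by id, then indicator
)
dt1[,
  list(indicator =
    # is.na() guard: nchar(NA_character_) is NA in current R, which would make if() fail
    if (!is.na(indicator) && nchar(indicator) > 2)
      paste0(indicator, c("", 2:(max(2, .N))))  # "abc" -> "abc", "abc2"
    else
      rep(indicator, .N)                         # keep these rows unchanged
  ),
  by = list(indicator, id)
][, -1]  # drop the duplicated grouping column `indicator`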
#     id indicator
# 1: 123       abc
# 2: 123      abc2
# 3: 123      abcd
# 4: 123     abcd2
# 5: 456        NA
# 6: 456        NA
# 7: 456        NA
# 8: 789       abc
# 9: 789      abc2                    
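
To check that the grouped approach reproduces the dt2 table from the question, both tables can be sorted the same way and compared column by column. This is a minimal sketch, assuming data.table is installed. The names `res` and `new` are introduced here for illustration only, and the `is.na()` guard assumes a current R version in which `nchar(NA_character_)` returns NA.

```r
library(data.table)

dt1 <- data.table(
  id = c(123, 456, 456, 456, 123, 789),
  indicator = c("abc", NA, NA, NA, "abcd", "abc")
)
dt2 <- data.table(
  id = c(123, 123, 456, 456, 456, 123, 123, 789, 789),
  indicator = c("abc", "abc2", NA, NA, NA, "abcd", "abcd2", "abc", "abc2")
)

# `res` and `new` are hypothetical names, not part of the original answer.
res <- dt1[,
  list(new =
    if (!is.na(indicator) && nchar(indicator) > 2)
      paste0(indicator, c("", 2:max(2, .N)))  # original value plus suffixed copy
    else
      rep(indicator, .N)                       # short or missing: unchanged
  ),
  by = list(id, indicator)
][, list(id, indicator = new)]

# Sort both tables identically so that row order does not affect the comparison.
setorder(res, id, indicator)
setorder(dt2, id, indicator)

same <- identical(res$id, dt2$id) && identical(res$indicator, dt2$indicator)
print(same)
```

Naming the computed column `new` and renaming it afterwards avoids having two columns called `indicator` in the intermediate result, so no positional `[, -1]` is needed.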

Old version

There is probably a more elegant way, but this will do it. Basically, you rbind the rows that don't meet your condition with those that do, the latter modified by appending a numeric suffix (or "" for the first one). Note that rows sharing the same id and indicator are grouped together, and each group yields max(2, .N) rows: a single 123-abc becomes 123-abc, 123-abc2, but two 123-abc rows also end up as 123-abc, 123-abc2, so duplicated pairs do not get an extra row.

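library(data.table)

dt1 <- data.table(id=c(123, 456, 456, 456, 123, 789), indicator = c("abc", NA, NA, NA, "abcd", "abc"))
rbind(
  # rows left alone: short or missing indicators
  dt1[nchar(indicator) <= 2 | is.na(indicator)],
  # rows expanded: the original value plus a suffixed copy
  dt1[
    nchar(indicator) > 2,
    list(indicator=paste0(indicator, c("", 2:(max(2, .N))))),
    by=list(indicator, id)
  ][, -1]  # drop the duplicated grouping column `indicator`
)[order(id, indicator)]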
#     id indicator
# 1: 123       abc
# 2: 123      abc2
# 3: 123      abcd
# 4: 123     abcd2
# 5: 456        NA
# 6: 456        NA
# 7: 456        NA
# 8: 789       abc
# 9: 789      abc2                    
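
How the suffixing behaves for a duplicated id/indicator pair can be checked directly. The sketch below uses a made-up two-row table, `dup`, and the illustrative names `out` and `new`; none of these come from the original answer.

```r
library(data.table)

# Hypothetical input: the pair (123, "abc") appears twice.
dup <- data.table(id = c(123, 123), indicator = c("abc", "abc"))

out <- dup[
  nchar(indicator) > 2,
  list(new = paste0(indicator, c("", 2:max(2, .N)))),
  by = list(id, indicator)
]

print(out$new)
# "abc" "abc2": two rows in, two rows out
```

Because both rows fall into one group with `.N` equal to 2, `2:max(2, .N)` is just `2`, and the group produces exactly two values. If every qualifying row should get its own extra row even when pairs repeat, the grouping would need a per-row identifier instead.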
Licensed under: CC-BY-SA with attribution