Number of Unique Rows in Subgroup

https://stackoverflow.com/questions/22853119

r
data.table

27-06-2023

Question

I have a data.table called data like this:

> data = data.table(a = c(1,2,3,4), x=c(1,1,0,0), y=c(1,1,0,1), id = c(22,22,33,33))
> data
   a x y id
1: 1 1 1 22
2: 2 1 1 22
3: 3 0 0 33
4: 4 0 1 33

I want to group by id and count the number of unique rows formed by x and y for each id. Column a is irrelevant. So the output I want is something like this:

   id count
1: 22     1
2: 33     2

I tried this but I get an error:

> data[, unique(list(x,y)),by='id']
Error in `[.data.table`(data, , unique(list(x, y)), by = "id") : 
  j doesn't evaluate to the same number of columns for each group

Solution

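Adding length() makes the error go away, but note that unique() on a list compares whole columns, not rows, so length(unique(list(x,y))) only matches the expected counts on this sample by coincidence. To count the distinct (x, y) rows per id, use uniqueN():

data[, uniqueN(data.table(x, y)), by = 'id']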
#    id V1
# 1: 22  1
# 2: 33  2
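
A quick way to see the difference between deduplicating a list of columns and deduplicating rows is the sketch below. The table grp is a hypothetical group, not part of the question's data, in which both rows are (1, 2). It also hints at why the original attempt failed: unique(list(x, y)) returns a list whose length can differ from group to group, so j produces a different number of columns per group.

```r
library(data.table)

# Hypothetical group (an assumption for illustration): both rows are (1, 2),
# so there is one distinct row but two distinct columns.
grp <- data.table(x = c(1, 1), y = c(2, 2))

# unique() on a list compares the list elements, i.e. the vectors x and y.
n_cols <- length(unique(list(grp$x, grp$y)))

# uniqueN() on a data.table compares rows across all of its columns.
n_rows <- uniqueN(grp)

print(c(distinct_columns = n_cols, distinct_rows = n_rows))
```

Here the list-based count is 2 while the row-based count is 1, so the two approaches can disagree once the data changes.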

Regarding your question in the comment, the same count works with .SD (uniqueN(.SD) counts distinct rows, whereas length(unique(as.list(.SD))) would count distinct columns):

data[, uniqueN(.SD), by = "id", .SDcols = 2:3]
#    id V1
# 1: 22  1
# 2: 33  2
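
If you also want the result column to be named count, as in the desired output, one possible alternative is to deduplicate first and count afterwards. This is only a sketch using the sample data from the question; the variable name counts is introduced here for illustration.

```r
library(data.table)

data <- data.table(a = c(1, 2, 3, 4), x = c(1, 1, 0, 0),
                   y = c(1, 1, 0, 1), id = c(22, 22, 33, 33))

# Keep one row per distinct (id, x, y) combination, ignoring column a,
# then count the remaining rows within each id.
counts <- unique(data, by = c("id", "x", "y"))[, .(count = .N), by = id]
print(counts)
```

On the sample data this gives a count of 1 for id 22 and 2 for id 33.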

Licensed under: CC-BY-SA with attribution
