How to group by unique values in a list in R

https://stackoverflow.com/questions/23595024

20-07-2023

Question

My data frame has a variable of class list (str() shows $ X2 :List of 125). I would like to group by the unique values in this list to perform some aggregate functions, but when I use group_by in dplyr I get:

Error in eval(expr, envir, enclos) : 
  cannot group column X2, of class 'list':

A.) Is there a way to group by unique values in a list, either using dplyr or some other grouping function? B.) Is there a way to convert the list variable to a factor variable with levels? I have no need for X2 to be a list; that's just how the values were generated. But I do need to be able to group_by unique values.

The data frame I am using has the following structure:

'data.frame':   125 obs. of  5 variables:
 $ MOV  : int  -69 -68 -67 -63 -62 -60 -59 -56 -55 -54 ...
 $ X    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Count: int  1 1 1 1 2 1 1 1 2 1 ...
 $ Perc : num  0.000179 0.000179 0.000179 0.000179 0.000358 ...
 $ X2   :List of 125

Any and all help would be appreciated.

Edit: Here is the dput output:

structure(list(MOV = c(-69L, -68L, -67L, -63L, -62L, -60L), X = 1:6, 
    Count = c(1L, 1L, 1L, 1L, 2L, 1L), Perc = c(0.000178922884236894, 
    0.000178922884236894, 0.000178922884236894, 0.000178922884236894, 
    0.000357845768473788, 0.000178922884236894), X2 = structure(list(
        range = "[ -69 , -35 )", range = "[ -69 , -35 )", range = "[ -69 , -35 )", 
        range = "[ -69 , -35 )", range = "[ -69 , -35 )", range = "[ -69 , -35 )"), .Names = c("range", 
    "range", "range", "range", "range", "range"))), .Names = c("MOV", 
"X", "Count", "Perc", "X2"), row.names = c(NA, 6L), class = "data.frame")

Solution

As you already found, dplyr cannot group by X2 while it is a list. One way around this is to convert X2 to a factor and then group by it.

If your data frame is called df, try the following:

# Each element of df$X2 holds a single string, so unlist() returns one value per row
df$X2 <- as.factor(unlist(df$X2))

Afterwards, you can use dplyr to group by any variable, including X2.
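A minimal end-to-end sketch, assuming the dplyr package is installed: it rebuilds the six sample rows from the question's dput() output, converts X2 with the line above, and then groups by it. The summary column names rows, total_count and total_perc are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Rebuild the six sample rows from the question's dput() output;
# X2 is a list column holding one string per row
df <- data.frame(
  MOV   = c(-69L, -68L, -67L, -63L, -62L, -60L),
  X     = 1:6,
  Count = c(1L, 1L, 1L, 1L, 2L, 1L),
  Perc  = c(rep(0.000178922884236894, 4), 0.000357845768473788, 0.000178922884236894)
)
df$X2 <- rep(list("[ -69 , -35 )"), 6)

# Convert the list column to a factor (valid because every element has length 1)
df$X2 <- as.factor(unlist(df$X2))

# Group by the factor and aggregate (illustrative summary columns)
summary_df <- df %>%
  group_by(X2) %>%
  summarise(rows = n(), total_count = sum(Count), total_perc = sum(Perc))
summary_df
```

All six sample rows share the same range, so the result has a single group containing 6 rows and a total Count of 7.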

OTHER TIPS

The following code does the job, even when some elements in your list column have a length of 2 or more. However, it is not efficient: it scans every row once per unique value, so if you have both many rows in your data frame and many unique values in df$X2, it may take hours.

First, create a list containing only the unique elements of the list column:

ulist <- unique(df$X2)

Then, for each unique element, find the rows whose X2 element matches it and give those rows a common index in a new column, id:

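# Start with an id column holding one NA per row
df$id <- rep(NA_integer_, nrow(df))
for (i in seq_along(ulist)) {
  # %in% (built on match()) converts list elements to character strings
  # before comparing, so multi-value elements are matched as a whole
  df$id[df$X2 %in% ulist[i]] <- i
}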
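Because the loop rescans every row once per unique element, its run time grows with the number of rows times the number of unique values. A sketch of a vectorized alternative, assuming the same df and ulist objects: match() accepts a list as its table argument (it is the function %in% is built on), so in the usual case a single call gives every row the index of its element in ulist. The four-row data frame and its Count column below are hypothetical, and base R's aggregate() stands in for the grouping step.

```r
# Hypothetical list column in which some elements hold two values,
# so unlist() would NOT return one value per row here
df <- data.frame(Count = c(1L, 2L, 3L, 4L))
df$X2 <- list("a", c("a", "b"), "a", c("a", "b"))

ulist <- unique(df$X2)

# match() converts list elements to character strings before comparing,
# so each row receives the position of its element in ulist in one call
df$id <- match(df$X2, ulist)

# Group on the integer id with base R's aggregate()
totals <- aggregate(Count ~ id, data = df, FUN = sum)
totals
```

Since id is an ordinary integer column, it can equally be passed to dplyr's group_by().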

Licensed under: CC-BY-SA with attribution
