How to group by unique values in a list in R

https://stackoverflow.com/questions/23595024

20-07-2023

Question

My data frame has a variable of class list (the str output gives: $ X2 :List of 125). I would like to group by the unique values in this list to perform some aggregate functions, but when I use group_by in dplyr I get:

Error in eval(expr, envir, enclos) : 
  cannot group column X2, of class 'list':

A.) Is there a way to group by unique values in a list, either using dplyr or some other grouping function? B.) Is there a way to convert the list variable to a factor variable with levels? I have no need for the variable X2 to be a list; that's just how the values were generated. But I do need to be able to group_by unique values.

The data frame I am using has the following structure:

'data.frame':   125 obs. of  5 variables:
 $ MOV  : int  -69 -68 -67 -63 -62 -60 -59 -56 -55 -54 ...
 $ X    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Count: int  1 1 1 1 2 1 1 1 2 1 ...
 $ Perc : num  0.000179 0.000179 0.000179 0.000179 0.000358 ...
 $ X2   :List of 125

Any and all help would be appreciated.

Edit: Here is the dput output:

structure(list(MOV = c(-69L, -68L, -67L, -63L, -62L, -60L), X = 1:6, 
    Count = c(1L, 1L, 1L, 1L, 2L, 1L), Perc = c(0.000178922884236894, 
    0.000178922884236894, 0.000178922884236894, 0.000178922884236894, 
    0.000357845768473788, 0.000178922884236894), X2 = structure(list(
        range = "[ -69 , -35 )", range = "[ -69 , -35 )", range = "[ -69 , -35 )", 
        range = "[ -69 , -35 )", range = "[ -69 , -35 )", range = "[ -69 , -35 )"), .Names = c("range", 
    "range", "range", "range", "range", "range"))), .Names = c("MOV", 
"X", "Count", "Perc", "X2"), row.names = c(NA, 6L), class = "data.frame")

Solution

As you already found out, dplyr cannot group by X2 while it is a list column. One option is to convert it to a factor and then group by X2.

If your data.frame is called df, try the following:

df$X2 <- as.factor(unlist(df$X2))

Afterwards you can use dplyr to group by any variable, including X2.
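As a rough end-to-end sketch of the conversion followed by grouping: the data below are invented for illustration (loosely modelled on the dput output in the question, with a second range label added so that there are two groups), and the summary columns n_rows and total_count are hypothetical names. It assumes every element of X2 holds exactly one value, which the stopifnot() call checks.

```r
library(dplyr)

# Invented example data: a list column X2 with one label per row
df <- data.frame(
  MOV   = c(-69L, -68L, -67L, -30L, -20L, -10L),
  Count = c(1L, 1L, 1L, 1L, 2L, 1L)
)
df$X2 <- list("[ -69 , -35 )", "[ -69 , -35 )", "[ -69 , -35 )",
              "[ -35 , 0 )", "[ -35 , 0 )", "[ -35 , 0 )")

# unlist() only lines up with the rows if each element has length 1
stopifnot(all(lengths(df$X2) == 1))

# Flatten the list column into a factor (names are dropped on purpose)
df$X2 <- factor(unlist(df$X2, use.names = FALSE))

# Grouping now works because X2 is an ordinary factor column
result <- df %>%
  group_by(X2) %>%
  summarise(n_rows = n(), total_count = sum(Count))
```

If the stopifnot() check fails, some elements hold several values and the factor conversion is not applicable as written.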

Other tips

The following code also works, including when some elements of your list column have a length greater than 2. However, it is not efficient: if your data frame has many rows and your list df$X2 has many unique values, it may take hours.

First, create a list containing only the unique elements of the list of interest:

ulist <- unique(df$X2)

Then, for each unique element, identify which rows of your data have an X2 element matching it, and give these rows a common index (column id):

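# Start with a missing (NA) group index for every row
df$id <- NA_integer_
for (i in seq_along(ulist)) {
  # Rows whose X2 element matches the i-th unique element get index i
  df$id[df$X2 %in% ulist[i]] <- i
}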
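A possibly faster variant of the same idea replaces the loop with a single match() call. This is only a sketch on invented data: it relies on match() comparing list elements through their character representation, so two elements that differ in type but print identically would end up in the same group. The aggregate() call uses base R to show the id column used for grouping; the column names are those from the toy data, not from the question.

```r
# Invented example: a list column whose elements have different lengths
df <- data.frame(Count = c(1L, 2L, 3L, 4L))
df$X2 <- list(c("a", "b"), "c", c("a", "b"), "c")

# Each row gets the position of its X2 element among the unique elements
df$id <- match(df$X2, unique(df$X2))

# Aggregate by the new index (base R; grouping by id in dplyr works as well)
totals <- aggregate(Count ~ id, data = df, FUN = sum)
```

Rows 1 and 3 share one id and rows 2 and 4 share another, so id can stand in for the list column in any later grouping step.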

License: CC-BY-SA with attribution
