Using apply() over columns to output subsets

https://stackoverflow.com/questions/23227071

07-07-2023

Question

I have a data frame in R where most columns are numeric, but there is one character column. For each numeric column, I want to find the rows whose absolute value exceeds a threshold and obtain the corresponding values from the character column.

I'm unable to find a built-in dataset that contains the pattern of data I want, so a dput of my data can be accessed here.

When I use subsetting, I get the output I'm expecting:

> df[abs(df$PA3) > 0.32,1]
[1] "SSI_01" "SSI_02" "SSI_04" "SSI_05" "SSI_06" "SSI_07" "SSI_08" "SSI_09"

When I try to iterate over the columns of the data frame using apply, I get a recursive indexing error:

> apply(df[2:10], 2, function(x) df[abs(df[[x]])>0.32, 1])
 Error in .subset2(x, i, exact = exact) : 
  recursive indexing failed at level 2

Any suggestions where I'm going wrong?

Solution

Your attempt fails because the x passed to your function is the column itself (a numeric vector of its values), not a column name or index. df[[x]] therefore tries to index df with that whole vector of values, which triggers the recursive indexing error. A small modification fixes it (replacing df[[x]] with x):

apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
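
To make the difference concrete, here is a minimal sketch on a made-up data frame. The question's dput is not reproduced here, so the values and the id, PA1 and PA2 column names are assumptions for illustration only:

```r
# Hypothetical stand-in for the question's data (the original dput is not shown):
# one character column followed by numeric columns.
df <- data.frame(
  id  = c("SSI_01", "SSI_02", "SSI_03", "SSI_04"),
  PA1 = c(0.50, -0.10, 0.40, 0.05),
  PA2 = c(-0.45, 0.20, 0.10, 0.30),
  stringsAsFactors = FALSE
)

# Inside apply(), x is a numeric vector of column values, so df[[x]] is an
# attempt to index df with those numbers and fails.
failed <- tryCatch(
  apply(df[2:3], 2, function(x) df[abs(df[[x]]) > 0.32, 1]),
  error = function(e) e
)

# Comparing x itself against the threshold gives the intended row selection.
res <- apply(df[2:3], 2, function(x) df[abs(x) > 0.32, 1])
print(res)
```

In this toy data the two columns match a different number of rows, so apply returns a named list with one character vector per column.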

Alternatively, you could use apply's ... argument to pass extra arguments through to your function. In this case, you would pass the first column:

apply(df[2:10], 2, function(x, y) y[abs(x) > 0.32], y=df[,1])
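
As a rough illustration of how arguments travel through ..., the sketch below uses a small made-up matrix. The pick helper, its cutoff parameter, and the lab vector are hypothetical names introduced only for this example:

```r
# Hypothetical labels and values, for illustration only.
lab <- c("a", "b", "c")
m <- cbind(u = c(1, -2, 0.1), v = c(0.2, 0.1, -3))

# pick() receives each column as x; y and cutoff arrive via apply()'s `...`.
pick <- function(x, y, cutoff) y[abs(x) > cutoff]

res <- apply(m, 2, pick, y = lab, cutoff = 0.32)
print(res)
```

Passing the labels and threshold explicitly means the function no longer depends on a df object in the enclosing environment, which makes it easier to reuse.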

OTHER TIPS

Yet another variation:

apply(abs(df[-1]) > .32, 2, subset, x=df[[1]])

The cute trick here is to "curry" subset by fixing its x parameter by name, so that each logical column is matched positionally to the next parameter, the filter. I was hoping I could do the same with [, but because it is a primitive function it doesn't match named parameters in the usual way.
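
To see what happens for a single column, the sketch below mimics the call that apply makes for each column of the logical matrix. The labels and keep vectors are made-up values:

```r
labels <- c("SSI_01", "SSI_02", "SSI_03")   # hypothetical labels
keep   <- c(TRUE, FALSE, TRUE)              # one logical column

# apply() calls FUN(column, x = labels); with x taken by name, the column
# is matched to subset's second argument, the logical filter.
via_subset  <- subset(keep, x = labels)

# Plain logical indexing gives the same selection.
via_bracket <- labels[keep]
```

Both calls select the first and third labels here.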

A quick and unsophisticated solution might be:

sapply(2:10, function(x) df[abs(df[,x])>0.32, 1])

Try:

lapply(df[,2:10],function(x) df[abs(x)>0.32, 1])

Or using apply:

apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
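
One thing to keep in mind when choosing between these: lapply always returns a list, while sapply and apply simplify to a matrix whenever every column happens to yield the same number of matches. A small sketch with a made-up data frame (the names df, f, a and b are assumptions for illustration):

```r
# Hypothetical data in which both numeric columns match exactly two rows.
df <- data.frame(
  id = c("r1", "r2", "r3"),
  a  = c(0.9, 0.1, 0.5),
  b  = c(0.4, 0.7, 0.0),
  stringsAsFactors = FALSE
)

f <- function(x) df[abs(x) > 0.32, 1]

as_list   <- lapply(df[2:3], f)   # always a named list
as_matrix <- sapply(df[2:3], f)   # simplified because the lengths agree
```

If the result feeds into later code, lapply is the more predictable choice because its return type does not depend on the data.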

Licensed under: CC-BY-SA with attribution
