Indexing data.frames using arrays of TRUEs and FALSEs
24-09-2019
Question
I'm having some trouble indexing data.frames in R. I'm an R beginner. I have a data.frame called d which has 35512 columns and 77 rows. I have a list called rd which contains 35512 elements. I'd like to select all the columns of d whose corresponding items in rd are less than 100. Here's what I'm doing:
# just to prove I'm not crazy
> length(colnames(d))
[1] 35512
> length(rownames(d))
[1] 77
> length(rd)
[1] 35512
# find all the elements of rd less than 100 (+ unnecessary faffing?)
> i <- unlist(rd<100)
> names(i) <- NULL
# try to extract all the elements of d corresponding to rd < 100
> d <- d[,i]
Error in `[.data.frame`(d, , i) : undefined columns selected
I don't really want to be doing the unlist and names(i) <- NULL stuff but I'm getting seriously paranoid. Can anyone help with what the hell this error message means?
In case it helps, the rd variable is created using the following:
rd = lapply(lapply(d, range), diff)
Which hopefully tells me the difference in the range of each column of d.
P.S. bonus awesomeness for anyone who can tell me a command to find the shape of a data.frame other than querying the length of its row and column names.
Edit: Here's what rd looks like:
> rd[1:3]
$`10338001`
[1] 7198.886
$`10338003`
[1] 4748.963
$`10338004`
[1] 3173.046
and when I've done my faffing, i looks like this:
> i[7:10]
[1] FALSE FALSE FALSE TRUE
Solution
Have you tried this:
d[,rd < 100]
Here's a self-contained example:
d <- data.frame(matrix(1:100, ncol=10))
rd <- as.list(1:10)
d[,rd < 5]
To get the shape of a data.frame, use nrow and ncol.
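As a small illustration, here are both functions applied to the same made-up 10 x 10 data.frame as in the example above:

```r
# Same toy data.frame as in the self-contained example: 10 rows, 10 columns
d <- data.frame(matrix(1:100, ncol = 10))

n_rows <- nrow(d)  # number of rows
n_cols <- ncol(d)  # number of columns
```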
Edit:
Based on your response to my NA question, it sounds like your index contains NA values that result from missing values in your list. A data.frame cannot be column-indexed with NA, which is what triggers the error. The best thing to do is to first decide how you want to treat a missing value, then deal with them using the is.na function (here I extend my example from above):
rd[[3]] <- NA
d[,rd < 5]
# => Error in `[.data.frame`(d, , rd < 5) : undefined columns selected
To deal with this, I will set that NA value to 0 (which means that the respective column will be included in the final data.frame):
rd[is.na(rd)] <- 0
d[,rd < 5]
You need to decide for yourself what to do with the NA values.
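Two other ways of treating the NA values are sketched below, on made-up data and under the assumption that the NAs in rd come from missing values in the columns themselves (range returns NA for such a column unless na.rm = TRUE is given). The names rd_narm and small are only illustrative. Note that sapply is used here instead of lapply so that the result is a plain named vector rather than a list.

```r
# Made-up data: column X2 contains a missing value
d <- data.frame(matrix(1:100, ncol = 10))
d[3, 2] <- NA

# Without na.rm, the range of X2 is NA, so its difference is NA as well
rd <- sapply(d, function(col) diff(range(col)))
x2_is_na <- is.na(rd[["X2"]])

# Option 1: ignore missing values when computing each range
rd_narm <- sapply(d, function(col) diff(range(col, na.rm = TRUE)))

# Option 2: keep the NA in rd, but exclude such columns when indexing;
# which() returns only the positions that are TRUE and silently drops NA
small <- d[, which(rd < 10), drop = FALSE]
```

Option 1 keeps every column in play; Option 2 leaves out any column whose range could not be computed. Which one is appropriate depends on what a missing value means in your data.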
OTHER TIPS
For the bonus question, you get the "shape" of a data frame or matrix using the dim function.
A = matrix( ceiling(10*runif(40)), nrow=8)
colnames(A) = c("col1", "col2", "col3", "col4", "col5")
df = data.frame(A)
b = ceiling(100*runif(5))
ndx = b < 50
result = df[,ndx] # just the columns of df corresponding to b < 50
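One thing to watch for with this kind of indexing: if the logical index happens to select exactly one column, `[` returns a plain vector rather than a data.frame by default. A minimal sketch on a made-up two-column data.frame, using the drop = FALSE argument to keep the data.frame structure:

```r
# Made-up data.frame and an index that selects a single column
df <- data.frame(col1 = 1:3, col2 = 4:6)
ndx <- c(TRUE, FALSE)

as_vector <- df[, ndx]                # simplified to an integer vector
as_frame  <- df[, ndx, drop = FALSE]  # stays a one-column data.frame
```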