Question

With this question I would like to extend and generalize the discussion started here. This is for the benefit of those who, like me, still struggle when they have to use lapply.

Suppose I have the data frames d1 and d2, which I store in the list my.ls:

d1<-data.frame(a=rnorm(5), b=c(rep(2006, times=4),NA), c=letters[1:5])
d2<-data.frame(a=1:5, b=c(2007, 2007, NA, NA, 2007), c=letters[6:10])
my.ls<-list(d1=d1, d2=d2)

How can I obtain another list containing the same data frames, but keeping only their first and third columns? I tried the following, but it didn't work:

my.ls.sub<-lapply(my.ls, my.ls[,c(1,3)])

What if I then want not only to subset the data frames, but also to know the unique values in the columns I am extracting? (In other words, I would create two vectors for every data frame, which could be free-standing or stored in a list of lists.) For this second part I have no attempt of my own to suggest.

Solution

Try this:

lapply(my.ls, "[", ,c(1,3))

Or, with a small edit to your code:

lapply(my.ls, function(x) x[, c(1,3)])
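
As a quick check, here is a minimal sketch using the data from the question (the names sub1 and sub2 are only illustrative). The empty argument after "[" stands in for the missing row index, so each call behaves like x[, c(1,3)]:

```r
# Data from the question; the seed is only there to make d1 reproducible
set.seed(1)
d1 <- data.frame(a = rnorm(5), b = c(rep(2006, times = 4), NA), c = letters[1:5])
d2 <- data.frame(a = 1:5, b = c(2007, 2007, NA, NA, 2007), c = letters[6:10])
my.ls <- list(d1 = d1, d2 = d2)

# "[" with an empty row argument, then the column index
sub1 <- lapply(my.ls, "[", , c(1, 3))

# The same thing written as an anonymous function
sub2 <- lapply(my.ls, function(x) x[, c(1, 3)])

identical(sub1, sub2)   # both forms should give the same list
lapply(sub1, names)     # column names kept in each data frame
```

The anonymous-function form is longer, but it makes the row/column positions explicit, which may be easier to read back later.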

Edit

Since @Matthew Plourde has already answered the second part of your question using lapply, here is an alternative using rapply, the recursive version of lapply:

rapply(lapply(my.ls, "[", ,c(1,3)), unique, how="list")
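
A short sketch of how the rapply result can be used, again with the question's data (the name res is just for illustration). With how="list" the result is a nested list, first by data frame and then by column, so individual vectors can be pulled out with $:

```r
# Data from the question; the seed is only there to make d1 reproducible
set.seed(1)
d1 <- data.frame(a = rnorm(5), b = c(rep(2006, times = 4), NA), c = letters[1:5])
d2 <- data.frame(a = 1:5, b = c(2007, 2007, NA, NA, 2007), c = letters[6:10])
my.ls <- list(d1 = d1, d2 = d2)

# unique() is applied to every column of every subsetted data frame
res <- rapply(lapply(my.ls, "[", , c(1, 3)), unique, how = "list")

names(res)      # one entry per data frame
names(res$d2)   # one entry per kept column
res$d2$a        # unique values of column a in d2
```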

OTHER TIPS

You were close: lapply(my.ls, '[', c(1,3)). This calls the indexing function [ on each data.frame with the additional argument c(1,3). With a single index, a data.frame is indexed like a list, so this selects the first and third columns.

Equivalently, you could call lapply(my.ls, '[', -2) to remove the second column.

But I would recommend the more intelligible lapply(my.ls, subset, select=c(1,3)).
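
To compare the three variants side by side, here is a small sketch on the question's data (by_pos, by_neg and by_subset are illustrative names). It also shows one practical difference between list-style indexing x[j] and matrix-style indexing x[, j] when only one column is selected:

```r
# Data from the question; the seed is only there to make d1 reproducible
set.seed(1)
d1 <- data.frame(a = rnorm(5), b = c(rep(2006, times = 4), NA), c = letters[1:5])
d2 <- data.frame(a = 1:5, b = c(2007, 2007, NA, NA, 2007), c = letters[6:10])
my.ls <- list(d1 = d1, d2 = d2)

by_pos    <- lapply(my.ls, "[", c(1, 3))              # keep columns 1 and 3
by_neg    <- lapply(my.ls, "[", -2)                   # drop column 2
by_subset <- lapply(my.ls, subset, select = c(1, 3))  # subset() with select

all.equal(by_pos, by_neg)
all.equal(by_pos, by_subset)

# With a single column, x[1] stays a data frame,
# while x[, 1] drops to a plain vector.
is.data.frame(my.ls$d2[1])     # TRUE
is.data.frame(my.ls$d2[, 1])   # FALSE
```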

To go directly from your original list to a list of the unique values in each column of each data.frame, you could use nested lapply statements like so:

lapply(my.ls, function(d) lapply(d[c(1,3)], unique))
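
The question mentions wanting the vectors either "free" or in a list of lists. A minimal sketch of both, assuming the question's data (uniq and flat are illustrative names): the nested lapply gives the list of lists, and unlist with recursive=FALSE flattens it by one level into a single list of named vectors.

```r
# Data from the question; the seed is only there to make d1 reproducible
set.seed(1)
d1 <- data.frame(a = rnorm(5), b = c(rep(2006, times = 4), NA), c = letters[1:5])
d2 <- data.frame(a = 1:5, b = c(2007, 2007, NA, NA, 2007), c = letters[6:10])
my.ls <- list(d1 = d1, d2 = d2)

# List of lists: outer level per data frame, inner level per column
uniq <- lapply(my.ls, function(d) lapply(d[c(1, 3)], unique))
uniq$d2$a

# Flatten one level: one vector per data-frame/column pair
flat <- unlist(uniq, recursive = FALSE)
names(flat)   # "d1.a" "d1.c" "d2.a" "d2.c"
```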
Licensed under: CC-BY-SA with attribution