Question

I am trying to remove rows that have duplicate entries, as defined by two columns, from multiple dataframes located in a single list.

Simple data:

aa <- data.frame(a=rnorm(100),b=rnorm(100),x=rnorm(100),y=rnorm(100),Z=rep(1:4, each=25))
split.aa<-split(aa, aa$Z)
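Because rnorm() draws continuous values, the sample data above will essentially never contain an exact duplicated (x, y) pair, so there is nothing for the de-duplication to remove. The sketch below is a hypothetical variant of the same data, assuming x and y are sampled from the small integer set 1:3. With only 9 possible pairs and 25 rows per group, every data frame in the list is guaranteed to contain at least 16 duplicated rows.

```r
# Hypothetical variant of the question's data: x and y come from a small
# set of integers, so duplicated (x, y) pairs are guaranteed to occur.
set.seed(42)
aa <- data.frame(a = rnorm(100), b = rnorm(100),
                 x = sample(1:3, 100, replace = TRUE),
                 y = sample(1:3, 100, replace = TRUE),
                 Z = rep(1:4, each = 25))
split.aa <- split(aa, aa$Z)

# Count the duplicated (x, y) rows in each data frame of the list
dup.counts <- sapply(split.aa, function(df) sum(duplicated(df[c("x", "y")])))
dup.counts
```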

For each df in the list 'split.aa' I am trying to remove rows with duplicated x,y pairs.

I could do this one df a time with:

split.aa[[z]][!duplicated(split.aa[[z]][, c('x','y')]), ]

where z is the name of each df within 'split.aa'.
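For comparison, the one-at-a-time approach can be wrapped in an explicit for loop over names(split.aa). This is a minimal sketch, assuming the hypothetical integer-valued x and y used only so that duplicates actually exist. It makes the indexing by name visible before handing the iteration over to lapply.

```r
# Hypothetical data with guaranteed duplicated (x, y) pairs
set.seed(1)
aa <- data.frame(a = rnorm(100), b = rnorm(100),
                 x = sample(1:3, 100, replace = TRUE),
                 y = sample(1:3, 100, replace = TRUE),
                 Z = rep(1:4, each = 25))
split.aa <- split(aa, aa$Z)

# Loop over the element names ("1", "2", "3", "4") and replace each data
# frame with a copy that keeps only the first row of every (x, y) pair
dedup.loop <- split.aa
for (z in names(dedup.loop)) {
  dedup.loop[[z]] <- dedup.loop[[z]][!duplicated(dedup.loop[[z]][, c("x", "y")]), ]
}
sapply(dedup.loop, nrow)
```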

How would I write this into lapply so that the action is performed on each element?

I am having a hard time wrapping my head around how to refer to the specific list elements within the lapply function.


Solution

lapply(split.aa, function(x) x[!duplicated(x[c("x", "y")]), ])

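will do the trick. lapply() passes each data frame in split.aa to the anonymous function as its argument x, so you never need to refer to the list elements by name or index. Inside the function, x[c("x", "y")] selects the columns named "x" and "y" from whichever data frame is currently being processed.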
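To check that the one-liner does what the loop version does, the sketch below runs it on hypothetical integer-valued data (assumed only so that duplicates exist) and confirms that no (x, y) pair is left duplicated in any element. lapply() also carries over the names of split.aa, so result[["2"]] is the cleaned version of split.aa[["2"]].

```r
# Hypothetical data with guaranteed duplicated (x, y) pairs
set.seed(1)
aa <- data.frame(a = rnorm(100), b = rnorm(100),
                 x = sample(1:3, 100, replace = TRUE),
                 y = sample(1:3, 100, replace = TRUE),
                 Z = rep(1:4, each = 25))
split.aa <- split(aa, aa$Z)

# The answer's one-liner: x is each data frame in turn
result <- lapply(split.aa, function(x) x[!duplicated(x[c("x", "y")]), ])

# Names are preserved from split.aa
names(result)
# anyDuplicated() returns 0 when no (x, y) pair repeats
sapply(result, function(df) anyDuplicated(df[c("x", "y")]))
```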

OTHER TIPS

Just define an anonymous function inside lapply:

lapply(split.aa, function(x) x[!duplicated(x[c("x", "y")]), ])
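The function does not have to be anonymous. The sketch below defines a hypothetical named helper, dedup_xy(), whose argument is called df rather than x so that it is not confused with the column "x". lapply() forwards extra arguments to the function, so the key columns can be passed in as cols. The data is again a hypothetical integer-valued variant, assumed so that duplicates exist.

```r
# Hypothetical data with guaranteed duplicated (x, y) pairs
set.seed(1)
aa <- data.frame(a = rnorm(100), b = rnorm(100),
                 x = sample(1:3, 100, replace = TRUE),
                 y = sample(1:3, 100, replace = TRUE),
                 Z = rep(1:4, each = 25))
split.aa <- split(aa, aa$Z)

# Hypothetical helper: keeps the first row of each combination of the
# columns in `cols` and drops later repeats, as duplicated() does
dedup_xy <- function(df, cols = c("x", "y")) {
  df[!duplicated(df[cols]), ]
}

result <- lapply(split.aa, dedup_xy)
# Extra arguments to lapply() are passed on to dedup_xy()
by.x <- lapply(split.aa, dedup_xy, cols = "x")
```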
Licensed under: CC-BY-SA with attribution