How to grab each column's unique values in multiple csv files

https://stackoverflow.com/questions/23687075

23-07-2023

Question

I am relatively new to R, so bear with me. I have 50+ CSV files and want to loop through each of them and grab each column's unique values. In every file, the first row holds the column headers.
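The question never shows how files is built. A minimal sketch, assuming all the CSVs sit in one folder, uses base R's list.files(). The csv_dir folder and the demo files written into it are hypothetical and exist only so the snippet runs on its own:

```r
# Demo only: create a temporary folder with two small CSVs and one non-CSV file.
# In practice, point csv_dir at the folder that holds your own files.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(csv_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(b = c("x", "y")), file.path(csv_dir, "two.csv"), row.names = FALSE)
writeLines("not a csv", file.path(csv_dir, "notes.txt"))

# pattern is a regular expression matching names that end in ".csv";
# full.names = TRUE returns paths read.csv() can open from any working directory
files <- list.files(csv_dir, pattern = "\\.csv$", ignore.case = TRUE,
                    full.names = TRUE)
print(basename(files))  # the .txt file is skipped
```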

Ideally, the output would be a data frame giving the filename, the column header, and the unique values for each CSV. I mean the unique values of each column taken one at a time, not unique combinations across several columns.
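To make the per-column versus combination distinction concrete, here is a small sketch on a hypothetical in-memory data frame: lapply(df, unique) works column by column, while unique(df) removes duplicate rows, i.e. it looks at combinations of columns.

```r
# Hypothetical example data with one fully repeated row (rows 1 and 4)
df <- data.frame(city = c("Oslo", "Oslo", "Rome", "Oslo"),
                 year = c(2020, 2021, 2020, 2020),
                 stringsAsFactors = FALSE)

# Per-column uniqueness (what the question asks for):
# unique() is applied to each column separately
per_column <- lapply(df, unique)
# per_column$city is c("Oslo", "Rome"); per_column$year is c(2020, 2021)

# Uniqueness across a combination of columns (not what is wanted here):
# unique() on a whole data frame drops duplicate rows
combos <- unique(df)
nrow(combos)  # 3 distinct (city, year) pairs
```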

Any help would be greatly appreciated!

Here is how I'm getting unique values as a list, but I'm not sure what to do next:

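# files: character vector of CSV paths, e.g. from list.files()
lapply(files, function(x) {
  dat <- read.csv(x, header=TRUE) # load file (named dat rather than t, to avoid confusion with base::t)
  # lapply() over a data frame visits each column and always returns a list;
  # apply(dat, 2, unique) would first coerce the data frame to a character matrix
  lapply(dat, unique)
})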
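One possible next step from per-column lists to the desired data frame is base R's stack(), which turns a named list of vectors into a two-column data frame (values, ind). The sketch below uses hypothetical in-memory data: per_file stands in for a named list holding, for each file, a list of per-column unique values, with the file names assumed to be the list names.

```r
# Hypothetical per-file results: for each file, a named list of per-column
# unique values (the shape lapply(dat, unique) returns)
per_file <- list(
  "a.csv" = list(id = c(1, 2, 3), grade = c("A", "B")),
  "b.csv" = list(name = c("x", "y"))
)

result <- do.call(rbind, lapply(names(per_file), function(f) {
  # as.character() keeps the values column one type across numeric/text columns
  s <- stack(lapply(per_file[[f]], as.character))
  # stack() names its columns "values" and "ind" (ind is a factor of list names)
  data.frame(filename = f, column = as.character(s$ind), value = s$values,
             stringsAsFactors = FALSE)
}))
print(result)
```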

Solution

This should do the trick:

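do.call(rbind, lapply(files, function(x) {
  # check.names=FALSE keeps the original header text as the column name
  dat <- read.csv(x, header=TRUE, check.names=FALSE, stringsAsFactors=FALSE)
  do.call(rbind, lapply(seq_along(dat), function(idx) {
    vals <- unique(dat[[idx]])
    if (length(vals) == 0) return(NULL)  # header-only file: no rows to report
    # as.character() keeps value a single type across numeric and text columns
    data.frame(filename=x, column=names(dat)[idx],
               value=as.character(vals), stringsAsFactors=FALSE)
  }))
}))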

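The outer lapply reads each file x and returns one data frame per file; the inner lapply returns one data frame per column idx of that file, with one row per unique value. do.call(rbind, ...) stacks those pieces, first within each file and then across files, giving a single long data frame with the columns filename, column, and value.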
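A self-contained sketch of the same nested lapply/rbind pattern, run on two hypothetical CSVs written to a temporary folder so the result can be checked. It stores basename(x) rather than the full path only to keep the printed output short; that choice is not part of the answer above.

```r
# Demo setup (hypothetical data): write two small CSVs to a temporary folder
dir <- tempfile("csv_unique_")
dir.create(dir)
write.csv(data.frame(id = c(1, 2, 2), grade = c("A", "A", "B")),
          file.path(dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(name = c("x", "x")),
          file.path(dir, "second.csv"), row.names = FALSE)
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Outer lapply: one data frame per file; inner lapply: one per column
result <- do.call(rbind, lapply(files, function(x) {
  dat <- read.csv(x, header = TRUE, check.names = FALSE,
                  stringsAsFactors = FALSE)
  do.call(rbind, lapply(seq_along(dat), function(idx) {
    vals <- unique(dat[[idx]])
    if (length(vals) == 0) return(NULL)  # header-only file: nothing to report
    data.frame(filename = basename(x), column = names(dat)[idx],
               value = as.character(vals), stringsAsFactors = FALSE)
  }))
}))
print(result)  # one row per (file, column, unique value)
```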

Licensed under: CC-BY-SA with attribution
