Question

My question is about how to specify the class of each column when reading in data that come from many files. More specifically, I am reading in thousands of .xlsx files at a time using the read.xls() function in the gdata package, which converts each file to .csv before reading it.

My approach is as follows:

library(gdata)
Myfiles <- list.files() # lists all files in the working directory (which contains the data files)
Mylist <- lapply(Myfiles, read.xls, header=TRUE,
    perl="C:/Users/A/PERL/perl/bin/perl.exe",
    sheet=1,
    method="csv",
    skip=1,
    as.is=1)

I apologize for not providing a workable example. I'm not sure how to do so for this problem.

All the .xlsx files have identical headers and set-up, but the classes of corresponding columns in the data frames within Mylist are not all the same. Is there a way to specify the classes within the lapply() approach I am using? I know you can pass read.table() arguments through read.xls(), but I haven't figured out how to specify the column classes properly within the lapply() call.


Solution

It's all in Gabor's comment, but to put this one to bed:

lapply(Myfiles, read.xls, colClasses = c("character", "numeric", "factor"), header=TRUE)
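To see the mechanism without needing Perl or real .xlsx files, here is a minimal sketch that uses read.csv() as a stand-in for read.xls(). This should carry over because read.xls() hands its extra arguments on to read.csv()/read.table(). The temporary files, the column names (id, value, group) and the variable names are all made up for illustration. The colClasses vector needs one entry per column, in column order.

```r
# Write two small CSV files with identical headers to a temporary directory.
# (Hypothetical stand-ins for the real .xlsx files.)
tmp_dir <- tempfile("colclasses_demo")
dir.create(tmp_dir)
write.csv(data.frame(id = c("001", "002"), value = c(1, 2), group = c("a", "b")),
          file.path(tmp_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(id = c("003", "004"), value = c(3.5, NA), group = c("b", "b")),
          file.path(tmp_dir, "file2.csv"), row.names = FALSE)

my_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Any extra named arguments given to lapply() are forwarded to the reader
# function for every file, so colClasses is applied to each one.
col_classes <- c("character", "numeric", "factor")
my_list <- lapply(my_files, read.csv, header = TRUE, colClasses = col_classes)

# Collect the class of each column, one row per file, to confirm they agree.
classes_by_file <- t(sapply(my_list, function(df) sapply(df, class)))
print(classes_by_file)
```

With read.xls() the call has the same shape: keep the perl=, sheet= and skip= arguments from the question and add colClasses= alongside them.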
Licensed under: CC-BY-SA with attribution