My question is about how to specify the class of each column when reading in data that come from many files. More specifically, I am reading in thousands of .xlsx files at a time with the read.xls() function in the gdata package, which converts each file to .csv along the way.

My approach is as follows:

Myfiles <- list.files() # lists all files in the working directory (which contains the data files)
library(gdata)
Mylist <- lapply(Myfiles, read.xls, header=TRUE,
    perl="C:/Users/A/PERL/perl/bin/perl.exe",
    sheet=1,
    method="csv",
    skip=1,
    as.is=1)

I apologize for not providing a workable example. I'm not sure how to do so for this problem.

All the .xlsx files have identical headers and layout, but the classes of corresponding columns in the data frames within Mylist are not all the same. Is there a way to specify the classes within the lapply() approach I am using? I know that read.table() arguments can be passed through read.xls(), but I haven't figured out how to specify the column classes properly within the lapply() call.

Solution

It's all in Gabor's comment, but to put this one to bed:

Mylist <- lapply(Myfiles, read.xls, colClasses = c("character", "numeric", "factor"), header=TRUE)
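
To see why this helps, here is a minimal sketch that runs without gdata or Perl. It assumes that read.xls() hands colClasses on to read.csv(), as the question describes for read.table() arguments, so read.csv() is used directly on two small temporary files. The file contents and the names f1, f2, files, guessed, classes and typed are made up for illustration.

```r
# Hypothetical stand-ins for two converted sheets with identical headers.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("id,value,group", "001,1.5,a", "002,2.5,b"), f1)
writeLines(c("id,value,group", "A03,3,a", "A04,4,c"), f2)
files <- c(f1, f2)

# Without colClasses, the types are guessed separately for each file,
# so the "id" column is read as integer in the first file only.
guessed <- lapply(files, read.csv)
sapply(guessed, function(d) class(d$id))

# With colClasses, every file gets the same class for each column,
# one entry per column in left-to-right order.
classes <- c("character", "numeric", "factor")
typed <- lapply(files, read.csv, colClasses = classes)
sapply(typed, function(d) sapply(d, class))
```

With the real data, the colClasses vector needs one entry per column of the sheets (or a named vector covering only the columns that matter), and the other arguments from the question (perl, sheet, skip and so on) can stay in the same lapply() call.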
Licensed under: CC-BY-SA with attribution