سؤال

Does anyone know why argument colClasses does not seem to work in read.xlsx?

I create a sample *.xlsx file:

> library(xlsx)
> d1 = data.frame(A=LETTERS[1:3], B=letters[1:3], C=1:3, D=c(1.1, NA, NA))
> str(d1)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: int  1 2 3
 $ D: num  1.1 NA NA
> write.xlsx(d1, 'test.xlsx', sheetName='Sheet1', row.names=F, showNA=F)

then try to read it with read.xlsx, without and with colClasses argument:

> d2 = read.xlsx('test.xlsx', sheetName='Sheet1')
> str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: num  1 2 3
 $ D: num  1.1 NA NA
> d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=c(B='character', 'A'='character'))
> str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: num  1 2 3
 $ D: num  1.1 NA NA

The problem is colClasses seems to have no effect. Any ideas?

Thank you for your help.

Aleksey

P.S. I have R 3.0.1, xlsx 0.5.1

هل كانت مفيدة؟

المحلول

colClasses= is working but the problem is that on your system default action when import data is to convert character columns to factor.

If you import test.xlsx and set that all columns should be "character", you see that all columns are made as factors (also numbers).

d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=rep("character",4))
 str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: Factor w/ 3 levels "1","2","3": 1 2 3
 $ D: Factor w/ 1 level "1.1": 1 NA NA

To ensure that characters are not converted to factors you can add argument stringsAsFactors=FALSE to function read.xlsx().

d2 = read.xlsx('test.xlsx', sheetName='Sheet1', 
                colClasses=c(B='character', A='character'),stringsAsFactors=FALSE)

str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: chr  "A" "B" "C"
 $ B: chr  "a" "b" "c"
 $ C: num  1 2 3
 $ D: num  1.1 NA NA
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top