Question

Does anyone know why argument colClasses does not seem to work in read.xlsx?

I create a sample *.xlsx file:

> library(xlsx)
> d1 = data.frame(A=LETTERS[1:3], B=letters[1:3], C=1:3, D=c(1.1, NA, NA))
> str(d1)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: int  1 2 3
 $ D: num  1.1 NA NA
> write.xlsx(d1, 'test.xlsx', sheetName='Sheet1', row.names=F, showNA=F)

then try to read it with read.xlsx, without and with colClasses argument:

> d2 = read.xlsx('test.xlsx', sheetName='Sheet1')
> str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: num  1 2 3
 $ D: num  1.1 NA NA
> d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=c(B='character', 'A'='character'))
> str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: num  1 2 3
 $ D: num  1.1 NA NA

The problem is colClasses seems to have no effect. Any ideas?

Thank you for your help.

Aleksey

P.S. I have R 3.0.1, xlsx 0.5.1

Was it helpful?

Solution

colClasses= is working but the problem is that on your system default action when import data is to convert character columns to factor.

If you import test.xlsx and set that all columns should be "character", you see that all columns are made as factors (also numbers).

d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=rep("character",4))
 str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: Factor w/ 3 levels "1","2","3": 1 2 3
 $ D: Factor w/ 1 level "1.1": 1 NA NA

To ensure that characters are not converted to factors you can add argument stringsAsFactors=FALSE to function read.xlsx().

d2 = read.xlsx('test.xlsx', sheetName='Sheet1', 
                colClasses=c(B='character', A='character'),stringsAsFactors=FALSE)

str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: chr  "A" "B" "C"
 $ B: chr  "a" "b" "c"
 $ C: num  1 2 3
 $ D: num  1.1 NA NA
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top