RODBC read error where Excel column contains leading NAs
Question
I have been reading Excel sheets into R using the RODBC package and have hit an issue with the Excel ODBC driver: columns that contain enough leading NAs are coerced to logical.
In Excel the data appears as follows:
period n n.ft n.pt
1/02/1985 0.008 NA 0.025
1/03/1985 -0.003 NA -0.024
1/04/1985 0.002 NA 0.015
1/05/1985 0.006 NA 0.012
1/06/1985 0.001 NA 0.003
1/07/1985 0.005 NA 0.010
1/08/1985 0.006 NA 0.001
1/09/1985 0.007 NA 0.013
1/10/1985 -0.002 NA 0.009
1/11/1985 0.013 NA 0.019
1/12/1985 -0.004 NA -0.021
1/01/1986 0.008 NA 0.009
1/02/1986 0.002 NA 0.009
1/03/1986 0.002 -0.003 1.000
1/04/1986 0.010 -0.003 0.041
1/05/1986 0.000 -0.001 -0.004
1/06/1986 0.005 0.003 0.005
1/07/1986 -0.003 0.005 0.012
1/08/1986 -0.001 -0.003 -0.021
1/09/1986 0.003 -0.001 0.012
1/10/1986 0.003 0.003 0.010
1/11/1986 -0.003 0.003 -0.003
1/12/1986 0.003 -0.003 0.022
1/01/1987 0.001 0.013 -0.004
1/02/1987 0.004 -0.004 0.011
1/03/1987 0.004 0.008 0.005
1/04/1987 0.000 0.002 -0.002
1/05/1987 0.001 0.002 0.006
1/06/1987 0.004 0.010 0.00
I read in the data with:
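library(RODBC)
# Open an ODBC connection to the workbook
conexcel <- odbcConnectExcel(xls.file = "C:/data/example.xls")
s1 <- 'SOx'                   # name of the worksheet to read
dd <- sqlFetch(conexcel, s1)  # fetch the whole sheet as a data frame
odbcClose(conexcel)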
This reads in the entire n.ft column (the second data column) as NA. I think this is because the driver guesses the column to be logical, so the numbers further down are treated as invalid values and become NA.
> str(dd)
'data.frame': 29 obs. of 4 variables:
$ period: POSIXct, format: "1985-02-01" "1985-03-01" ...
$ n : num 0.00833 -0.00338 0.00157 0.00562 0.00117 ...
$ n#ft : logi NA NA NA NA NA NA ...
$ n#pt : num 0.02515 -0.02394 0.0154 0.01224 0.00301 ...
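The symptom in the str() output (a column of type logi holding only NA) can be checked for programmatically. The helper below is a hypothetical sketch, not part of RODBC: it lists columns that came back as all-NA logical vectors, as n#ft did above. A genuinely empty column would also be flagged, so treat the result as a list of suspects rather than proof of the driver problem.

```r
# Hypothetical helper (not part of RODBC): names of columns that were read
# as logical vectors containing nothing but NA.
all_na_logical_cols <- function(df) {
  is_suspect <- vapply(df, function(col) is.logical(col) && all(is.na(col)),
                       logical(1))
  names(df)[is_suspect]
}

# Toy data frame mimicking the str(dd) output above
dd_example <- data.frame(
  n      = c(0.00833, -0.00338, 0.00157),
  `n#ft` = c(NA, NA, NA),
  `n#pt` = c(0.02515, -0.02394, 0.0154),
  check.names = FALSE
)
all_na_logical_cols(dd_example)
# → "n#ft"
```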
I am trying to find a way to prevent this coercion to logical, which I think is causing the subsequent error.
I found a related Q+A by searching SO; however, I am at work and have no hope of being permitted to edit the registry to change the default value of the DWORD, as suggested there (I understand that this value determines how many NAs are needed before Microsoft's driver guesses the data type and bombs my read).
Right now, I'm thinking that the best solution is to invert the data in Excel and read it into R upside down.
I love a good hack but surely there's a better solution?
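If the upside-down hack were used, the rows would need to be put back in date order after reading. A minimal sketch, assuming the sheet keeps its period column and was simply reversed in Excel; restore_period_order is a hypothetical helper name, and the toy data uses the first three rows of the table above.

```r
# Hypothetical helper: sort a data frame read "upside down" back into
# chronological order by its period column.
restore_period_order <- function(df) {
  out <- df[order(df$period), , drop = FALSE]  # sort rows by date
  rownames(out) <- NULL                        # renumber rows 1..n
  out
}

# First three rows of the question's data, stored in reverse order
flipped <- data.frame(
  period = as.Date(c("1985-04-01", "1985-03-01", "1985-02-01")),
  n      = c(0.002, -0.003, 0.008)
)
restore_period_order(flipped)
```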
Solution
This is not a bug but a feature of ODBC (note: not of R), as documented at http://support.microsoft.com/kb/257819/en-us (a long page; search for "mixed data type").
Since reading Excel files with ODBC is rather limited, I prefer one of the alternatives mentioned by Gabor, preferably XLConnect.
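A minimal sketch of the XLConnect route, assuming XLConnect and the Java runtime it depends on are installed. To stay self-contained it writes a small sheet shaped like the question's data (the first 16 rows of n and n.ft) to a temporary file and reads it back; with a real workbook you would pass its path instead. The colTypes argument states the column types explicitly instead of leaving them to be guessed.

```r
library(XLConnect)  # requires a Java runtime

# Sample shaped like the question's sheet: n.ft starts with 13 NAs
sample_df <- data.frame(
  n    = c(0.008, -0.003, 0.002, 0.006, 0.001, 0.005, 0.006, 0.007,
           -0.002, 0.013, -0.004, 0.008, 0.002, 0.002, 0.010, 0.000),
  n.ft = c(rep(NA, 13), -0.003, -0.003, -0.001)
)

# Write it to a temporary workbook (NA becomes an empty cell)
xls_path <- tempfile(fileext = ".xlsx")
writeWorksheetToFile(xls_path, data = sample_df, sheet = "SOx")

# Read it back, declaring both columns numeric up front
dd <- readWorksheetFromFile(xls_path, sheet = "SOx",
                            colTypes = c("numeric", "numeric"))
str(dd)
```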