RODBC read error where Excel column contains leading NAs

https://stackoverflow.com/questions/12152412

28-06-2021

Question

I have been reading Excel sheets into R using the RODBC package and have hit an issue with the Excel ODBC driver. Columns that contain (sufficient) leading NAs are coerced to logical.

In Excel the data appears as follows:

period      n       n.ft    n.pt
1/02/1985   0.008   NA  0.025
1/03/1985   -0.003  NA  -0.024
1/04/1985   0.002   NA  0.015
1/05/1985   0.006   NA  0.012
1/06/1985   0.001   NA  0.003
1/07/1985   0.005   NA  0.010
1/08/1985   0.006   NA  0.001
1/09/1985   0.007   NA  0.013
1/10/1985   -0.002  NA  0.009
1/11/1985   0.013   NA  0.019
1/12/1985   -0.004  NA  -0.021
1/01/1986   0.008   NA  0.009
1/02/1986   0.002   NA  0.009
1/03/1986   0.002   -0.003  1.000
1/04/1986   0.010   -0.003  0.041
1/05/1986   0.000   -0.001  -0.004
1/06/1986   0.005   0.003   0.005
1/07/1986   -0.003  0.005   0.012
1/08/1986   -0.001  -0.003  -0.021
1/09/1986   0.003   -0.001  0.012
1/10/1986   0.003   0.003   0.010
1/11/1986   -0.003  0.003   -0.003
1/12/1986   0.003   -0.003  0.022
1/01/1987   0.001   0.013   -0.004
1/02/1987   0.004   -0.004  0.011
1/03/1987   0.004   0.008   0.005
1/04/1987   0.000   0.002   -0.002
1/05/1987   0.001   0.002   0.006
1/06/1987   0.004   0.010   0.00

I read in the data with:

library(RODBC)
# Open the workbook through the Excel ODBC driver (Windows only)
conexcel <- odbcConnectExcel(xls.file = "C:/data/example.xls")
s1 <- 'SOx'  # worksheet name
dd <- sqlFetch(conexcel, s1)
odbcClose(conexcel)

This reads in the entire n.ft column (the third column) as NA. I think this is because the column type is guessed to be logical, so the numbers that follow the leading NAs are treated as invalid and returned as NA.

> str(dd)
'data.frame':   29 obs. of  4 variables:
 $ period: POSIXct, format: "1985-02-01" "1985-03-01" ...
 $ n     : num  0.00833 -0.00338 0.00157 0.00562 0.00117 ...
 $ n#ft  : logi  NA NA NA NA NA NA ...
 $ n#pt  : num  0.02515 -0.02394 0.0154 0.01224 0.00301 ...

I am trying to find a way to prevent this coercion to logical, which I think is causing the subsequent error.
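The symptom described above (a column that comes back entirely NA and typed as logical) can be checked for right after the read. The following is only a sketch in base R: `suspect_cols` is a hypothetical helper name, and `dd_demo` is a small made-up data frame standing in for the result of `sqlFetch`.

```r
# Hypothetical helper: return the names of columns that are logical and all NA,
# which is what a mis-guessed Excel column looks like after the ODBC read.
suspect_cols <- function(df) {
  is_suspect <- vapply(df, function(x) is.logical(x) && all(is.na(x)), logical(1))
  names(df)[is_suspect]
}

# Made-up stand-in for the data frame returned by sqlFetch()
dd_demo <- data.frame(n = c(0.008, -0.003), n.ft = c(NA, NA))

suspect_cols(dd_demo)  # "n.ft"
```

This only detects the problem; the numeric values are already lost by the time the data frame reaches R, so the read itself still has to be fixed.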

I found this Q+A by searching SO; however, I am at work and have no hope of being permitted to edit the registry to change the default DWORD value, as suggested there (I understand that this value determines how many rows the Microsoft driver examines before it guesses the data type and bombs my read).

Right now, I'm thinking that the best solution is to reverse the row order in Excel and read the data into R upside down.

I love a good hack but surely there's a better solution?

Solution

This is not a bug but a feature of ODBC (note the lack of R), as documented here:

http://support.microsoft.com/kb/257819/en-us

(It is a long page; search for "mixed data type".)

Since reading Excel files with ODBC is rather limited, I prefer one of the alternatives mentioned by Gabor, with a preference for XLConnect.
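As a rough illustration of the XLConnect route, the sketch below reads the same sheet without going through ODBC. It assumes the XLConnect package and the Java runtime it depends on are installed; `read_sheet_xlconnect` is a hypothetical wrapper name, and the file path and sheet name are the ones from the question. How the NA cells are interpreted depends on what the Excel cells actually contain, so the result should be checked with `str`.

```r
# Sketch only: assumes XLConnect (and Java) are available.
# read_sheet_xlconnect is a hypothetical helper, not part of XLConnect.
read_sheet_xlconnect <- function(path, sheet) {
  wb <- XLConnect::loadWorkbook(path)          # open the existing workbook
  XLConnect::readWorksheet(wb, sheet = sheet)  # returns a data.frame
}

xls_file <- "C:/data/example.xls"  # path and sheet name taken from the question
if (requireNamespace("XLConnect", quietly = TRUE) && file.exists(xls_file)) {
  dd <- read_sheet_xlconnect(xls_file, "SOx")
  str(dd)  # check whether n.ft now comes back as numeric rather than logical
}
```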

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow