How to get a 'clean' list of Excel worksheet tab names in R with RODBC?
Question
I'm new to R and even newer to using it with Excel. I want to get a list of all the worksheet names (Notes, Weights, Lengths) in an .xls file. You can see what I'm trying below. The problem is that each name in the output has a dollar sign ($) at the end for some reason, and some names are also surrounded by single quotes.
library(RODBC)
FileToImport <- "C:\\folder\\filetoimport.xls"
z <- odbcConnectExcel(FileToImport, readOnly = TRUE)
sqlTables(z)
                     TABLE_CAT TABLE_SCHEM TABLE_NAME   TABLE_TYPE REMARKS
1 C:\\folder\\filetoimport.xls        <NA>     Notes$ SYSTEM TABLE    <NA>
2 C:\\folder\\filetoimport.xls        <NA> 'Weights$'        TABLE    <NA>
3 C:\\folder\\filetoimport.xls        <NA> 'Lengths$'        TABLE    <NA>
sqlTables(z)[,"TABLE_NAME"]
[1] "Notes$" "'Weights$'" "'Lengths$'"
I could try to clean these characters up, but I don't really know how to go about it since the quoting is inconsistent: some of the worksheets are listed as "SYSTEM TABLE" and some as just "TABLE". Could someone explain what the difference between these worksheets is and give me an idea of how to retrieve just the 'clean' tab names?
Solution
Thanks to the nudge in the right direction from the other answers, I managed to use a regular expression to get the worksheet names in the desired form (without any punctuation).
gsub("[[:punct:]]","",sqlTables(z)[,"TABLE_NAME"])
[1] "Sheet1" "Sheet2" "Sheet3"
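Note that `[[:punct:]]` removes every punctuation character, so a sheet called `My-Sheet 2` would come back as `MySheet 2`. A narrower pattern that strips only the surrounding single quotes and the trailing dollar sign can be sketched as follows. The helper name `clean_sheet_names` and the fourth sample name are made up for illustration; the first three inputs are the ones from the question.

```r
# Hypothetical helper (not part of RODBC): remove only the decoration
# seen in the TABLE_NAME column, leaving other punctuation alone.
clean_sheet_names <- function(x) {
  x <- gsub("^'|'$", "", x)  # drop a leading and/or trailing single quote
  sub("\\$$", "", x)         # drop one trailing dollar sign, if present
}

# Sample input: three names from the question plus one invented name
# that contains punctuation worth keeping.
raw <- c("Notes$", "'Weights$'", "'Lengths$'", "'My-Sheet 2$'")
clean_sheet_names(raw)
# [1] "Notes"      "Weights"    "Lengths"    "My-Sheet 2"
```

With a live connection the same helper would be applied as `clean_sheet_names(sqlTables(z)[, "TABLE_NAME"])`.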
OTHER TIPS
I don't have much experience with RODBC, but is the following output what you mean by clean?
data.frame(sqlTables(z))$TABLE_NAME
[1] "Sheet1$" "Sheet2$" "Sheet3$" "ZRDaten1"
If you save that in a vector, say b, you can access individual names with b[i]. If you only need a certain type, what about:
na.omit(ifelse(data.frame(sqlTables(z))$TABLE_TYPE=='SYSTEM TABLE', data.frame(sqlTables(z))$TABLE_NAME, NA))
[1] "Sheet1$" "Sheet2$" "Sheet3$"
Admittedly inelegant.
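The same filter can probably be written more directly with logical indexing, which avoids the `ifelse`/`na.omit` round trip and the repeated calls to `sqlTables`. The sketch below uses a stand-in data frame built from the names shown in this answer so that it runs without an ODBC connection; the `TABLE_TYPE` given to `ZRDaten1` is an assumption, inferred only from the fact that the filter above excludes it.

```r
# Stand-in for sqlTables(z); with a real connection use: tabs <- sqlTables(z)
tabs <- data.frame(
  TABLE_NAME = c("Sheet1$", "Sheet2$", "Sheet3$", "ZRDaten1"),
  # The type of ZRDaten1 is assumed; the thread does not show it.
  TABLE_TYPE = c("SYSTEM TABLE", "SYSTEM TABLE", "SYSTEM TABLE", "TABLE"),
  stringsAsFactors = FALSE
)

# Keep only the names whose type matches, in one indexing step.
sheet_names <- tabs$TABLE_NAME[tabs$TABLE_TYPE == "SYSTEM TABLE"]
sheet_names
# [1] "Sheet1$" "Sheet2$" "Sheet3$"
```

Calling `sqlTables` once and storing the result also means the workbook is queried a single time rather than twice.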