Question

I have a file called q_cleanup.sql that I am reading into R via readLines(). This file has lots of little queries we wrote to clean up some really ugly data. Once I read the file into R and process the text, I run each query in it.

All of the queries work when run directly through Oracle's SQL Developer and Tora. Some of the queries fail when run via RODBC.

For example, the file contains the following two queries (cut and pasted out of the file):

update T_HH_TMP
set program_type = 'not able to contact'
where
    program_type like '%n0t%'
    or program_type like '%not able to%'
;

update T_HH_TMP
set program_type = 'hh substance use'
where program_type like '%hh substance abuse%'
;

The first query runs. The second query errors. Below is the relevant section of my cleanup.R file. The command odbcStart() is a function I built to simplify opening and closing RODBC connections. It is not the problem.

odbcStart()

qry <- readLines("sql/q_cleanup.sql")
qry <- paste(qry[!grepl("--", qry)], collapse=" ")  # drop comment lines; grepl() also works when no line matches
qry <- unlist(strsplit(qry, ";"))

for(i in seq_along(qry)) {
    print("------------------------------------------------------------")
    print(qry[i])
    print(sqlQuery(con, qry[i]))
}

odbcClose(con)

I am stripping out everything I can think of that might cause a problem. My string is wrapped in double quotes and my query contains ONLY single quotes. Yet the output looks like this:

[1] "------------------------------------------------------------"
[1] "  update T_HH_TMP set program_type = 'not able to contact' where     program_type like '%n0t%'     or program_type like '%not able to%' "
character(0)
[1] "------------------------------------------------------------"
[1] "  update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%' "
[1] "[RODBC] ERROR: Could not SQLExecDirect '  update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%' '"

I do not feel that the % is the problem because the first query runs just fine. Any help? I really would prefer to script the running of all these queries in R.


Solution

I thought I would share what I know. I have a solution, even though I consider it sub-optimal because it complicates my workflow unnecessarily.

I do not know if the problem is caused by Oracle server, SQL Plus or if it has something to do with R / Emacs on Windows. I am not an Oracle expert and the office I work for is moving to Vertica by the end of the summer, so I am not going to invest much more effort in fixing this.
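
For anyone who wants to dig further, one thing worth checking is what the query string actually contains. print() shows backslashes in escaped form, so the '\\%' in the question's output corresponds to a single backslash before each % in the string itself. The following is a minimal sketch, not a confirmed diagnosis: the query text is reconstructed from the printed output, and the assumption that the backslashes are unwanted has not been verified against the server.

```r
# Query text reconstructed from the print() output in the question.
# In R source, "\\%" is one backslash followed by a percent sign.
qry <- "update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%'"

print(qry)       # escaped view: each backslash is displayed doubled
writeLines(qry)  # raw view: the characters that would be passed to sqlQuery()

# TRUE means the string really contains a backslash character.
has_backslash <- grepl("\\", qry, fixed = TRUE)

# Assumption: the backslashes are not wanted. Remove them with a literal
# (non-regex) match, then compare the result with the text in the .sql file.
cleaned <- gsub("\\%", "%", qry, fixed = TRUE)
writeLines(cleaned)
```

If writeLines() shows backslashes that are not in q_cleanup.sql, something between the file and sqlQuery() is adding them; if the file itself contains them, SQL Developer and RODBC may simply be treating them differently.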

I am using sqlplus.exe to run SQL syntax that creates either a view or stored procedure and I am then running the view / SP via R. Thus, the command I have to pass to Oracle via R is SIMPLE and it can handle it.

To script sqlplus from R, I am using the following function that I will someday improve. It has no error handling and it basically assumes you are being nice, but it does work.

#' queryFile() runs a longish series of queries in a .sql file.
#' It is very important to understand that the path to sqlplus is hardcoded
#' because Windows has a shitty path system. It may not run on another system
#' without being edited.
#'
#' @param file - The relative path to the .sql file.
#' @return output - Vector containing the results from sqlplus
#'
queryFile <- function(file){
    cmd  <- "c:/Oracle/app/product/11.2.0/client_1/sqlplus.exe %user/%password@%db     @%file"
    cmd  <- gsub("%user", getOption("DataMart")$uid, cmd )
    cmd  <- gsub("%password", getOption("DataMart")$pwd, cmd )
    cmd  <- gsub("%db", getOption("DataMart")$db, cmd )
    cmd  <- gsub("%file", file, cmd )
    print(cmd)
    output <- system(cmd, intern=TRUE)
    return(output)
}

Apparently Markdown does not like my Roxygen style comments. Sorry.

The point of this function is that you pass it the file with the SQL syntax. It uses SQL Plus to run the syntax. To store / access user name, password, etc. I use a file called ~/passwords.R. It has a series of options() commands that look like this:

## Fake example.
options( DataMart = list(
              uid       = "user_name"
             ,pwd       = "user_password"
             ,db        = "TNS Database"
             ,con_type  = "ODBC"
             ,srvr_type = "Oracle"
                    )
        )

The last two (con_type and srvr_type) are just things that I like to have documented. They are not really needed. I have ~ 10 of these entries in my file and I use them to remind me which db server I am writing against. I have to write against SQL Server, Vertica, MySQL and Oracle (different projects / employers) and this helps me.
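
As a small illustration of how those documentation fields can be read back, here is a sketch of a lookup helper. The function name describeTarget() is hypothetical and not part of my actual file; the options() call repeats the fake example above so the snippet runs on its own.

```r
# Same fake entry as above, repeated so this snippet is self-contained.
options(DataMart = list(
    uid       = "user_name",
    pwd       = "user_password",
    db        = "TNS Database",
    con_type  = "ODBC",
    srvr_type = "Oracle"
))

# Hypothetical helper: report which kind of server an entry points at.
describeTarget <- function(name) {
    entry <- getOption(name)
    if (is.null(entry)) {
        stop("no options() entry named ", name)
    }
    sprintf("%s: %s via %s", name, entry$srvr_type, entry$con_type)
}

describeTarget("DataMart")  # "DataMart: Oracle via ODBC"
```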

The function I provided uses options() to access the necessary information and then runs sqlplus.exe. I could have added SQL Plus to my Windows path, but I was trying to make this function semi-independent, and our IT people seem to be consistent about where SQL Plus lives. (Of course there are different versions running around, but at least I don't have to explain the idea of a path to someone who is not really a programmer.)
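
Since queryFile() has no error handling, here is a rough sketch of how it could be tightened. The names buildSqlplusCmd() and queryFileChecked() are hypothetical, and the sqlplus path is the same hardcoded one used above. It builds the command with sprintf() rather than gsub(), so the credentials are inserted literally instead of being interpreted as regex replacement strings, and it does not print the command (which would show the password).

```r
# Hypothetical helper: build the sqlplus command line without running it.
buildSqlplusCmd <- function(file,
                            creds = getOption("DataMart"),
                            sqlplus = "c:/Oracle/app/product/11.2.0/client_1/sqlplus.exe") {
    # Fail early if the options() entry is missing or incomplete.
    needed <- c("uid", "pwd", "db")
    if (is.null(creds) || !all(needed %in% names(creds))) {
        stop("options(DataMart = ...) must supply uid, pwd and db")
    }
    # sprintf() inserts each value as-is.
    sprintf("%s %s/%s@%s @%s", sqlplus, creds$uid, creds$pwd, creds$db, file)
}

# Hypothetical wrapper: check that the .sql file exists, then run sqlplus.
queryFileChecked <- function(file, ...) {
    if (!file.exists(file)) {
        stop("SQL file not found: ", file)
    }
    system(buildSqlplusCmd(file, ...), intern = TRUE)
}
```

Usage would look like queryFileChecked("sql/q_cleanup.sql"). This still assumes the .sql file ends with an exit statement so that sqlplus returns control to R.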

Licensed under: CC-BY-SA with attribution