Question

I'm working in an R script that uses a long SQL string, and I would like to keep the query relatively free of other markup so as to allow copying and pasting between editors and applications. I'd also like the ability to split the query across lines for better readability.

In the RODBC documentation, the paste function is used to build the query out of separate chunks, but I'd prefer something less kludgy and with fewer quotes and commas. Thanks for your help.

Was it helpful?

Solution

you can override the %+% operator to have better string concatination syntax:

'%+%' <- function(x,y) paste(x,y,sep="")

y<-"y1"
x<-"somethingorother"
query<-
'SELECT DISTINCT x AS ' %+% x %+%',\n'    %+%
'                y AS ' %+% y %+% '\n'    %+%
' FROM tbl
 WHERE id=%s
 AND num=%d'

cat(query,"\n")

yields:

> cat(query,"\n")
SELECT DISTINCT x AS somethingorother,
                y AS y1
 FROM tbl
 WHERE id=%s
 AND num=%d 

OTHER TIPS

If you're an old C programmer from way back, as I am, you might enjoy just using sprintf().

Borrowing Ian's example:

y<-"y1"
x<-"somethingorother"
query <- sprintf(
'SELECT DISTINCT x AS %s,
                 y AS %s,
 FROM tbl
 WHERE id=%%s
 AND num=%%d', x, y)

yields:

> cat(query,"\n")
SELECT DISTINCT x AS somethingorother,
                 y AS y1,
 FROM tbl
 WHERE id=%s
 AND num=%d 

A graceful way of "including" a long SQL query is to keep it in a separate .sql file. Preferably somewhere it can be syntax highlighted, a text file in RStudio will do the job. You can then in your main R script read the file into a string and populate it with variables using one of the many "named" sprintf-type solutions, such as infuser.

.sql

select *
from mytable
where id = {{a}} 
and somevar = {{b}}

.R

library(readr)
library(infuser)

query <- read_file("query.sql") %>%
         infuse(a = 1, b = 2) 

I'd recommend just using a plain string, and not embedding variable values into it. Use placeholders instead.

sql <- "SELECT foo FROM bar
    WHERE col1 = ?
    AND col2 = ?
    ORDER BY yomama"

I'm not sure if the double-quote is the best way to embed multi-line strings in R code (is there something like here-docs?), but it does work, unlike in Java.

Is there some reason you don't want to send "\n" or "\t" to your database? They should be fine in the SQL.

I've ended up simply hitting the sql string with sql <- gsub("\n","",sql) and sql <- gsub("\t","",sql) before running it. The string itself can be as long as it needs to be, but stays free of any concatenation markup.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top