Question

I am new to R, working with RStudio, and am enjoying it. I am trying to segment an SQL statement so that I can replace all the aliases with the full table names. I have already been helped massively on this forum with the code below.

The code below takes an SQL statement and splits it into its basic components: SELECT, FROM and WHERE.

I am using the SQL below as an example, but the statement could have any number of tables and therefore many more aliases. The aim is to write a loop that substitutes each alias with its full table name, irrespective of the SQL and the number of aliases. Currently my gsub call only substitutes the second alias in my query. Can anyone see the fault in my logic?

txt <- "SELECT AL1.attr1,AL2.attr2 FROM Table_1 as AL1, Table_2 as AL2 WHERE AL1.attr1 == 1"

###########################################################################################
# First split the SQL statement into the SELECT, FROM and WHERE clauses (1 row for each)
# Take the FROM clause and split that on period, so AL1.Attrib1 = AL1  Attrib1
# Then split on 'as', separating the alias from the actual table name
###########################################################################################

Reference:

SQLSplit = sapply(strsplit(txt,split="WHERE|FROM|SELECT"),trim)
SQLSegmented = unlist(strsplit(SQLSplit, ".|,", fixed = TRUE))
SplitOnPeriod = sapply(strsplit(SQLSegmented[2],split=","),trim)
SplitOnComma = sapply(strsplit(SplitOnPeriod,split="as"),trim)

for (i in 1:ncol(SplitOnComma)) 
{
    cat(SplitOnComma[1,i])
    cat(SplitOnComma[2,i])
    test = gsub(SplitOnComma[2,i], SplitOnComma[1,i], SQLSegmented[1])
}

Solution

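This might just be a problem with how you're [not] storing the results from each iteration of the for loop: every pass runs gsub on the original SQLSegmented[1], so test is overwritten each time and only the last substitution survives. What if you repeatedly update the same string, called test below, to end up with a single, fully updated string?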

test <- SQLSegmented[1]
for (i in 1:ncol(SplitOnComma)) 
{
  cat(SplitOnComma[1,i])
  cat(SplitOnComma[2,i])
  test = gsub(SplitOnComma[2,i], SplitOnComma[1,i], test)
}
test
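
To see the difference in isolation, here is a minimal sketch in base R. The `aliases` data frame and the `select_clause` string are hypothetical stand-ins for the pieces the question's code extracts (they are typed in by hand here, not parsed), and `fixed = TRUE` is my assumption to keep the alias from being treated as a regular expression.

```r
# Hypothetical lookup table: one row per "table as alias" pair (hand-written, not parsed).
aliases <- data.frame(
  table = c("Table_1", "Table_2"),
  alias = c("AL1", "AL2"),
  stringsAsFactors = FALSE
)

# Stand-in for the SELECT clause extracted in the question.
select_clause <- "AL1.attr1,AL2.attr2"

# Pattern from the question: each pass starts again from the original string,
# so the result of the previous pass is thrown away.
overwritten <- select_clause
for (i in seq_len(nrow(aliases))) {
  overwritten <- gsub(aliases$alias[i], aliases$table[i], select_clause, fixed = TRUE)
}

# Pattern from the answer: each pass works on the output of the previous pass.
accumulated <- select_clause
for (i in seq_len(nrow(aliases))) {
  accumulated <- gsub(aliases$alias[i], aliases$table[i], accumulated, fixed = TRUE)
}

print(overwritten)  # "AL1.attr1,Table_2.attr2"
print(accumulated)  # "Table_1.attr1,Table_2.attr2"
```

Only the second alias is replaced in `overwritten`, which matches the symptom described in the question.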

OTHER TIPS

library(stringr)

txt <- "SELECT AL1.attr1,AL2.attr2 FROM Table_1 as AL1, Table_2 as AL2 WHERE AL1.attr1 == 1"

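# Column 1 is the full match, column 2 the table name, column 3 the alias.
# A plain space is used in the pattern: "\ " is not a valid escape in an R string.
matches <- str_match_all(txt, "([A-Za-z0-9_]+) +as +([A-Za-z0-9_]+)")

for (i in seq_len(nrow(matches[[1]]))) {

  # Replace "alias." with "table." everywhere, updating txt on every pass.
  txt <- gsub(sprintf("%s.", matches[[1]][i,3]), 
              sprintf("%s.", matches[[1]][i,2]),
              txt,
              fixed=TRUE)

}

# Drop the now-redundant " as alias" parts from the FROM clause.
txt <- gsub(" +as +[A-Za-z_0-9]+", "", txt)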
txt
## [1] "SELECT Table_1.attr1,Table_2.attr2 FROM Table_1, Table_2 WHERE Table_1.attr1 == 1"
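
One caveat with a fixed-string replacement of "alias.": a short alias can also match the tail of a longer one (for example `o.` inside `po.`). The following is a sketch of a base R variant that guards against this with a word boundary. The function name `expand_aliases` and the example query are my own, hypothetical choices; it assumes aliases are always introduced with a lowercase `as` and it is not a real SQL parser.

```r
# Hypothetical helper: replace "alias." with "table." and drop " as alias".
# Assumes every alias is declared as "<table> as <alias>" with lowercase "as".
expand_aliases <- function(sql) {
  pattern <- "([A-Za-z0-9_]+)\\s+as\\s+([A-Za-z0-9_]+)"

  # All "table as alias" declarations found in the statement.
  pairs <- regmatches(sql, gregexpr(pattern, sql, perl = TRUE))[[1]]

  for (pair in pairs) {
    table_name <- sub(pattern, "\\1", pair, perl = TRUE)
    alias      <- sub(pattern, "\\2", pair, perl = TRUE)

    # \\b stops alias "o" from matching the end of "po"; \\. is a literal dot.
    sql <- gsub(paste0("\\b", alias, "\\."), paste0(table_name, "."), sql, perl = TRUE)
  }

  # Remove the " as alias" declarations themselves.
  gsub("\\s+as\\s+[A-Za-z0-9_]+", "", sql, perl = TRUE)
}

query  <- "SELECT o.id, po.qty FROM orders as o, purchase_orders as po WHERE o.id == po.id"
result <- expand_aliases(query)
print(result)
## [1] "SELECT orders.id, purchase_orders.qty FROM orders, purchase_orders WHERE orders.id == purchase_orders.id"
```

With `fixed=TRUE` and no boundary, the first pass would turn `po.qty` into `porders.qty`; the boundary check avoids that at the cost of using a regular expression for the alias.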
Licensed under: CC-BY-SA with attribution