How specify join variables with different names in different MySQL tables
Pergunta
I need to join two tables where the common column-id that I want to use has a different name in each table. The two tables have a "false" common column name that does not work when dplyr takes the default and joins on columns "id".
Here's some of the code involved in this problem
library(dplyr)
library(RMySQL)
SDB <- src_mysql(host = "localhost", user = "foo", dbname = "bar", password = getPassword())
# Then reference a tbl within that src
administrators <- tbl(SDB, "administrators")
members <- tbl(SDB, "members")
Here are 3 attempts -- that all fail -- to pass along the information that the common column on the members side is "id" and on the adminisrators side it's "idmember":
sqlq <- semi_join(members,administrators, by=c("id","idmember"))
sqlq <- inner_join(members,administrators, by= "id.x = idmember.y")
sqlq <- semi_join(members,administrators, by.x = id, by.y = idmember)
Here's an example of the kinds of error messages I'm getting:
Error in mysqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not run statement: Unknown column '_LEFT.idmember' in 'where clause')
The examples I see out there pertain to data tables and data frames on the R side. My question is about how dplyr sends "by" statements to a SQL engine.
Solução
In the next version of dplyr, you'll be able to do:
inner_join(members, administrators, by = c("id" = "idmember"))
Outras dicas
Looks like this is an unresolved issue: https://github.com/hadley/dplyr/issues/177
However you can use merge:
❥ admin <- as.tbl(data.frame(id = c("1","2","3"),false = c(TRUE,FALSE,FALSE)))
❥ members <- as.tbl(data.frame(idmember = c("1","2","4"),false = c(TRUE,TRUE,FALSE)))
❥ merge(admin,members, by.x = "id", by.y = "idmember")
id false.x false.y
1 1 TRUE TRUE
2 2 FALSE TRUE
If you need to do left or outer joins, you can always use the ALL.x, or ALL arguments to merge. A thought though... You've got a sql db, why not use it?
❥ con2 <- dbConnect(MySQL(), host = "localhost", user = "foo", dbname = "bar", password = getPassword())
❥ dbGetQuery(con, "select * from admin join members on id = idmember")