Question

I need to join two tables whose common ID column has a different name in each table. Both tables also have a column named "id", but it is a "false" match, so the join does not work when dplyr falls back to its default and joins on "id".

Here is some of the code involved in this problem:

library(dplyr)
library(RMySQL)

SDB <- src_mysql(host = "localhost", user = "foo", dbname = "bar", password = getPassword())
# Then reference a tbl within that src
administrators <- tbl(SDB, "administrators")
members <- tbl(SDB, "members")

Here are three attempts, all of which fail, to tell dplyr that the common column is "id" on the members side and "idmember" on the administrators side:

sqlq  <- semi_join(members,administrators, by=c("id","idmember"))
sqlq  <- inner_join(members,administrators, by= "id.x = idmember.y")
sqlq  <- semi_join(members,administrators, by.x = id, by.y = idmember)

Here's an example of the kinds of error messages I'm getting:

Error in mysqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not run statement: Unknown column '_LEFT.idmember' in 'where clause')

The examples I see out there pertain to data tables and data frames on the R side. My question is about how dplyr sends "by" statements to a SQL engine.


Solution

In the next version of dplyr, you'll be able to do:

inner_join(members, administrators, by = c("id" = "idmember"))
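To illustrate how the named `by` vector is read, here is a minimal sketch on local data frames rather than the MySQL tables. The sample rows and the `name` column are made up for illustration, and it assumes a dplyr version that supports named `by` vectors. The name on the left of `=` is the column in the first table and the value on the right is the column in the second table.

```r
library(dplyr)

# Hypothetical stand-ins for the two database tables.
# Both have an "id" column, but the real link is members$id <-> administrators$idmember.
members <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
administrators <- data.frame(id = c(10, 11), idmember = c(1, 3))

# Left-hand name = column in `members`, right-hand value = column in `administrators`.
admin_members <- inner_join(members, administrators, by = c("id" = "idmember"))

# semi_join keeps only the members rows that have a match, with members' columns only.
only_admins <- semi_join(members, administrators, by = c("id" = "idmember"))
```

The same `by` argument should carry over to tbl objects created from a database source, where dplyr translates it into the join condition of the generated SQL.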

OTHER TIPS

Looks like this is an unresolved issue: https://github.com/hadley/dplyr/issues/177

However you can use merge:

❥ admin <- as.tbl(data.frame(id = c("1","2","3"),false = c(TRUE,FALSE,FALSE)))
❥ members <- as.tbl(data.frame(idmember = c("1","2","4"),false = c(TRUE,TRUE,FALSE)))
❥ merge(admin,members, by.x = "id", by.y = "idmember")
  id false.x false.y
1  1    TRUE    TRUE
2  2   FALSE    TRUE

If you need left or outer joins, you can use merge's all.x, all.y, or all arguments. One thought, though: you have a SQL database, so why not use it?

❥ con <- dbConnect(MySQL(), host = "localhost", user = "foo", dbname = "bar", password = getPassword())
❥ dbGetQuery(con, "select * from members join administrators on members.id = administrators.idmember")
Licensed under: CC-BY-SA with attribution