Question

(By object-relational mapping, I mean what is described here: Wikipedia: Object-relational mapping.)

Here is how I imagine this could work in R: a kind of "virtual data frame" is linked to a database and returns the results of SQL queries when accessed. For instance, head(virtual_list) would actually return the results of (select * from mapped_table limit 6) on the mapped database, since head() returns six rows by default.
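A minimal sketch of that idea, assuming the DBI and RSQLite packages are installed. The virtual_frame class and its constructor are hypothetical names made up for illustration, and an in-memory SQLite database stands in for the mapped database. Only head() is translated to SQL here; head() defaults to six rows, so the generated query uses LIMIT 6.

```r
library(DBI)

# Hypothetical constructor: stores a connection and a table name, no data.
virtual_frame <- function(con, table) {
  structure(list(con = con, table = table), class = "virtual_frame")
}

# S3 method: head() on the virtual frame is translated into a LIMIT query.
head.virtual_frame <- function(x, n = 6L, ...) {
  sql <- paste0(
    "SELECT * FROM ", dbQuoteIdentifier(x$con, x$table),
    " LIMIT ", as.integer(n)
  )
  dbGetQuery(x$con, sql)
}

# In-memory SQLite stands in for the mapped database (an assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mapped_table", mtcars)

virtual_list <- virtual_frame(con, "mapped_table")
first_rows <- head(virtual_list)  # only six rows are fetched from the database

dbDisconnect(con)
```

A complete implementation would need the same treatment for `[`, `$`, dim() and so on, which is where most of the difficulty lies.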

I have found this post by John Myles White, but there seems to have been no progress in the last 3 years.

Is there a working package that implements this?

If not,

  1. Would it be useful?
  2. What would be the best way to implement it (S4?)?

Solution

The very recent package dplyr implements this (among other amazing features).

Here are excerpts from the examples for the src_mysql() function:

# Connection basics ---------------------------------------------------------
library(dplyr)

# To connect to a database first create a src:
my_db <- src_mysql(host = "blah.com", user = "hadley",
  password = "pass")
# Then reference a tbl within that src
my_tbl <- tbl(my_db, "my_table")

# Methods -------------------------------------------------------------------
batting <- tbl(lahman_mysql(), "Batting")
dim(batting)
colnames(batting)
head(batting)

OTHER TIPS

There is an old unsupported package, SQLiteDF, that does that. Build it from source and ignore the numerous error messages.

> # from example(sqlite.data.frame)
>
> library(SQLiteDF)
> iris.sdf <- sqlite.data.frame(iris)
> iris.sdf$Petal.Length[1:10] # $ done via SQL
 [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5

It looks like John Myles White has given up on it.

There is a bit of a workaround explained here.

I don't think it would be useful. R is not a real OOP language, and its central data structure is the data frame, so there is no need for object-relational mapping here. What you want is a mapping between SQL tables and data frames, and the RMySQL and RODBC packages provide just that:

In RMySQL, dbGetQuery returns the results of a query as a data frame, and dbWriteTable inserts data into a table or does a bulk update (from a data frame). The RODBC equivalents are sqlQuery and sqlSave.
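As a rough illustration of that round trip, here is a sketch using the DBI interface that RMySQL implements. It assumes the DBI and RSQLite packages are installed and swaps in an in-memory SQLite database so it runs without a MySQL server; the table names are made up for the example.

```r
library(DBI)

# In-memory SQLite replaces the MySQL server here (an assumption); with
# RMySQL the connection would come from dbConnect(RMySQL::MySQL(), ...).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Data frame -> SQL table
dbWriteTable(con, "cars", mtcars)

# SQL query -> data frame
four_cyl <- dbGetQuery(con, "SELECT mpg, cyl, hp FROM cars WHERE cyl = 4")

# Work on the data frame in R, then write it back in bulk.
four_cyl$power_to_mpg <- four_cyl$hp / four_cyl$mpg
dbWriteTable(con, "four_cyl", four_cyl, overwrite = TRUE)

written_back <- dbExistsTable(con, "four_cyl")
dbDisconnect(con)
```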

As an experienced R user, I would not use this. First off, this 'virtual frame' would be slow to use, since you constantly need to synchronize between R memory and the database. It would also require locking the database table, since otherwise you have unpredictable results due to other edits happening at the same time.

Finally, I do not think R is suited for implementing a different evaluation of promise objects. Doing myFrame$foo[ myFrame$foo > 40 ] will still fetch the full foo column, since you cannot possibly implement a full translation scheme from R to SQL.

Therefore, I prefer to load a data frame from a query, use it, and write it back to the database if required.

In addition to the various driver packages for querying databases (DBI, RODBC, RJDBC, RMySQL, ...) and dplyr, there is also sqldf: https://cran.r-project.org/web/packages/sqldf/

It automatically imports data frames into a database and lets you query the data via SQL. At the end, the database is deleted.
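For example, a short sketch assuming the sqldf package is installed and left on its default SQLite backend; the query itself is only illustrative:

```r
library(sqldf)

# sqldf() looks up mtcars in the R session, copies it into a temporary
# database, runs the query there, and returns the result as a data frame.
mean_mpg <- sqldf(
  "SELECT cyl, AVG(mpg) AS mean_mpg FROM mtcars GROUP BY cyl ORDER BY cyl"
)
```

The data frame is referenced by its R name inside the SQL string, so no explicit connection or import step is needed.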

Licensed under: CC-BY-SA with attribution