Is there a specific way to handle timestamp columns in R when pulling data using RPostgreSQL?

StackOverflow https://stackoverflow.com/questions/17754635

03-06-2022

Question

I'm trying to pull data from a PostgreSQL database, and the results for a timestamp field are inconsistent. I'm not sure whether I'm handling POSIXct results properly; otherwise, I think I've found a bug in the RPostgreSQL package. Here is how to replicate the issue:

Suppose there is a table in a postgres database with one field (run this in PostgreSQL):

CREATE DATABASE mydb;
CREATE TABLE test_table
(   
  "DateTime" timestamp without time zone NOT NULL,
  CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table
  OWNER TO postgres;

And let’s say there are a few hundred records. I will populate them in R. Here is the code:

library(RPostgreSQL)
library(chron)  # needed for as.chron(), years(), months(), days(), hours(), minutes()

# Feed the table with a sequence of date/time values, one per hour
date_values <- as.chron(seq(10000, 10500, 1/24))

format.chron <- function(z) {
  sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00",
          as.numeric(as.character(years(z))),
          as.numeric(months(z)),  # months() returns a factor; coerce to its month index
          as.numeric(as.character(days(z))),
          as.numeric(as.character(hours(z))),
          as.numeric(as.character(minutes(z))))
}

.generateInsertQuery <- function(date_values, field_name, table_name) {
  # Use plain ASCII quotes: sQuote()/dQuote() can emit directional (curly)
  # quotes depending on locale settings, which PostgreSQL rejects.
  insert_val <- paste(paste0("('", format(date_values), "')"), collapse = ",")
  qry        <- paste0('INSERT INTO "', table_name, '" ("', field_name, '") VALUES ', insert_val)
  qry
}

drv <- dbDriver('PostgreSQL')
con <- dbConnect(drv, user='postgres', dbname='mydb')
qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
rs  <- dbSendQuery(con, qry)
dbClearResult(rs)  # release the result set left open by dbSendQuery()

If I try to get the values back, the time component appears to be stripped from the results:

res <- dbGetQuery(con, "SELECT * FROM test_table")
res[1:20,1]

The class of the result, however, is POSIXct:

class(res[,1])

If the result is fetched one record at a time, the values with hour:min equal to 00:00 lose the time component:

rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
res_list <- list()
for(i in 1:100) res_list[i]  <- fetch(rs,1)
res_list

As a workaround, I'm fetching the results one record at a time, fixing them, and aggregating them into a data.frame. But this is very time-consuming, especially for large data sets. Any ideas why this is happening and how to deal with it?
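One clue worth checking: the time component may not actually be lost, only hidden by printing. `format.POSIXct()` (which printing uses) drops the time part when every value in the vector falls exactly on midnight, so a single fetched midnight row formats as a bare date. A minimal sketch, with no database involved:

```r
# format.POSIXct drops "00:00:00" when all values in the vector are at midnight
x <- as.POSIXct(c("2001-01-01 00:00:00", "2001-01-01 06:00:00"), tz = "UTC")

format(x[1])                       # a lone midnight value formats as "2001-01-01"
format(x[1], "%Y-%m-%d %H:%M:%S")  # an explicit format shows "2001-01-01 00:00:00"
```

If that is what is happening here, the underlying POSIXct values are intact and an explicit format string (or comparing the values numerically) should confirm it.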

Thanks in advance!


Solution 2

First off, the RPostgreSQL project has a mailing list; I suggest you post there.

PostgreSQL has two datetime types: with and without timezone. As I recall, R only maps the latter. I did write some early regression tests for this (see the package source) but have not been that involved with the project of late. But I do recall that POSIXct maps back and forth to the PostgreSQL datetime type just fine.

OTHER TIPS

RPostgreSQL's dbWriteTable will always store a POSIXct field as timestamp with time zone, with the time zone fixed at +00 regardless of the POSIXct value's own time zone. It would arguably be more accurate to create a timestamp without time zone column instead.

The simplest fix for both dbReadTable and dbWriteTable is to set Sys.setenv(TZ = "UTC"). The downside is that this is a session-wide setting: other code running in the same R session may depend on the proper local time zone.
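The environment-variable approach might look like this (connection details copied from the question; adjust them for your setup):

```r
# Pin the session time zone before reading or writing, so POSIXct values
# round-trip through PostgreSQL without being shifted.
Sys.setenv(TZ = "UTC")

library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, user = "postgres", dbname = "mydb")
res <- dbGetQuery(con, 'SELECT "DateTime" FROM test_table')
dbDisconnect(con)
```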

A more targeted approach, without the session-wide dependency, is to define your own dbReadTable and dbWriteTable wrappers around the DBI versions, with appropriate pre- and post-processing of the POSIXct columns. That is still not an option, though, if you are writing generic DBI-compliant code or a package that is not PostgreSQL-specific.
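Such wrappers could be sketched roughly as follows (the function names `writeTableUTC`/`readTableUTC` are hypothetical, not part of DBI or RPostgreSQL):

```r
# Hypothetical wrappers around the DBI generics: serialize POSIXct columns as
# UTC strings on the way in, and parse them back as UTC on the way out.
writeTableUTC <- function(con, name, df, ...) {
  is_ct <- vapply(df, inherits, logical(1), what = "POSIXct")
  df[is_ct] <- lapply(df[is_ct], format, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
  dbWriteTable(con, name, df, ...)
}

readTableUTC <- function(con, name, timestamp_cols, ...) {
  df <- dbReadTable(con, name, ...)
  df[timestamp_cols] <- lapply(df[timestamp_cols], as.POSIXct, tz = "UTC")
  df
}
```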

It would be great to see RPostgreSQL migrated to GitHub for easier contribution.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow