Question

I want to load a plain file into a Greenplum database using external tables. Can I specify the input format for timestamp/date/time fields? (If you know the answer for PostgreSQL, please reply as well.)

For example, with Oracle I can use DATE_FORMAT DATE MASK 'YYYYMMDD' to tell how to parse the date. For Netezza I can specify DATESTYLE 'YMD'. For Greenplum I cannot find the answer. I can describe fields as char, and then parse them during the load, but this is an ugly workaround.

Here is my tentative code:

CREATE EXTERNAL TABLE MY_TBL (X date, Y time, Z timestamp)
LOCATION (
 'gpfdist://host:8001/file1.txt',
 'gpfdist://host:8002/file2.txt'
) FORMAT 'TEXT' (DELIMITER '|' NULL '');

Solution

It appears that you can:

SET DATESTYLE = 'YMD';

before SELECTing from the table. This will affect the interpretation of all dates, though, not just those from the file. If you consistently use unambiguous ISO dates elsewhere that will be fine, but it may be a problem if (for example) you need to also accept 'D/M/Y' date literals in the same query.
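
One way to keep the setting from leaking into the rest of the session is to change it only for the duration of the load. The following is a minimal sketch, assuming the load is a single INSERT ... SELECT from the external table; target_tbl is a hypothetical regular table that is not part of the question, and whether Greenplum applies the setting when parsing the external file is the same assumption made above.

```sql
-- Sketch only: target_tbl is a hypothetical regular table with the same
-- columns as the external table MY_TBL from the question.
BEGIN;

-- SET LOCAL lasts only until the end of the current transaction, so the
-- session's previous DateStyle is back in effect after COMMIT or ROLLBACK.
SET LOCAL DateStyle = 'ISO, YMD';

INSERT INTO target_tbl
SELECT X, Y, Z FROM MY_TBL;

COMMIT;
```

Date literals in the same transaction are still parsed with the temporary setting, so the caveat about mixing 'D/M/Y' literals into the same query still applies.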

This DATESTYLE dependence is specific to Greenplum's CREATE EXTERNAL TABLE and does not apply to SQL-standard SQL/MED foreign data wrappers, as shown below.


What surprises me is that PostgreSQL proper (which does not have this CREATE EXTERNAL TABLE feature) always accepts ISO-style YYYY-MM-DD and YYYYMMDD dates, irrespective of DATESTYLE. Observe:

regress=> SELECT '20121229'::date, '2012-12-29'::date, current_setting('DateStyle');
    date    |    date    | current_setting 
------------+------------+-----------------
 2012-12-29 | 2012-12-29 | ISO, MDY
(1 row)

regress=> SET DateStyle = 'DMY';
SET
regress=> SELECT '20121229'::date, '2012-12-29'::date, current_setting('DateStyle');
    date    |    date    | current_setting 
------------+------------+-----------------
 2012-12-29 | 2012-12-29 | ISO, DMY
(1 row)

... so if Greenplum behaved the same way, you should not need to do anything to get these YYYYMMDD dates to be read correctly from the input file.

Here's how it works with a PostgreSQL file_fdw SQL/MED foreign data wrapper:

CREATE EXTENSION file_fdw;

COPY (SELECT '20121229', '2012-12-29') TO '/tmp/dates.csv' CSV;

SET DateStyle = 'DMY';

CREATE SERVER csvtest FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE csvtest (
    date1 date,
    date2 date
) SERVER csvtest OPTIONS ( filename '/tmp/dates.csv', format 'csv' );

SELECT * FROM csvtest ;
   date1    |   date2    
------------+------------
 2012-12-29 | 2012-12-29
(1 row)

The CSV file contents are:

20121229,2012-12-29

so you can see that Pg will always accept ISO dates for CSV, irrespective of datestyle.

If Greenplum doesn't, please file a bug. The idea of DateStyle changing the way a foreign table is read after creation is crazy.

OTHER TIPS

Yes you can.

You do this by specifying the field in the external table to be of type text. Then, use a transformation in the insert statement. You can also use gpload and define the transformation. Both solutions are similar to the solution described above.

Here is a simple file with an integer and a date whose year, month, and day are separated by spaces; the two fields are delimited by '|':

date1.txt

1|2012 10 12
2|2012 11 13

Start gpfdist:

gpfdist -p 8010 -d ./ -l ./gpfdist.log &

Use psql to create the external table, the target table, and load the data:

psql test

test=# create external table ext.t2( i int, d text ) 
  location ('gpfdist://walstl-mbp.local:8010/date1.txt') 
  format 'TEXT' ( delimiter '|' )
;


test=# select * from ext.t2;
 i |     d      
---+------------
  1 | 2012 10 12
  2 | 2012 11 13
(2 rows)

Now, create the table that the data will be loaded into:

test=# create table test.t2 ( i int, d date ) 
;

And load the table:

test=# insert into test.t2 select i, to_date(d,'YYYY MM DD') from ext.t2 ;

test=# select * from test.t2;
 i |     d      
---+------------
 1 | 2012-10-12
 2 | 2012-11-13
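
The question also asks about timestamp fields. The same text-then-convert pattern should carry over by using to_timestamp with a matching format mask. This is a small sketch with hypothetical input values and masks (neither comes from the files above), so adjust them to the actual data:

```sql
-- Hypothetical input values; change the format masks to match the file.
-- to_timestamp returns a timestamp with time zone, so it is cast to a
-- plain timestamp here to match a "timestamp" target column.
SELECT to_date('20121229', 'YYYYMMDD') AS d,
       to_timestamp('20121229 134501', 'YYYYMMDD HH24MISS')::timestamp AS ts;
```

In a real load, the string literals would be replaced by the text columns of the external table, as in the insert statement above.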
Licensed under: CC-BY-SA with attribution