Question

I have a set of tuples

(id1, count1),(id2, count2) ... (idN, countN)

and I have a table in a PostgreSQL database with the columns

| tuple_id | project_id |

The tuples are produced by an external application and passed as a stream of data.

What I need to do now is map the id of each tuple to a project_id from the database, producing

(projectid1, count1), (projectid2, count2) ... (projectidM, countM)

where M <= N, because not every input tuple has a corresponding mapping in the table.

If I needed only the ID mappings, I would do something like

SELECT project_id FROM tablename WHERE tuple_id IN ( ..... )

However, I need both the project_id and the count values. Is there a way to achieve that without creating a temporary table and then filling it with the data from the stream?

Sample input data (text file)

1,10
2,15
3,14

Data mappings (PostgreSQL table)

1, 37f6e23f-ef50-4c6f-a746-cb29ae3adf52
2, 8c73500f-2118-4bb7-b470-78ac1878896e
3, c28b19f2-9ec7-4278-ae02-1dbb39d6113d

Expected result:

37f6e23f-ef50-4c6f-a746-cb29ae3adf52, 10
8c73500f-2118-4bb7-b470-78ac1878896e, 15
c28b19f2-9ec7-4278-ae02-1dbb39d6113d, 14

Solution

You could use a foreign data wrapper (FDW) to read the file as though it were a database table, then join it against your ID mappings table.

The file_fdw extension looks suitable for this task.

This seemed to work:

CREATE TABLE mappings(id INT PRIMARY KEY, project_id UUID);
INSERT INTO mappings(id,project_id) VALUES 
    (1, '37f6e23f-ef50-4c6f-a746-cb29ae3adf52'),
    (2, '8c73500f-2118-4bb7-b470-78ac1878896e'),
    (3, 'c28b19f2-9ec7-4278-ae02-1dbb39d6113d');

CREATE EXTENSION file_fdw;
CREATE SERVER filedata FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE textfile (tupleid int, id_count int) 
    SERVER filedata OPTIONS ( filename '/tmp/test1.txt', format 'csv' );

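-- Inner join: input tuples without a mapping are dropped (M <= N),
-- whereas a LEFT JOIN would return them with a NULL project_id.
SELECT project_id, id_count 
    FROM textfile 
    JOIN mappings ON textfile.tupleid = mappings.id;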

              project_id              | id_count
--------------------------------------+----------
 37f6e23f-ef50-4c6f-a746-cb29ae3adf52 |       10
 8c73500f-2118-4bb7-b470-78ac1878896e |       15
 c28b19f2-9ec7-4278-ae02-1dbb39d6113d |       14
(3 rows)

file_fdw seems a little picky about the file format: I found that a blank line at the end of the file caused the query to fail.
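One way to work around the blank-line problem is to clean the file before PostgreSQL reads it. The following is a minimal Python sketch, not part of the original answer; the function name strip_blank_lines and both paths are hypothetical, and it assumes the stream has already been written to a plain text file.

```python
def strip_blank_lines(src_path, dst_path):
    """Copy src_path to dst_path, dropping empty or whitespace-only lines.

    Returns the number of lines that were kept.
    """
    kept = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in src:
            if line.strip():  # skip lines that contain only whitespace
                dst.write(line.rstrip("\r\n") + "\n")
                kept += 1
    return kept
```

The foreign table's filename option would then point at the cleaned copy instead of the raw file.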

OTHER TIPS

Well, I know this is stupid, but it answers the question: all the required info is retrieved in a single query, and no temporary tables are created:

select project_id, 
    (CASE tuple_id WHEN 1 THEN 10 WHEN 2 THEN 15 WHEN 3 THEN 14 END) as count
from tablename where tuple_id in (1,2,3)

All you need to do is generate a simple CASE expression.
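Generating that query from the incoming tuples could look roughly like the sketch below. It is written in Python only for illustration; the function name build_case_query is hypothetical, and the table and column names are the ones used in the question. Every value is passed through int() so that only integers end up in the SQL text.

```python
def build_case_query(pairs):
    """Build the CASE-based SELECT for a list of (tuple_id, count) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    # int() normalizes every value to a plain integer, so no arbitrary
    # text from the stream is ever spliced into the SQL string.
    clean = [(int(tuple_id), int(count)) for tuple_id, count in pairs]
    whens = " ".join(f"WHEN {t} THEN {c}" for t, c in clean)
    ids = ", ".join(str(t) for t, _ in clean)
    return (
        f"SELECT project_id, (CASE tuple_id {whens} END) AS count "
        f"FROM tablename WHERE tuple_id IN ({ids})"
    )
```

For the sample input, build_case_query([(1, 10), (2, 15), (3, 14)]) returns the same query as shown above, on a single line.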

I believe it may be slow when called with a lot of tuple_ids, but you didn't say anything about limitations or sizes.
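If the CASE expression becomes unwieldy for many tuples, a possible alternative (not part of this answer) is to pass the pairs as an inline VALUES list and join against it, which also avoids a temporary table. The sketch below is in Python; the function name build_values_query and the aliases v, m and id_count are hypothetical, and the %s placeholders assume a driver with psycopg2-style parameters. Whether this is actually faster is not something I have measured.

```python
def build_values_query(pairs):
    """Return (sql, params) joining an inline VALUES list against tablename."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    # One "(%s, %s)" group per input tuple; the actual values are sent
    # separately as parameters rather than spliced into the SQL text.
    rows = ", ".join(["(%s, %s)"] * len(pairs))
    sql = (
        "SELECT m.project_id, v.id_count "
        f"FROM (VALUES {rows}) AS v(tuple_id, id_count) "
        "JOIN tablename AS m ON m.tuple_id = v.tuple_id"
    )
    # Flatten [(id, count), ...] into [id, count, id, count, ...].
    params = [int(value) for pair in pairs for value in pair]
    return sql, params
```

With psycopg2, the result could be run as cursor.execute(sql, params). Because it is an inner join, tuples without a mapping simply do not appear in the output.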

Licensed under: CC-BY-SA with attribution