Question

I have the following situation:

Table1 has a list of companies.
Table2 has a list of addresses.
Table3 is an N:N relationship between Table1 and Table2, with fields 'begin' and 'end'.

Because companies may move over time, a LEFT JOIN across these tables returns one row per address a company has had, i.e. multiple records for each company that has moved.
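To make the duplication concrete, here is a minimal reproduction using Python's built-in sqlite3 module instead of PostgreSQL. The table and column names (companies, addresses, relationship, company_id, address_id, "begin", "end") and the sample rows are hypothetical, chosen to match the names used in the answers; '9999-12-31' is an assumed "still current" value, since "end" is never NULL.

```python
import sqlite3

# Hypothetical schema and sample data: Acme moved once, Globex never moved,
# and Initech has no address history at all.
SETUP = """
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE relationship (
    company_id INTEGER NOT NULL,
    address_id INTEGER NOT NULL,
    "begin"    TEXT NOT NULL,  -- ISO-8601 dates sort correctly as text
    "end"      TEXT NOT NULL   -- '9999-12-31' marks the current address
);
INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO addresses VALUES (10, 'Old St 1'), (11, 'New Ave 2'), (20, 'Main Rd 3');
INSERT INTO relationship VALUES
    (1, 10, '2015-01-01', '2018-12-31'),
    (1, 11, '2019-01-01', '9999-12-31'),
    (2, 20, '2017-06-01', '9999-12-31');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Plain LEFT JOINs: one output row per relationship row, so Acme appears twice.
rows = conn.execute("""
    SELECT c.name, a.street
    FROM companies c
    LEFT JOIN relationship r ON r.company_id = c.company_id
    LEFT JOIN addresses a    ON a.address_id = r.address_id
    ORDER BY c.company_id, r."begin"
""").fetchall()

for row in rows:
    print(row)
```

This prints Acme twice (once with Old St 1, once with New Ave 2), while Initech still appears once with a NULL street thanks to the LEFT JOIN.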

The begin and end fields are never NULL. To find the latest address I use ORDER BY begin DESC, and to drop the older addresses I add LIMIT 1.

That works fine when the query returns only one company. But I need a query that returns all Table1 records, joined with their current Table2 addresses. Therefore, as far as I know, the removal of outdated data must be done in the LEFT JOIN's ON clause.
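A small sketch of why ORDER BY "begin" DESC LIMIT 1 is enough for one company but not for the whole list: LIMIT applies to the entire result set, not to each company. It uses Python's sqlite3 with a hypothetical schema and data (companies, addresses, relationship), repeated here so the snippet runs on its own; latest_address is a hypothetical helper.

```python
import sqlite3

# Hypothetical schema and data: Acme moved once, Globex never moved, Initech has no history.
SETUP = """
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE relationship (
    company_id INTEGER NOT NULL,
    address_id INTEGER NOT NULL,
    "begin"    TEXT NOT NULL,
    "end"      TEXT NOT NULL
);
INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO addresses VALUES (10, 'Old St 1'), (11, 'New Ave 2'), (20, 'Main Rd 3');
INSERT INTO relationship VALUES
    (1, 10, '2015-01-01', '2018-12-31'),
    (1, 11, '2019-01-01', '9999-12-31'),
    (2, 20, '2017-06-01', '9999-12-31');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)


def latest_address(company_id):
    """Latest address of ONE company: sort its history newest first, keep one row."""
    return conn.execute("""
        SELECT a.street
        FROM relationship r
        JOIN addresses a ON a.address_id = r.address_id
        WHERE r.company_id = ?
        ORDER BY r."begin" DESC
        LIMIT 1
    """, (company_id,)).fetchone()


print(latest_address(1))  # ('New Ave 2',)

# Applied to the whole multi-company join, LIMIT 1 truncates the entire result
# to a single row instead of keeping one row per company.
all_limited = conn.execute("""
    SELECT c.name, a.street
    FROM companies c
    LEFT JOIN relationship r ON r.company_id = c.company_id
    LEFT JOIN addresses a    ON a.address_id = r.address_id
    ORDER BY r."begin" DESC
    LIMIT 1
""").fetchall()
print(all_limited)  # [('Acme', 'New Ave 2')]
```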

Any idea how I can build the clause so that it doesn't duplicate Table1 companies and brings back the latest address?

Solution 2

I managed to solve it using a window function:

WITH ranked_relationship AS (
    -- number each company's relationship rows, newest dt_start first
    SELECT
        *,
        row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) AS dt_last_addr
    FROM relationship
)

SELECT
    company.*,
    address.*,
    dt_last_addr AS dt_relationship
FROM
    company
    LEFT JOIN ranked_relationship AS relationship
            -- keep only the newest relationship row of each company
            ON relationship.fk_company = company.pk_company AND relationship.dt_last_addr = 1
    LEFT JOIN address ON address.pk_address = relationship.fk_address

row_number() assigns an integer counter to each record within each window, where the windows are defined by PARTITION BY fk_company. Within each window the records are sorted by dt_start DESC, so the one with the latest date comes first and gets number 1. The condition dt_last_addr = 1 then makes sure the JOIN happens only once for each fk_company, with the record holding the latest address.
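The same window-function technique can be tried locally with Python's sqlite3. This sketch assumes a hypothetical schema (company_id and "begin" stand in for fk_company and dt_start, rn for dt_last_addr) and made-up data. SQLite supports ROW_NUMBER() from version 3.25; check sqlite3.sqlite_version if your Python build is old.

```python
import sqlite3

# Hypothetical schema and data: Acme moved once, Globex never moved, Initech has no history.
SETUP = """
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE relationship (
    company_id INTEGER NOT NULL,
    address_id INTEGER NOT NULL,
    "begin"    TEXT NOT NULL,
    "end"      TEXT NOT NULL
);
INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO addresses VALUES (10, 'Old St 1'), (11, 'New Ave 2'), (20, 'Main Rd 3');
INSERT INTO relationship VALUES
    (1, 10, '2015-01-01', '2018-12-31'),
    (1, 11, '2019-01-01', '9999-12-31'),
    (2, 20, '2017-06-01', '9999-12-31');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

RANKED = """
    WITH ranked_relationship AS (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY "begin" DESC) AS rn
        FROM relationship r
    )
"""

# rn = 1 in the ON clause: one row per company, companies without history kept.
rows = conn.execute(RANKED + """
    SELECT c.name, a.street
    FROM companies c
    LEFT JOIN ranked_relationship rr
           ON rr.company_id = c.company_id AND rr.rn = 1
    LEFT JOIN addresses a ON a.address_id = rr.address_id
    ORDER BY c.company_id
""").fetchall()

# rn = 1 in WHERE instead: the NULL rn of Initech fails the test, so it is dropped.
where_rows = conn.execute(RANKED + """
    SELECT c.name, a.street
    FROM companies c
    LEFT JOIN ranked_relationship rr ON rr.company_id = c.company_id
    LEFT JOIN addresses a ON a.address_id = rr.address_id
    WHERE rr.rn = 1
    ORDER BY c.company_id
""").fetchall()

print(rows)
print(where_rows)
```

The second query shows why the filter belongs in the ON clause: moved to WHERE, it silently turns the LEFT JOIN into an inner one for companies that have no address.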

Window functions are very powerful and few people use them; they can replace many complex joins and subqueries!

OTHER TIPS

Use a dependent (correlated) subquery with the max() function in the join condition, as in this example:

SELECT *
FROM companies c
LEFT JOIN relationship r
       ON c.company_id = r.company_id
      AND r."begin" = (
            SELECT max(r1."begin")
            FROM relationship r1
            WHERE c.company_id = r1.company_id
          )
-- LEFT (not INNER) JOIN, so companies without any address are still returned
LEFT JOIN addresses a
       ON a.address_id = r.address_id

demo: http://sqlfiddle.com/#!15/f80c6/2
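A runnable sketch of the max() approach with Python's sqlite3, using a hypothetical schema and data (not the ones in the sqlfiddle). The join to addresses is a LEFT JOIN so that a company without any history (Initech here) is still returned.

```python
import sqlite3

# Hypothetical schema and data: Acme moved once, Globex never moved, Initech has no history.
SETUP = """
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE relationship (
    company_id INTEGER NOT NULL,
    address_id INTEGER NOT NULL,
    "begin"    TEXT NOT NULL,
    "end"      TEXT NOT NULL
);
INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO addresses VALUES (10, 'Old St 1'), (11, 'New Ave 2'), (20, 'Main Rd 3');
INSERT INTO relationship VALUES
    (1, 10, '2015-01-01', '2018-12-31'),
    (1, 11, '2019-01-01', '9999-12-31'),
    (2, 20, '2017-06-01', '9999-12-31');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# The correlated subquery computes each company's latest "begin";
# only the relationship row carrying that date satisfies the join condition.
rows = conn.execute("""
    SELECT c.name, a.street
    FROM companies c
    LEFT JOIN relationship r
           ON r.company_id = c.company_id
          AND r."begin" = (SELECT MAX(r1."begin")
                           FROM relationship r1
                           WHERE r1.company_id = c.company_id)
    LEFT JOIN addresses a ON a.address_id = r.address_id
    ORDER BY c.company_id
""").fetchall()

print(rows)
```

One caveat of matching on max("begin"): if a company has two relationship rows with the same latest "begin", both match and the company is duplicated again, whereas ROW_NUMBER() or LIMIT 1 always picks exactly one row.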

Since PostgreSQL 9.3 there is JOIN LATERAL (https://www.postgresql.org/docs/9.4/queries-table-expressions.html), which lets a subquery in the FROM clause refer to columns of the tables listed before it, so it solves your issue in an elegant way:

SELECT * FROM companies c
-- LEFT JOIN LATERAL ... ON TRUE keeps companies that have no relationship rows
LEFT JOIN LATERAL (
    SELECT * FROM relationship r
    WHERE c.company_id = r.company_id
    ORDER BY r."begin" DESC LIMIT 1
) r ON TRUE
LEFT JOIN addresses a ON a.address_id = r.address_id

The disadvantage of this approach is that indexes on the tables inside the LATERAL subquery cannot be used outside of it.
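SQLite has no LATERAL, so to try the per-company "ORDER BY "begin" DESC LIMIT 1" idea locally, this sketch swaps in a correlated scalar subquery that returns only the latest address_id, which is all the join to addresses needs. It is a stand-in for the LATERAL query, not PostgreSQL's LATERAL syntax itself; the schema and data are hypothetical.

```python
import sqlite3

# Hypothetical schema and data: Acme moved once, Globex never moved, Initech has no history.
SETUP = """
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT NOT NULL);
CREATE TABLE relationship (
    company_id INTEGER NOT NULL,
    address_id INTEGER NOT NULL,
    "begin"    TEXT NOT NULL,
    "end"      TEXT NOT NULL
);
INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO addresses VALUES (10, 'Old St 1'), (11, 'New Ave 2'), (20, 'Main Rd 3');
INSERT INTO relationship VALUES
    (1, 10, '2015-01-01', '2018-12-31'),
    (1, 11, '2019-01-01', '9999-12-31'),
    (2, 20, '2017-06-01', '9999-12-31');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# For each company row, the scalar subquery runs the same top-1 query the
# LATERAL subquery would; a company with no history yields NULL and no match.
rows = conn.execute("""
    SELECT c.name, a.street
    FROM companies c
    LEFT JOIN addresses a
           ON a.address_id = (SELECT r.address_id
                              FROM relationship r
                              WHERE r.company_id = c.company_id
                              ORDER BY r."begin" DESC
                              LIMIT 1)
    ORDER BY c.company_id
""").fetchall()

print(rows)
```

A scalar subquery can return only one column; when you need several columns from the latest relationship row (for example "begin" and "end" as well), PostgreSQL's LATERAL avoids repeating the subquery once per column.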

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow