Question

I have a parent table with entries for documents and I have a history table which logs an audit entry every time a user accesses one of the documents.

I'm writing a search query to return a list of documents (filtered by various criteria) with the latest user id to access each document returned in the result set.

Thus for


    DOCUMENTS
    ID | NAME
    1  | Document 1
    2  | Document 2
    3  | Document 3
    4  | Document 4
    5  | Document 5

    HISTORY
    DOC_ID | USER_ID | TIMESTAMP
    1      | 12345   | TODAY
    1      | 11111   | IN THE PAST
    1      | 11111   | IN THE PAST
    1      | 12345   | IN THE PAST
    2      | 11111   | TODAY
    2      | 12345   | IN THE PAST
    3      | 12345   | IN THE PAST

I'd be looking to get a return from my search like


    ID | NAME       | LAST_USER_ID
    1  | Document 1 | 12345
    2  | Document 2 | 11111
    3  | Document 3 | 12345
    4  | Document 4 | 
    5  | Document 5 | 

Can I easily do this with one SQL query and a join between the two tables?

Was it helpful?

Solution

Revising what Andy White produced, and replacing square brackets (MS SQL Server notation) with DB2 (and ISO standard SQL) "delimited identifiers":

SELECT d.id, d.name, h.last_user_id
    FROM Documents d LEFT JOIN
         (SELECT r.doc_id AS id, user_id AS last_user_id
              FROM History r JOIN
                   (SELECT doc_id, MAX("timestamp") AS "timestamp"
                        FROM History
                        GROUP BY doc_id
                   ) AS l
                   ON  r."timestamp" = l."timestamp"
                   AND r.doc_id      = l.doc_id
         ) AS h
         ON d.id = h.id

I'm not absolutely sure whether "timestamp" or "TIMESTAMP" is correct - probably the latter.

The advantage of this is that it replaces the inner correlated sub-query in Andy's version with a simpler non-correlated sub-query, which has the potential to be (radically?) more efficient.

OTHER TIPS

I couldn't get the "HAVING MAX(TIMESTAMP)" to run in SQL Server - I guess having requires a boolean expression like "having max(TIMESTAMP) > 2009-03-05" or something, which doesn't apply in this case. (I might be doing something wrong...)

Here is something that seems to work - note the join has 2 conditions (not sure if this is good or not):

select
    d.ID,
    d.NAME,
    h."USER_ID" as "LAST_USER_ID"
from Documents d
left join History h
    on d.ID = h.DOC_ID
    and h."TIMESTAMP" =
    (
        select max("TIMESTAMP")
        from "HISTORY"
        where "DOC_ID" = d.ID
    )

This doesn't use a join, but for some queries like this I like to inline the select for the field. If you want to catch the situation when no user has accessed you can wrap it with an NVL().

select a.ID, a.NAME,
(select x.user_id
 from HISTORY x
 where x.doc_id = a.id
   and x.timestamp = (select max(x1.timestamp)
                      from HISTORY x1
                      where x1.doc_id = x.doc_id)) as LAST_USER_ID
from DOCUMENTS a
where <your criteria here>

I think it should be something like this:

SELECT ID, Name,  b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a LEFT JOIN
    ( SELECT DOC_ID, USER_ID 
          FROM HISTORY
              GROUP BY DOC_ID, USER_ID
              HAVING MAX( TIMESTAMP )) as b
    ON a.ID = b.DOC_ID

this might work also:

SELECT ID, Name,  b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a 
  LEFT JOIN HISTORY b ON a.ID = b.DOC_ID
GROUP BY DOC_ID, USER_ID
HAVING MAX( TIMESTAMP )
Select ID, Name, User_ID
From Documents Left Outer Join
History a on ID = DOC_ID
Where ( TimeStamp = ( Select Max(TimeStamp)
                      From History b
                      Where a.DOC_ID = b.DOC_ID ) OR
        TimeStamp Is NULL )  /* this accomodates the Left */
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top