Question

I have a postgres database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table by code and keep a full history. so for example, a user (userid 15) has the following metadata:

15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'  
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'  
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'  
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'  

I need to fetch a list of all my users and the most recent value of each of various usermetadata codes. I did this programmatically and it was, of course godawful slow. The best I could figure out to do it in SQL was to join sub-selects, which were also slow and I had to do one for each code.

Was it helpful?

Solution

I suppose you're not willing to modify your schema, so I'm afraid my answe might not be of much help, but here goes...

One possible solution would be to have the time field empty until it was replaced by a newer value, when you insert the 'deprecation date' instead. Another way is to expand the table with an 'active' column, but that would introduce some redundancy.

The classic solution would be to have both 'Valid-From' and 'Valid-To' fields where the 'Valid-To' fields are blank until some other entry becomes valid. This can be handled easily by using triggers or similar. Using constraints to make sure there is only one item of each type that is valid will ensure data integrity.

Common to these is that there is a single way of determining the set of current fields. You'd simply select all entries with the active user and a NULL 'Valid-To' or 'deprecation date' or a true 'active'.

You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.

OTHER TIPS

This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).

SELECT DISTINCT ON (code) code, content, createtime
FROM metatable
WHERE userid = 15
ORDER BY code, createtime DESC;

That will limit the returned results to the first result per unique code, and if you sort the results by the create time descending, you'll get the newest of each.

A subselect is the standard way of doing this sort of thing. You just need a Unique Constraint on UserId, Code, and Date - and then you can run the following:

SELECT * 
FROM Table
JOIN (
   SELECT UserId, Code, MAX(Date) as LastDate
   FROM Table
   GROUP BY UserId, Code
) as Latest ON
   Table.UserId = Latest.UserId
   AND Table.Code = Latest.Code
   AND Table.Date = Latest.Date
WHERE
   UserId = @userId
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top