SQL standard select current records from an audit log question

https://stackoverflow.com/questions/1146004

16-09-2019
|

Question

My memory is failing me. I have a simple audit log table based on a trigger:

ID            int (identity, PK)
CustomerID    int
Name          varchar(255)
Address       varchar(255)
AuditDateTime datetime
AuditCode     char(1)

It has data like this:

ID CustomerID Name      Address             AuditDateTime          AuditCode 1  123        Bob       123 Internet Way    2009-07-17 13:18:06.353I          2  123        Bob       123 Internet Way    2009-07-17 13:19:02.117D          3  123        Jerry     123 Internet Way    2009-07-17 13:36:03.517I          4  123        Bob       123 My Edited Way   2009-07-17 13:36:08.050U          5  100        Arnold    100 SkyNet Way      2009-07-17 13:36:18.607I          6  100        Nicky     100 Star Way        2009-07-17 13:36:25.920U          7  110        Blondie   110 Another Way     2009-07-17 13:36:42.313I          8  113        Sally     113 Yet another Way 2009-07-17 13:36:57.627I

What would be the efficient select statement be to get all most current records between a start and end time? FYI: I for insert, D for delete, and U for update.

Am I missing anything in the audit table? My next step is to create an audit table that only records changes, yet you can extract the most recent records for the given time frame. For the life of me I cannot find it on any search engine easily. Links would work too. Thanks for the help.

Solution

Another (better?) method to keep audit history is to use a 'startDate' and 'endDate' column rather than an auditDateTime and AuditCode column. This is often the approach in tracking Type 2 changes (new versions of a row) in data warehouses.

This lets you more directly select the current rows (WHERE endDate is NULL), and you will not need to treat updates differently than inserts or deletes. You simply have three cases:

Insert: copy the full row along with a start date and NULL end date
Delete: set the End Date of the existing current row (endDate is NULL)
Update: do a Delete then Insert

Your select would simply be:

select * from AuditTable where endDate is NULL

Anyway, here's my query for your existing schema:

declare @from datetime
declare @to datetime

select b.* from (
  select
    customerId
    max(auditdatetime) 'auditDateTime'
  from
    AuditTable
  where
    auditcode in ('I', 'U')
    and auditdatetime between @from and @to
  group by customerId
  having 
    /* rely on "current" being defined as INSERTS > DELETES */
    sum(case when auditcode = 'I' then 1 else 0 end) > 
    sum(case when auditcode = 'D' then 1 else 0 end)
) a
cross apply(
  select top 1 customerId, name, address, auditdateTime
  from AuditTable
  where auditdatetime = a.auditdatetime and customerId = a.customerId
) b

References

A cribsheet for data warehouses, but has a good section on type 2 changes (what you want to track)

MSDN page on data warehousing

OTHER TIPS

Ok, a couple of things for audit log tables.

For most applications, we want audit tables to be extremely quick on insertion.

If the audit log is truly for diagnostic or for very irregular audit reasons, then the quickest insertion criteria is to make the table physically ordered upon insertion time.

And this means to put the audit time as the first column of the clustered index, e.g.

create unique clustered index idx_mytable on mytable(AuditDateTime, ID)

This will allow for extremely efficient select queries upon AuditDateTime O(log n), and O(1) insertions.

If you wish to look up your audit table on a per CustomerID basis, then you will need to compromise.

You may add a nonclustered index upon (CustomerID, AuditDateTime), which will allow for O(log n) lookup of per-customer audit history, however the cost will be the maintenance of that nonclustered index upon insertion - that maintenance will be O(log n) conversely.

However that insertion time penalty may be preferable to the table scan (that is, O(n) time complexity cost) that you will need to pay if you don't have an index on CustomerID and this is a regular query that is performed. An O(n) lookup which locks the table for the writing process for an irregular query may block up writers, so it is sometimes in writers' interests to be slightly slower if it guarantees that readers aren't going to be blocking their commits, because readers need to table scan because of a lack of a good index to support them....

Addition: if you are looking to restrict to a given timeframe, the most important thing first of all is the index upon AuditDateTime. And make it clustered as you are inserting in AuditDateTime order. This is the biggest thing you can do to make your query efficient from the start.

Next, if you are looking for the most recent update for all CustomerID's within a given timespan, well thereafter a full scan of the data, restricted by insertion date, is required.

You will need to do a subquery upon your audit table, between the range,

select CustomerID, max(AuditDateTime) MaxAuditDateTime 
from AuditTrail 
where AuditDateTime >= @begin and Audit DateTime <= @end

and then incorporate that into your select query proper, eg.

select AuditTrail.* from AuditTrail
inner join 
    (select CustomerID, max(AuditDateTime) MaxAuditDateTime 
     from AuditTrail 
     where AuditDateTime >= @begin and Audit DateTime <= @end
    ) filtration
    on filtration.CustomerID = AuditTrail.CustomerID and 
       filtration.AuditDateTime = AuditTrail.AuditDateTime

Another approach is using a sub select

select a.ID
       , a.CustomerID 
       , a.Name
       , a.Address
       , a.AuditDateTime
       , a.AuditCode
from   myauditlogtable a,
       (select s.id as maxid,max(s.AuditDateTime) 
                 from myauditlogtable as s 
                 group by maxid) 
        as subq
where subq.maxid=a.id;

start and end time? e.g as in between 1am to 3am
or start and end date time? e.g as in 2009-07-17 13:36 to 2009-07-18 13:36

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow