Question

To start with: my title sucks, so can you help me figure out a better one?

I can't post all the SQL here (it's over 30k characters in total), so I put it on pastebin.com.

The problem:

I get an XML file that I scrape some records from, and I need to extract some data from those records and build another table from it. The records are for an event going off and coming back on, and I've included sample data in the pastebin so the problem can be recreated. Without seeing the data it's kind of hard to explain. The sample import is all the data I have; it should be sufficient to build the app from, and I won't be getting any more information than what is shown in it.

I'll give you a moment to glance at the data so this makes sense.

So what I need to do is this: For each "off" event, I need to match it to the next "on" event, and I need to have two tables at the end, one table for "historical events" and one table for "current events". However, if I can just get "historical events" built correctly I can figure out how to get "current events" from that.

Business rules:

If two or more "off" events are collected before an "on" event, keep the oldest "off" event.

If two or more "on" events are collected before an "off" event, keep the newest "on" event.

If there is a complete pair, put them in the historical table.

If there is an "off" event and not an "on" event, put it in the current table (so if I wanted to keep inserting/deleting from this table, that's fine too).

If there is an "off" event already in the current table, I can take it and move it to the historical table when a matching "on" event is read in (this will need to be implemented later, but if I can get the pairings matched initially I'll be able to go forward for now).
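The rules above can be sketched procedurally. What follows is a minimal Python sketch, not the poster's C# or SQL. The function name `pair_events`, the tuple layout, and the sample timestamps are made up for illustration; the type strings 'Went Off' and 'Came On' are the ones used in the SQL further down. It assumes an "on" event with no earlier "off" is simply ignored, and it does not cover the last rule (moving a row out of an existing current table).

```python
from datetime import datetime

OFF, ON = "Went Off", "Came On"  # type strings as they appear in the SQL


def pair_events(events):
    """events: iterable of (identifier, type, event_time) tuples.

    Returns (historical, current):
      historical: list of (identifier, off_time, on_time)
      current:    list of (identifier, off_time) with no matching "on" yet
    """
    by_id = {}
    for ident, typ, when in events:
        by_id.setdefault(ident, []).append((when, typ))

    historical, current = [], []
    for ident, rows in by_id.items():
        rows.sort(key=lambda r: r[0])  # stable sort: ties keep input order
        off_time = on_time = None
        for when, typ in rows:
            if typ == OFF:
                if on_time is not None:
                    # A new "off" after an "on" closes the previous pair.
                    historical.append((ident, off_time, on_time))
                    off_time = on_time = None
                if off_time is None:
                    off_time = when  # keep the oldest "off" of a run
            elif typ == ON and off_time is not None:
                on_time = when  # keep the newest "on" of a run
            # Assumption: an "on" with no earlier "off" is ignored.
        if off_time is not None and on_time is not None:
            historical.append((ident, off_time, on_time))
        elif off_time is not None:
            current.append((ident, off_time))
    return historical, current


# Made-up sample: two offs, two ons, then a trailing off.
sample = [
    ("A", OFF, datetime(2011, 1, 1, 1, 0)),
    ("A", OFF, datetime(2011, 1, 1, 2, 0)),
    ("A", ON, datetime(2011, 1, 1, 3, 0)),
    ("A", ON, datetime(2011, 1, 1, 4, 0)),
    ("A", OFF, datetime(2011, 1, 1, 5, 0)),
]
historical, current = pair_events(sample)
# historical holds one pair for "A": off at 01:00, on at 04:00
# current holds the unmatched off at 05:00
```

Events that share a timestamp are left in input order here, so the result for those still depends on how the rows arrive.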

I think that's pretty much it for the logic. My thoughts are to either figure out how to do this in SQL or push it out to an app written in C# and do it with some temporary lists in C# and build what I need using for...next logic. This may be infinitely easier in C#, but I have a feeling SQL can do this job just as easily as C# can, so I needed some help from the dba gurus.

The queries I have so far don't work, but that's as far as I got before going home on Friday. I've been mulling it over since then and building a sample problem that I can post online (and that whole life thing too, ya know). The data is real, live data, except that the IDs are anonymized and the text fields are changed to something simple to work with.

Here's a spreadsheet showing roughly how the data looks now and how I want it to look at the end. It has the data as it is now (with a spacer row between each ID for clarity), the data that would be in the historical table (aligned with the ID of the original data for understanding) and the data that would be in the current table (again aligned). I hope this can help clarify the business rules. https://spreadsheets.google.com/ccc?key=0AuvCdeHuVU5ddHRCNkpuWHBUREpRajlmLU5VX2xsWnc&hl=en&authkey=COq7y50H

So the complete SQL including tabledefs and current (very not-right) queries is on pastebin http://pastebin.com/k2f2CLnQ

Solution 2

So for continued commentary, and as what will likely be the answer:

I've just gone ahead and exported it to C# and am processing it there. It'll be easier to do it procedurally than with sets, and I still need to figure out which comes first, off or on, when they're concurrent. I'm working with their PM to find out, but I have a feeling even they don't know which happens when.
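Until that question is answered, the choice can at least be made explicit rather than left to whatever order the rows arrive in. A minimal Python sketch (standing in for the C#, which isn't shown here), where the function name `order_events` and the `off_first` flag are hypothetical:

```python
OFF, ON = "Went Off", "Came On"  # type strings as they appear in the SQL


def order_events(rows, off_first=True):
    """Sort (event_time, type) rows by time with an explicit tie-break.

    off_first=True puts "off" before "on" when timestamps are equal;
    off_first=False does the opposite. Which one is right is exactly the
    open question, so it is a parameter rather than a hard-coded choice.
    """
    rank = {OFF: 0, ON: 1} if off_first else {ON: 0, OFF: 1}
    return sorted(rows, key=lambda r: (r[0], rank[r[1]]))
```

Running the pairing once with each setting and comparing the results would show how many pairs actually depend on the answer.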

Anyways, so that all the discussion is kept in one place, see this transcript too: (if you're really interested) http://chat.stackexchange.com/rooms/179/conversation/date-alignment-and-pair-matching-extraction-best-done-with-tsql-or-c so there's that.

OTHER TIPS

Here's something I was fiddling with based on some work I had lying around. It doesn't handle the events clustered at a single time well. It could theoretically be helpful anyhow...:)

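-- Number each Identifier's events in time order.
;WITH ordered_rows AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY Identifier ORDER BY EventTime) AS Row,
        Identifier, Type, EventTime, DiscoveredDate, FileId
    FROM #EventDataTemp
)
-- Walk the rows in order (recursive CTE). OffEventRow carries the row number
-- of the 'Went Off' event that opened the current off/on run; it stays NULL
-- until the first 'Went Off' is seen.
-- Note: recursion stops at 100 levels by default, so more than about 100
-- events per Identifier would need OPTION (MAXRECURSION n) on the statement.
,filtered_rows AS
(
    SELECT Row, Identifier, Type, EventTime, DiscoveredDate, FileId,
        CAST(CASE Type WHEN 'Went Off' THEN 1 ELSE NULL END AS INT)
            AS OffEventRow
    FROM ordered_rows
    WHERE Row = 1
    UNION ALL
    SELECT o.Row, o.Identifier, o.Type, o.EventTime, o.DiscoveredDate, o.FileId,
        -- A 'Went Off' that follows a 'Came On' starts a new run (ELSE branch);
        -- every other row inherits the previous row's OffEventRow.
        CAST(CASE WHEN (o.Type = 'Went Off' AND f.Type = 'Went Off')
            OR o.Type = 'Came On' THEN f.OffEventRow ELSE o.Row END AS INT)
    FROM ordered_rows o INNER JOIN filtered_rows f
        ON o.Row = f.Row + 1 AND o.Identifier = f.Identifier
)
-- For each run, pick the newest 'Came On' row.
,on_events AS
(
   SELECT Identifier, OffEventRow, MAX(Row) AS OnRow
   FROM filtered_rows
   WHERE Type = 'Came On' AND OffEventRow IS NOT NULL
   GROUP BY OffEventRow, Identifier
)
-- Return the oldest 'Went Off' and the newest 'Came On' of each run,
-- as separate rows rather than as one row per pair.
SELECT f.Identifier, f.Type, f.EventTime, f.DiscoveredDate, f.FileId
FROM filtered_rows f LEFT JOIN on_events o
    ON f.Identifier = o.Identifier
    AND f.Row = o.OnRow
WHERE (f.Type = 'Went Off' AND f.Row = f.OffEventRow)
    OR (f.Type = 'Came On' AND o.OnRow IS NOT NULL)
ORDER BY f.Identifier, f.EventTime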
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange