Domanda

This is related to a question I asked previously for which lag/lead was suggested. However the data I'm working with are more complex than I first thought so I need a more robust solution. This screen shot shows an issue I need to tackle:

enter image description here

Within a single serial number, a shipment event defines a new reference window. So records 2,3,4 relate to 1. Record 6 relates to 5 and so forth. I need to mark the records for which the BillToId doesn't match the parent shipment.

I'm trying to understand if I could even use the LAG function to compare records 2,3,4 back to 1 when the number of post-shipment events varies (duplicates are allowed). I was thinking I might be better off with another fact table that identifies the parent rowid along each record first?

So then my question becomes how do I efficiently identify which shipment each row belongs to? Am I forced to run a subquery for each record? I'm working right now with over 2 million total rows. I would later make this query part of the ETL process so it would be processing smaller chunks of data.

È stato utile?

Soluzione

Here is an approach that uses the cumulative sum functionality in SQL Server. The idea is to assign each "ship" activity a value of "1" and "0" for everything else. Then do a cumulative sum to identify each group that should have the same billtoid. After that, the ship information can be assigned to all records in the same group:

select rowid, dateid, billtoid, activitytypeid, serialnumber
from (select t.*,
             max(case when activitytypeid = 'Ship' then billtoid end) over
                  (partition by serialnumber, cumships) as ship_billtoid
      from (select t.*,
                   sum(case when activitytypeid = 'Ship' then 1 else 0 end) over
                       (partition by serialnumber order by rowid) as cumships
            from t
           ) t
     ) t
where billtoid <> ship_billtoid;
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top