سؤال

Not sure if this belongs on SO or one of their other sites, let me know if I should move it.

Here's what I'm working with:
Its a very old eCommerce system, which has a table of products. This table of products gets updated by two files every night through Windows Task Scheduler. One file has product information (name, description, color, manufacturer, etc), the other file has information such as the pricing, available stock, warehouse location. The process of parsing these files and updating the products in the database takes multiple hours. I don't have control over these two files.

I need a way to determine if both files updated the product in the database. I can only display products to users that have information from both files. If the product was only updated by one, its considered incomplete and cannot be displayed.

As of right now, I'm thinking to add a trigger to add/update to count+1 and save to a new column each time a record/product is modified by either of the two tasks. Then make another task to check if that column is equal to 2 and do whats required to display or hide. But I hope there might be some other ideas out there?

هل كانت مفيدة؟

المحلول

In order to not have to worry about a complete regression test of the legacy process, I would opt for using a trigger to log updates on the Product table to a second table.

The trigger could determine the process being run by checking the columns updated. Then I would create a simple query to return all products that had been updated by both processes.

/* Test tables */
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Product')
  DROP TABLE Product

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'ProductChangeLog')
  DROP TABLE ProductChangeLog
GO

CREATE TABLE Product (
  ProductID INT NOT NULL PRIMARY KEY, 
  ProductName VARCHAR(50) NOT NULL,
  ProductDescription VARCHAR(100) NOT NULL,
  ProductPrice MONEY NOT NULL,
  ProductAvailQty INT NOT NULL
) 

CREATE TABLE ProductChangeLog (
  ChangeLogID INT NOT NULL IDENTITY(1,1), 
  ProductID INT NOT NULL,
  ChangeTime DATETIME NOT NULL,
  ChangeType VARCHAR(50) NOT NULL,
) 
GO 

CREATE TRIGGER tr_Product ON Product 
FOR UPDATE
AS

  INSERT ProductChangeLog (ProductID, ChangeTime, ChangeType) 
  SELECT 
    ProductID,
    GETDATE(),
    -- Log the process which was run depending on columns modified
    CASE WHEN UPDATE(ProductName) OR UPDATE(ProductDescription)
         THEN 'Process1'
         WHEN UPDATE(ProductPrice) OR UPDATE(ProductAvailQty)
         THEN 'Process2'
         ELSE 'Neither'
    END 
  FROM inserted
GO 

INSERT Product 
SELECT 1, 'Blue V-Neck', 'Sweater', 25.00, 100 UNION ALL 
SELECT 2, 'Green V-Neck', 'Sweater', 30.00, 200 UNION ALL
SELECT 3, 'Black crew', 'T-Shirt', 10.00, 45

/* Change two products in first process */
UPDATE Product 
SET ProductDescription = 'Men''s Sweater' 
WHERE ProductDescription = 'Sweater'

/* Change three products in second process */
UPDATE Product
SET ProductPrice = ProductPrice + 10

/* See all changes */
SELECT * 
FROM ProductChangeLog 

DECLARE @today DATETIME
SET @today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, GETDATE())) 

/* See all products changed today by both processes */
SELECT * 
FROM Product p 
WHERE EXISTS (
  SELECT * 
  FROM ProductChangeLog 
  WHERE ChangeType = 'Process1'
    AND ChangeTime > @today 
    AND ProductID = p.ProductID 
) 
AND EXISTS ( 
  SELECT * 
  FROM ProductChangeLog 
  WHERE ChangeType = 'Process2'
    AND ChangeTime > @today 
    AND ProductID = p.ProductID 
) 

*Tested in SQL Server 2005

نصائح أخرى

Use two columns with timestamps of the last update from each of the files. Only show records with both timestamps within the last X (24?) hours.

How about this, have another column of type bit (boolean). Let's call it "IsValid". For a clean record (one that you know was updated by both) the value would true. Anytime an update happens you flip that bit (apply a NOT operator), which you can easily do through a trigger. So when there are two updates (one from each file), the net value becomes true again. The part of your app that reads that record would look at that "IsValid" flag to determine if that record is viable or not.

Note This only works if each file updates the record zero or one time only in each update. Since if file 1 updates the same record twice and file 2 does nothing, the flag would erroneously report the record as valid.

suppose you have a table that has a primary key called Id_Record.

if all you want to do is display (or not) the LATEST updates to this table: create a separate table with Id_Record, Dt_Timestamp_File1, Dt_Timestamp_File2 and trigger a timestamp for each file update. then define the acceptable difference between the 2 timestamps (or whatever other rule you want).

then you can retrieve your records by:

SELECT Id_Record, Ds_Record_Name, Nr_Record_Price, Nr_Record_In_Stock FROM Stock WHERE Id_Record IN (SELECT Id_Record FROM Stock_Update WHERE Dt_Timestamp_File1 > Dt_Timestamp_File2 - 3600000 AND Dt_Timestamp_File1 < Dt_Timestamp_File2 + 3600000)

supposing you are using UnixTime (always easier to compare time in unixtime [milliseconds from 1/1/70]) and your max time difference is one hour.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top