Question

I have found a discrepancy (Remaining Qty <> (Purchase - Sales)) in one of the items in my database. There are more than 5,000 items, and more than 100,000 transactions have happened up to now. The same procedure runs every time for this calculation.

But only a single item has an error, and it has happened only one time. It does not seem to come from the application side, since the procedure has run more than 100,000 times. The database server is not physically protected and does not connect to the Internet. Only I can log in to the server. No exceptions have happened in the procedure, since I log the errors. I do not understand how the above error happened.

I have seen that it's possible to open unencrypted SQL Server backup files using Notepad and read its data. Is the following scenario possible or not?

  1. Boot the server using a live CD and copy the database .mdf file
  2. Edit and change the data of it (Example: some numeric values)
  3. Replace the original .mdf file on the server with a hacked database

If the above scenario is not possible, is there a chance this happened due to an error on the SQL Server side or a communication error (not from the application code side)?


Solution

Is it possible to edit the data file while the SQL Server instance is stopped? Definitely: the page layout and formats are well enough documented. It sounds unlikely, though. If someone has the access needed to do this deliberately, they likely have access to much easier ways to affect the data.

Is it possible to affect a backup this way? Yes. A plain backup is essentially a copy of the used pages in the database files. But it is similarly unlikely.

Far more likely is that there has been a bit-flip on disk, in memory, or during transfer between them or over the network. These are one-in-a-billion occurrences (caveat: figure picked from thin air, though I'm sure you'll find research on such reliability if you want a real number), but given how many times we read and write bits in our machines, we are all going to be subject to them occasionally. That is why things like ECC RAM exist.

Assuming your database isn't ancient enough (created in an earlier version of SQL Server than 2005) to not have page checksums turned on by default and you haven't explicitly turned the feature off, it keeps a checksum value on each page that can be used to detect many forms of corruption caused this way. Have you run a full DBCC CHECKDB (see https://docs.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-checkdb-transact-sql) to verify you have no corruption that can be detected that way?
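A minimal sketch of that check, assuming the database is called YourDatabase (a placeholder name). Treat it as a starting point rather than a full diagnosis: a clean result does not rule out a bit-flip that happened in memory before the page checksum was calculated.

```sql
-- Confirm the page verification setting (CHECKSUM is the setting described above)
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = N'YourDatabase';  -- placeholder database name

-- Full consistency check; report all errors and suppress informational messages
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Pages that previously failed a checksum or other I/O check are recorded here
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;
```

If CHECKDB finishes without errors and suspect_pages is empty, detectable on-disk corruption becomes a less likely explanation.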

As Dan discusses in his answer, a race condition could be the cause of the problem if transactions are not properly used (not used at all, or used with insufficient isolation level settings).

OTHER TIPS

In my experience, the root cause of these symptoms is most commonly a race condition in the application and/or T-SQL code rather than data corruption due to hardware, network, or malicious activity. Race conditions are insidious because the undesired outcome occurs by happenstance, is difficult to reproduce, and may occur only rarely.

One must use transactions, along with the appropriate isolation level (or locking hints), to ensure aggregated values like Remaining Qty in the items table reflect the related Purchase and Sales transactions.
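As a rough illustration of the locking-hint approach, the sketch below keeps a read-then-write pattern but adds an UPDLOCK hint to the read. It assumes the sample dbo.Item table (ItemNumber, RemainingQTY) from the test setup in this answer; the variable values are made up for the example.

```sql
-- Sketch only: dbo.Item is the sample table from the test setup in this answer
DECLARE @ItemNumber int = 1000, @QTY int = 1, @RemainingQTY int;

BEGIN TRAN;

    -- UPDLOCK takes an update lock on the row and holds it until the transaction ends,
    -- so a second session running the same statement waits here instead of reading a stale value
    SELECT @RemainingQTY = RemainingQTY
    FROM dbo.Item WITH (UPDLOCK)
    WHERE ItemNumber = @ItemNumber;

    UPDATE dbo.Item
    SET RemainingQTY = @RemainingQTY - @QTY
    WHERE ItemNumber = @ItemNumber;

COMMIT;
```

The hint only helps if the read and the write are inside the same transaction; without BEGIN TRAN the lock is released as soon as the SELECT completes.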

EDIT

Testing concurrency problems is a challenge because multiple sessions are required and often many iterations too. Below is an example technique using multiple SSMS query windows I use to reproduce production issues as well as proactively identify suspect code before release. Be aware that race conditions are a matter of timing so even due diligence with testing may not catch rare occurrences.

--example setup script with race condition vulnerability
USE YourDatabase;
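--IF EXISTS (SQL Server 2016 and later) lets the script run on a clean database too
DROP TABLE IF EXISTS dbo.Transactions;
DROP TABLE IF EXISTS dbo.Item;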
CREATE TABLE dbo.Item(
      ItemNumber int NOT NULL
          CONSTRAINT PK_Item PRIMARY KEY 
    , RemainingQTY int NOT NULL
    );
CREATE TABLE dbo.Transactions(
      TransactionID int NOT NULL IDENTITY
        CONSTRAINT PK_Transactions PRIMARY KEY 
    , ItemNumber int NOT NULL
          CONSTRAINT FK_Transactions_Item FOREIGN KEY REFERENCES dbo.Item(ItemNumber)
          INDEX idx_Transactions_ItemNumber
    , TransactionType varchar(10) NOT NULL 
        CONSTRAINT CK_Transactions_TransactionType CHECK (TransactionType IN('Purchase', 'Sale'))
    , QTY int
);
--load 5000 items
WITH 
     t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
    ,t10k AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num  FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c CROSS JOIN t10 AS d)
INSERT INTO dbo.Item(ItemNumber, RemainingQTY)
SELECT num, 1000
FROM t10k
WHERE num <= 5000;
--load 5000 purchase transactions (one per item)
INSERT INTO dbo.Transactions(ItemNumber, TransactionType, QTY)
SELECT ItemNumber, 'Purchase', 1000
FROM dbo.Item;
GO

CREATE OR ALTER PROC dbo.usp_InsertTransaction
      @ItemNumber int
    , @TransactionType varchar(10)
    , @QTY int
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;
DECLARE @RemainingQTY int;
BEGIN TRY

    BEGIN TRAN;

    SELECT @RemainingQTY = RemainingQTY
    FROM dbo.Item
    WHERE ItemNumber = @ItemNumber;

    IF @TransactionType = 'Purchase' 
        SET @RemainingQTY = @RemainingQTY + @QTY
    ELSE
        SET @RemainingQTY = @RemainingQTY - @QTY;

    UPDATE dbo.Item
    SET RemainingQTY = @RemainingQTY
    WHERE ItemNumber = @ItemNumber;

    INSERT INTO dbo.Transactions VALUES
        (@ItemNumber, @TransactionType, @QTY);

    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;
END CATCH
GO

Open 3 SSMS query windows and run these scripts as instructed:

--Step 1: Run script in SSMS query window 1 to acquire exclusive lock to sync start of test executions
DECLARE @return_code int;
EXEC @return_code = sp_getapplock
     @Resource = 'concurrency_test'
    ,@LockMode = 'exclusive'
    ,@LockOwner = 'session';
RAISERROR('sp_getapplock return code is %d', 0, 0, @return_code) WITH NOWAIT;
GO

--Step 2: Run this script in SSMS query windows 2 and 3 to acquire a shared lock on same resource as session 1.
--These will block until session 1 lock is released to start test.
DECLARE @return_code int;
EXEC @return_code = sp_getapplock
     @Resource = 'concurrency_test'
    ,@LockMode = 'shared'
    ,@LockOwner = 'session';
RAISERROR('sp_getapplock return code is %d', 0, 0, @return_code) WITH NOWAIT;
GO
--execute queries to insert transactions and update RemainingQTY 100 times
DECLARE @IterationCount int = 0;
WHILE @IterationCount < 100
BEGIN
    EXEC dbo.usp_InsertTransaction
          @ItemNumber = 1000
        , @TransactionType = 'Sale'
        , @QTY = 1;
    SET @IterationCount +=  1;
END;
GO
--release lock after test completes
DECLARE @return_code int;
EXEC @return_code = sp_releaseapplock
     @Resource = 'concurrency_test'
    ,@LockOwner = 'session';
RAISERROR('sp_releaseapplock return code is %d', 0, 0, @return_code) WITH NOWAIT;
GO

--Step 3: Run script in SSMS query window 1 to release lock to start test
DECLARE @return_code int;
EXEC @return_code = sp_releaseapplock
     @Resource = 'concurrency_test'
    ,@LockOwner = 'session';
RAISERROR('sp_releaseapplock return code is %d', 0, 0, @return_code) WITH NOWAIT;
GO

--Step 4: Run this script to validate RemainingQTY after test completes
WITH transaction_summary AS (
    SELECT
          t.ItemNumber
        , SUM(CASE WHEN t.TransactionType = 'Purchase' THEN t.QTY ELSE 0 END) AS Purchase
        , SUM(CASE WHEN t.TransactionType = 'Sale' THEN t.QTY ELSE 0 END) AS Sale
    FROM dbo.Transactions AS t
    WHERE t.ItemNumber = 1000
    GROUP BY t.ItemNumber
)
SELECT 
      i.ItemNumber
    , i.RemainingQTY AS ItemRemainingQTY
    , t.Purchase
    , t.Sale
    , t.Purchase - t.Sale AS ActualTransactionsRemainingQTY
FROM dbo.Item AS i
JOIN transaction_summary AS t
    ON t.ItemNumber = i.ItemNumber
WHERE i.RemainingQTY <> t.Purchase - t.Sale;
GO

Below is the output of the step 4 validation query after a test on my machine, showing that the item's RemainingQTY is invalid. The update was lost 31 times out of 200 total proc calls. The cause is that two sessions executed the SELECT query at about the same time and read the same RemainingQTY value. One session reduced RemainingQTY as desired, but the value was subsequently overwritten by the other session based on stale data. This test ran in a tight loop, so the problem is more likely to occur than in a typical production workload.

ItemNumber ItemRemainingQTY Purchase Sale ActualTransactionsRemainingQTY
1000 831 1000 200 800

One way to fix this code bug is to refactor the proc to avoid the local variable. This will serialize updates to the item row and perform slightly better too by eliminating the SELECT query:

CREATE OR ALTER PROC dbo.usp_InsertTransaction
      @ItemNumber int
    , @TransactionType varchar(10)
    , @QTY int
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRY

    BEGIN TRAN;

    --single atomic UPDATE: the new value is computed from the current row value and @QTY
    UPDATE dbo.Item
    SET RemainingQTY = RemainingQTY +
        CASE @TransactionType WHEN 'Purchase' THEN @QTY ELSE @QTY * -1 END
    WHERE ItemNumber = @ItemNumber;

    INSERT INTO dbo.Transactions VALUES
        (@ItemNumber, @TransactionType, @QTY);

    COMMIT;

END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;
END CATCH
GO
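The step 4 query checks only item 1000. To look for the same kind of discrepancy across every item, a query along these lines might help. It is a sketch against the sample dbo.Item and dbo.Transactions tables above; the NetQTY, TransactionsRemainingQTY, and QtyDifference column names are made up for the example, and a real schema will differ.

```sql
-- Compare each item's stored RemainingQTY with the net quantity of its transactions
WITH transaction_summary AS (
    SELECT
          t.ItemNumber
          -- purchases add to stock, sales subtract from it
        , SUM(CASE WHEN t.TransactionType = 'Purchase' THEN t.QTY ELSE -t.QTY END) AS NetQTY
    FROM dbo.Transactions AS t
    GROUP BY t.ItemNumber
)
SELECT
      i.ItemNumber
    , i.RemainingQTY AS ItemRemainingQTY
    , COALESCE(s.NetQTY, 0) AS TransactionsRemainingQTY
    , i.RemainingQTY - COALESCE(s.NetQTY, 0) AS QtyDifference
FROM dbo.Item AS i
-- LEFT JOIN keeps items that have no transactions at all
LEFT JOIN transaction_summary AS s
    ON s.ItemNumber = i.ItemNumber
WHERE i.RemainingQTY <> COALESCE(s.NetQTY, 0);
```

An empty result means every item agrees with its transaction history. Any rows returned show how far off each item is, which can hint at whether a single update was lost.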

Is there a possibility some Mercury-level genius exists who could do that? Maybe. Is there a possibility I could win the jackpot 10 times in a row? Yes, and it's more likely than your scenario.

Could it be done by other software? Yes.

Is it possible there was a communication or server error? Unlikely.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange