Prevent the DbContext from repeatedly trying to save bad data

https://stackoverflow.com//questions/24001698

20-12-2019
|

Question

I have a process that is importing an Excel Spreadhseet, and parsing the data into my data objects. The source of this data is very questionable, as we're moving our customer from a spreadsheet-based data management into a managed database system with checks for valid data.

During my import process, I do some basic sanity checks of the data to accommodate just how bad the data could be that we're importing, but I have my overall validation being done in the DbContext.

Part of what I'm trying to do is that I want to provide the Row # in the spreadsheet that the data is bad so they can easily determine what they need to fix to get the file to import.

Once I have the data from the spreadsheet (model), and the Opportunity they're working with from the database (opp), here's the pseudocode of my process:

foreach (var model in Spreadsheet.Rows) { // Again, pseudocode
    if(opp != null && ValidateModel(model, opp, row)) {
        // Copy properties to the database object

        // This is in a Repository-layer method, not directly in my import process.
        // Just written here for clarity instead of several nested method calls.
        context.SaveChanges(); 
    }
}

I can provide more of the code here if needed, but the problem comes in my DbContext's ValidateEntity() method (override of DbContext).

Again, there's nothing wrong with the code that I've written, as far as I'm aware, but if an Opportunity that failed this level of validation, then it stays as part of the unsaved objects in the context, meaning that it repeatedly tries to get validated every time the ValidateEntity() is called. This leads to a repeat of the same Validation Error message for every row after the initial problem occurs.

Is there a way to [edit]get the Context to stop trying to validate the object after it fails validation once[edit]? I know I could wait until the end and call context.SaveChanges() once at the end to get around this, but I would like to be able to match this with what row it is in the Database.

For reference, I am using Entity Framework 6.1 with a Code First approach.

EDIT Attempting to clarify further for Marc L. (including an update to the code block above)

Right now, my process will iterate through as many rows as there are in the Spreadsheet. The reason why I'm calling my Repository layer with each object to save, instead of working with an approach that only calls context.SaveChanges() once is to allow myself the ability to determine which row is the one that is causing a validation error.

I'm glad that my DbContext's custom ValidateEntity() methods are catching the validation errors, but the problem resides in the fact that it is not throwing the DbEntityValidationException for the same entity multiple times.

I would like it so that if the object fails validation once, the context no longer tries to save the object, regardless of how many times context.SaveChanges() is called.

Solution

Your question is not a dupe (this is about saving, not loaded entities) but you could follow Jimmy's advice above. That is, once an entity is added to the context it is tracked in the "added" state and the only way to stop it from re-validating is by detaching it. It's an SO-internal link, but I'll reproduce the code snippet:

dbContext.Entry(entity).State = EntityState.Detached;

However, I don't think that's the way you want to go, because you're using exceptions to manage state unnecessarily (exceptions are notoriously expensive).

Working from the information given, I'd use a more set-based solution:

modify your model class so that it contains a RowID that records the original spreadsheet row (there's probably other good reasons to have this, too)
turn off entity-tracking for the context (turns of change detection allowing each Add() to be O(1))
add all the entities
call context.GetValidationErrors() and get all your errors at once, using the aforementioned RowID to identify the invalid rows.

You didn't indicate whether your process should save the good rows or reject the file as a whole, but this will accommodate either--that is, if you need to save the good rows, detach all the invalid rows using the code above and then SaveChanges().

Finally, if you do want to save the good rows and you're uncomfortable with the set-based method, it would be better to use a new DbContext for every single row, or at least create a new DbContext after each error. The ADO.NET team insists that context-creation is "relatively cheap" (sorry I don't have a cite or stats at hand for this) so this shouldn't damage your throughput too much. Even so, it will at least remain O(n). I wouldn't blame you, managing a large context can open you up to other issues as well.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow