Question

I am writing a fairly large service centered around Stanford's Folding@Home project. This portion of the project is a WCF service hosted inside a Windows Service. With proper database indices, a Core 2 Duo, and a 7200 rpm platter drive, I can process approximately 1500 rows per second against a SQL Server 2012 Datacenter instance. Each hour, when I run this update, it takes a considerable amount of time to iterate through all 1.5 million users and add updates where necessary.

Watching a trace in SQL Server Profiler (launched from Management Studio 2012), I see that every user is loaded via an individual query. Is there a way with EF to eagerly load a set of users of a given size, update them in memory, then save the updated users - using something more elegant than single-select, single-update? I am currently using EF5, but if I need to move to EF6 for better performance, I will. The main source of delay in this process is waiting for database results.
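
Something along these lines is what I'm imagining - a rough sketch only (the types mirror my code below, but LoadGroup itself is hypothetical):

private static Dictionary<Tuple<string, long>, UserData> LoadGroup(FoldingDataEntities ents, IList<Update> group)
{
    //One SELECT per group instead of one per user. Contains() on Name
    //translates to an IN clause; the composite (Name, Team) key is then
    //resolved in memory. (Name, Team) is a superkey, so ToDictionary is safe.
    var names = group.Select(u => u.Name).Distinct().ToList();

    return ents.Users
               .Where(u => names.Contains(u.Name))
               .ToList()
               .ToDictionary(u => Tuple.Create(u.Name, u.Team));
}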

Also, if there is anything I should change about the ForAll or pre-processing, feel free to mention it. The group pre-processing is very quick and dramatically increases the speed of the update by controlling each EF context's size - but if I can pre-process more and improve the overall time, I am more than willing to look into it!

private void DoUpdate(IEnumerable<Update> table)
{
    var t = table.ToList();
    var numberOfGroups = Math.Max(1, t.Count / Properties.Settings.Default.UpdatesPerContext); //Control each local context size. 120 updates per context works well on most systems I have.

    //Split work groups out of the table of updates.
    var groups = t.AsParallel()
                    .Select((update, index) => new { Value = update, Index = index })
                    .GroupBy(a => a.Index % numberOfGroups)
                    .ToList();

    groups.AsParallel().ForAll(group =>
    {
        using (var ents = new FoldingDataEntities())
        {
            ents.Configuration.AutoDetectChangesEnabled = false;
            ents.Configuration.LazyLoadingEnabled = true;
            ents.Database.Connection.Open();

            foreach (var a in group)
            {
                var update = a.Value;
                var data = UserData.GetUserData(update.Name, update.Team, ents); //(Name, Team) is a superkey; passing ents allows external context control

                if (data.TotalPoints < update.NewCredit)
                {
                    data.addUpdate(update.NewCredit, update.Sum); //basic arithmetic, very quick - may attach a row to the UserData.Updates collection (does not SaveChanges here)
                }
            }

            ents.ChangeTracker.DetectChanges();
            ents.SaveChanges();
        }
    });
}

//from the UserData class, which wraps the EF code.
public static UserData GetUserData(string name, long team, FoldingDataEntities ents)
{
    return ents.Users.Local.FirstOrDefault(u => u.Team == team && u.Name == name)
        ?? ents.Users.FirstOrDefault(u => u.Team == team && u.Name == name)
        ?? ents.Users.Add(new UserData { Name = name, Team = team, StartDate = DateTime.Now, LastUpdate = DateTime.Now });
}

internal struct Update
{
    public string Name;
    public long NewCredit;
    public long Sum;
    public long Team;
}

Solution

EF is not the tool for raw performance... it's the "easy way" to build a data access layer (a DAL), but that convenience comes with a fair bit of overhead. For a job like this I'd highly recommend Dapper or raw ADO.NET and a set-based bulk update; it would be a lot faster. See the sketch below.

http://www.ormbattle.net/
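
For example, here is a rough sketch of the set-based approach with plain ADO.NET. I'm assuming a staging table dbo.UpdateStaging(Name, Team, NewCredit, Sum); the staging table and the TotalPoints/LastUpdate column names are illustrative, not taken from your schema:

using System.Data;
using System.Data.SqlClient;

public static void BulkApplyUpdates(string connectionString, DataTable updates)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        //1. Stream every pending update to the server in one bulk operation.
        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.UpdateStaging" })
        {
            bulk.WriteToServer(updates);
        }

        //2. Apply them with a single set-based UPDATE instead of millions of round trips.
        const string sql = @"
UPDATE u
SET    u.TotalPoints = s.NewCredit,
       u.LastUpdate  = GETUTCDATE()
FROM   dbo.Users u
JOIN   dbo.UpdateStaging s ON s.Name = u.Name AND s.Team = u.Team
WHERE  u.TotalPoints < s.NewCredit;

TRUNCATE TABLE dbo.UpdateStaging;";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.ExecuteNonQuery();
        }
    }
}

A MERGE statement in place of the UPDATE would also cover the insert-missing-user case that your GetUserData handles.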

Now, to answer your question: EF5 has no built-in batch update, so you'll need a third-party extension library that adds that capability. See: Batch update/delete EF5
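
For instance, EntityFramework.Extended is one such library; with it a batch update runs as a single server-side UPDATE with no entities materialized or tracked. This is a sketch only, reusing the entity and property names from your code (the exact extension API may differ by version):

using System.Linq;
using EntityFramework.Extensions; //NuGet package: EntityFramework.Extended

public static class BatchUpdates
{
    //Raises TotalPoints for every matching user in one UPDATE statement.
    public static int ApplyCredit(FoldingDataEntities ents, string name, long team, long newCredit)
    {
        return ents.Users
                   .Where(u => u.Name == name && u.Team == team && u.TotalPoints < newCredit)
                   .Update(u => new UserData { TotalPoints = newCredit });
    }
}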

Licensed under: CC-BY-SA with attribution