Question

Let's say we have four entities in data model: Categories, Books, Authors and BookPages. Also assume Categories-Books, Books-Authors and Books-BookPages relationships are one-to-many.

If a category entity instance is retrieved from database - including "Books", "Books.BookPages" and "Books.Authors" - this will become a serious performance issue. Moreover, not including them will result in "Object reference is not set to an instance of an object" exception.

What is the best practice for using multiple Include method calls?

  • Write a single method GetCategoryById and include all items inside (performance issue)
  • Write a single method GetCategoryById and send a list of relationships to include (maybe, but still seems not elegant enough)
  • Write methods like GetCategoryByIdWithBooks, GetCategoryByIdWithBooksAndBooksPages and GetCategoryByIdWithBooksAndAuthors (not practical)

EDIT: By second option I meant something like this:

public static Category GetCategoryById(ModelEntities db, int categoryId, params string[] includeFields)
{
    var categories = db.Categories;

    foreach (string includeField in includeFields)
    {
        categories = categories.Include(includeField);
    }

    return categories.SingleOrDefault(i => i.CategoryId == categoryId);
}

When calling we need a code like this:

Category theCategory1 = CategoryHelper.GetCategoryById(db, 5, "Books");
Category theCategory2 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Pages");
Category theCategory3 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Authors");
Category theCategory4 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Pages", "Books.Authors");

Are there any distinct disadvantages of this approach?

Was it helpful?

Solution

Write a single method GetCategoryById and send a list of relationships to include (maybe, but still seems not elegant enough)

Write methods like GetCategoryByIdWithBooks, GetCategoryByIdWithBooksAndBooksPages and GetCategoryByIdWithBooksAndAuthors (not practical)

A combination of these two is currently my approach. I know what properties I want to include for each context, so I rather hand-code them (as you said yourself, lazy-loading isn't always an option, and if it is, you'll repeat the same repetitive Include()-like syntax when mapping from data models to DTO's).

This separation causes you to think harder about what datasets you want to expose, given data-access-code like this is usually hidden beneath a service.

By utilizing a base class containing a virtual method you can override to run the required Include()s:

using System.Data.Entity;

public class DataAccessBase<T>
{
    // For example redirect this to a DbContext.Set<T>().
    public IQueryable<T> DataSet { get; private set; }

    public IQueryable<T> Include(Func<IQueryable<T>, IQueryable<T>> include = null)
    {
        if (include == null)
        {
            // If omitted, apply the default Include() method 
            // (will call overridden Include() when it exists) 
            include = Include;
        }
        
        return include(DataSet);
    }
    
    public virtual IQueryable<T> Include(IQueryable<T> entities)
    {
        // provide optional entities.Include(f => f.Foo) that must be included for all entities
        return entities;
    }
}

You can then instantiate and use this class as-is, or extend it:

using System.Data.Entity;

public class BookAccess : DataAccessBase<Book>
{
    // Overridden to specify Include()s to be run for each book
    public override IQueryable<Book> Include(IQueryable<Book> entities)
    {
        return base.Include(entities)
                   .Include(e => e.Author);
    }
    
    // A separate Include()-method
    private IQueryable<Book> IncludePages(IQueryable<Book> entities)
    {
        return entities.Include(e => e.Pages);
    }
    
    // Access this method from the outside to retrieve all pages from each book
    public IEnumerable<Book> GetBooksWithPages()
    {
        var books = Include(IncludePages);
    }
}

Now you can instantiate a BookAccess and call methods on it:

var bookAccess = new BookAccess();

var allBooksWithoutNavigationProperties = bookAccess.DataSet;
var allBooksWithAuthors = bookAccess.Include();
var allBooksWithAuthorsAndPages = bookAccess.GetBooksWithPages();

In your case, you might want to create separate IncludePages and GetBooksWithPages-alike method pairs for each view of your collection. Or just write it as one method, the IncludePages method exists for reusability.

You can chain these methods all the way you like, since each of them (as well as Entity Framework's Include() extension method) returns yet another IQueryable<T>.

OTHER TIPS

As @Colin mentioned in the comments, you need to use the virtual keyword when defining navigation properties in order for them to work with lazy loading. Assuming you're using Code-First, your Book class should look something like this:

public class Book
{
  public int BookID { get; set; }
  //Whatever other information about the Book...
  public virtual Category Category { get; set; }
  public virtual List<Author> Authors { get; set; }
  public virtual List<BookPage> BookPages { get; set; }
}

If the virtual keyword is not used, then the proxy class created by EF will not be able to lazy load the related entity/entities.

Of course, if you are creating a new Book, it isn't going to be able to do lazy loading and will just throw the NullReferenceException if you try to iterate over the BookPages. That's why you should do one of two things:

  1. define a Book() constructor that includes BookPages = new List<BookPage>(); (same for Authors) or
  2. make sure that the ONLY time you ever have "new Book()" in your code is when you are creating a new entry that you are immediately saving to the database and then discarding without trying to get anything from it.

I personally prefer the 2nd option, but I know that many others prefer the 1st.

<EDIT> I found a third option, which is to use the Create method of the DbSet<> class. This means you would call myContext.Books.Create() instead of new Book(). See this Q+A for more info: Ramifications of DbSet.Create versus new Entity() </EDIT>

Now, the other way that lazy loading can break is when it is turned off. (I'm assuming that ModelEntities is the name of your DbContext class.) To turn it off, you would set ModelEntities.Configuration.LazyLoadingEnabled = false; Pretty self explanatory, no?

Bottom line is that you shouldn't need to use Include() everywhere. It's really meant to be more of a means of optimization rather than a requirement for your code to function. Using Include() excessively results in very poor performance because you end up getting far more than you really need from the database, because Include() will always bring in all related records. Let's say that you are loading a Category and there are 1000 Books belonging to that Category. You can't filter it to only include fetch the Books written by John Smith when using the Include() function. You can however (with lazy loading enabled) just do the following:

Category cat = ModelEntities.Categorys.Find(1);
var books = cat.Books.Where(b => b.Authors.Any(a => a.Name == "John Smith"));

This will actually result in fewer records being returned from the database and is much simpler to understand.

Hope that helps! ;)

Some of the performance considerations are ADO.Net connector specific. I would keep in mind a database view or stored procedure as a backup if you're not getting the performance needed.

First, note that DbContext (and ObjectContext) objects are not thread-safe.

If you are concerned about clairity over performance, then the first option is the simplest.

On the other hand, if you're worried about performance--and are willing to dispose of the context object after getting the data--then you can query the data with multiple simultaneous tasks (threads) each using their own context object.

If you need a context to track changes to the data, you have the straight forward way of a single query to add all the items to the context, or you can use the Attach method to 'rebuild' the original state, and then change and save.

The latter goes something like:

using(var dbContext = new DbContext())
{
    var categoryToChange = new Categories()
    {
        // set properties to original data
    };
    dbContext.Categories.Attach(categoryToChange);
    // set changed properties
    dbContext.SaveChanges();
}

Unfortunately there's no one best practice to meet all situations.

In db first approach, say you create BookStore.edmx and add the Category and Book entity and it generate context like public partial class BookStoreContext : DbContext then it is a simple good practice if you can add partial class like this:

public partial class BookStoreContext
{
    public IQueryable<Category> GetCategoriesWithBooks()
    {
        return Categories.Include(c => c.Books);
    }

    public IQueryable<Category> GetCategoriesWith(params string[] includeFields)
    {
        var categories = Categories.AsQueryable();
        foreach (string includeField in includeFields)
        {
            categories = categories.Include(includeField);
        }
        return categories;
    }

    // Just another example
    public IQueryable<Category> GetBooksWithAllDetails()
    {
        return Books
            .Include(c => c.Books.Authors)
            .Include(c => c.Books.Pages);
    }

    // yet another complex example
    public IQueryable<Category> GetNewBooks(/*...*/)
    {
        // probably you can pass sort by, tags filter etc in the parameter.
    }
}

Then you can use it like this:

var category1 = db.CategoriesWithBooks()
                      .Where(c => c.Id = 5).SingleOrDefault();
var category2 = db.CategoriesWith("Books.Pages", "Books.Authors")
                      .Where(c => c.Id = 5).SingleOrDefault(); // custom include

Note:

  • You can read some of simple (so many complicated one out there) Repository pattern just to extend IDbSet<Category> Categories to group common Include and Where instead of using static CategoryHelper. So you can have IQueryable<Category> db.Categories.WithBooks()
  • You should not include all child entities in GetCategoryById because it does not self explain in the method name and that will cause performance issue if user of this method is not brother about Books entites.
  • Even though you not include all, if you use lazy loading, you still can have potential N+1 performance issue
  • If you have 1000 of Books better you page your load something like this db.Books.Where(b => b.CategoryId = categoryId).Skip(skip).Take(take).ToList() or even better you add the method above to be like this db.GetBooksByCategoryId(categoryId, skip, take)

I myself prefer explicitly loading entities since I will 'aware' what is currently loaded but lazy loading is only useful if you have conditional loading children entities and should be used within a small scope of db context otherwise I can't control the db hit and how big the result.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top