Question

I need to eager load a hierarchy structure so that I can recursively iterate through it. The eager loading is necessary to prevent multiple db queries while traversing the tree. The consensus seems to be that you can't eager load infinite levels of the tree, so I did something like

    var item = db.ItemHierarchies
        .Include("Children.Children.Children.Children.Children")
        .Where(x => x.condition == condition);

to load 5 levels of children. This seems to get the job done. What is the drawback of doing this? If there is none, could I theoretically add 50 levels of includes here without slowing things down?

Solution 2

Honestly, it's an extremely bad practice.

Let's assume you have 50 objects in your root and 50 children per object at each level.
By the fifth level alone you may end up retrieving 312500000 (50^5) "capsules" of information.
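To see where a number like 312500000 comes from, here is a minimal sketch of the arithmetic. It assumes a perfectly uniform tree (every node has exactly 50 children), which a real hierarchy will not be; `FanOut` and `NodesAtLevel` are names made up for this illustration.

```csharp
using System;

public static class FanOut
{
    // Number of nodes on a given level of a hypothetical uniform tree in which
    // the first level holds `perLevel` items and every item has `perLevel` children.
    public static long NodesAtLevel(long perLevel, int level)
    {
        long count = 1;
        for (int i = 0; i < level; i++)
        {
            // checked: throw instead of silently overflowing on deep trees
            count = checked(count * perLevel);
        }
        return count;
    }

    public static void Main()
    {
        for (int level = 1; level <= 5; level++)
        {
            Console.WriteLine("level " + level + ": " + NodesAtLevel(50, level));
        }
        // The last line printed is "level 5: 312500000".
    }
}
```

Each extra level multiplies the count by 50 again, so the growth is exponential in the depth of the Include path.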

Now, one might ask: "So what is wrong with that?!" I mean, if that is what is required, then why not do that?

Rule #1: we develop software that should be used by human beings.
And the fact is that no human is capable of taking in 312500000 items of information at once and learning or concluding something beneficial from them (except that looking at them all does not help).

Rule #2: UI should be based on what is needed and not what is possible.
And since we already established that showing 312500000 capsules of data is not needed, there is no reason to bring all of it in at once.

And now you might come forward and say: "But I don't care about the UI, really! All I need is to iterate over that data in order to process some information!"

In that case you would probably want to save your results somewhere for future reference, which means that it's a batch job. So why not apply batch job rules to it, such as processing it item by item, which may also give you the benefit of splitting the work between more machines if needed.

So you see, no matter which path you choose there should be no reason to do it.
(= the definition of a bad practice.)

Update:
After reading interesting concerns in the comments, I would like to update this answer with more analysis:

Deciding what is a bad practice must always be done in reference to what is to be achieved and what role each part plays in the system. In the current situation (after reading the comments) it has been implied that the data storage is actually a persistence medium for objects, as opposed to a different concept where the data is the 'heart' of the application.

We can define two data types:

1) Data-Center, which is used in data-centric applications such as banks, CRM, ERP, websites or other service-based solutions.

VS.

2) Data-Persistence medium, which is used to save data for when the application is not active, for example any simple app save file or game save file.

The main difference is that a data-persistence medium is accessed by only a single instance of the app at a single point in time, meaning the data is not designed to be shared by many instances. If the data is to be shared, we are dealing with a data-center application.

If your app just needs a data-persistence medium, loading all the information cannot be considered a bad practice, but you still need to make sure you are not exploding the memory. In that frame of work, SQL Server might not be what you need or the best tool to use.

In the other case, a data-centric application, my original answer stands: it is a bad practice to bring in all the information per instance of the application.

OTHER TIPS

I recommend taking a look at the SQL that is generated as you add eager loading to your query.

var item = db.ItemHierarchies
    .Include("Children")
    .Include("Children.Children")
    .Include("Children.Children.Children")
    .Include("Children.Children.Children.Children")
    .Include("Children.Children.Children.Children.Children");

// Cast to ObjectQuery to get the SQL that would be sent to the database.
var sql = ((System.Data.Objects.ObjectQuery)item).ToTraceString();

// http://visualstudiomagazine.com/blogs/tool-tracker/2011/11/seeing-the-sql.aspx

You'll see that the SQL quickly gets very big and complicated, which can have serious performance implications. You'd do well to limit your eager loading to data that you are certain you will need, and to consider explicit loading for some of the related entities. This applies especially if you're working with connected entities, in which case you can explicitly load collection properties when they're needed.

Also note that you may not need multiple separate Includes. For example, the following need to be separate Includes because they address separate properties (Widgets and Spanners) of the root.

var item = db.ItemHierarchies
    .Include("Widgets")
    .Include("Spanners.Flanges");

But the following isn't necessary:

var item = db.ItemHierarchies
    .Include("Widgets") // This isn't necessary.
    .Include("Widgets.Flanges"); // This loads both Widgets and Flanges.
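
The same rule applies to the five Children includes further up: going by the point above, the longest path already covers the shorter ones. As a rough sketch of that rule applied to plain path strings, the helper below drops every path that is a leading part of a longer one. `IncludePaths.Prune` is a hypothetical name invented here, not part of Entity Framework, and it only compares strings; it does not touch the database or the model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IncludePaths
{
    // Hypothetical helper: keeps only the paths that are not a leading
    // run of segments of some longer path in the same list.
    public static List<string> Prune(IEnumerable<string> paths)
    {
        var distinct = paths.Distinct().ToList();
        return distinct
            // "Widgets" is covered by "Widgets.Flanges"; the "." keeps
            // "Widget" from being treated as covered by "Widgets".
            .Where(p => !distinct.Any(other =>
                other.StartsWith(p + ".", StringComparison.Ordinal)))
            .ToList();
    }

    public static void Main()
    {
        var paths = new[]
        {
            "Children",
            "Children.Children",
            "Children.Children.Children",
            "Children.Children.Children.Children",
            "Children.Children.Children.Children.Children",
            "Widgets",
            "Widgets.Flanges"
        };

        foreach (var path in Prune(paths))
        {
            Console.WriteLine(path);
        }
        // Prints:
        // Children.Children.Children.Children.Children
        // Widgets.Flanges
    }
}
```

Each surviving string would then go into its own Include call, as in the Widgets and Spanners example.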
Licensed under: CC-BY-SA with attribution