Identifying which conditions in a "Where" clause were satisfied

https://stackoverflow.com/questions/23385265

12-07-2023

Question

Ready for a nice brain workout?

Background: I've written a generic "Query Builder" framework using C# Entity Framework 5, which allows a user to query a "root" table based on a compound query, using a control looking something like this:

[Image: screenshot of the query builder control]

In the image above, the root table would be "City", and the SQL query would automatically include all necessary related tables, compiling down to something like:

select ci.* from City ci
join Country cr on ci.CountryID = cr.ID
where cr.ContinentID = 2 -- 2=Europe
or (cr.Name like '%z%' 
    and ci.Population > 10000000)

The code architecture reflects the table structure I created to model this (I've obfuscated some fields that are not relevant to our discussion):

[Image: diagram of the QueryClause table structure]

In EF, this is expressed in inheritance terms as follows:

Everything is descended from QueryClause, which is marked abstract in the model. There are two kinds of QueryClause:
1. SimpleQueryClause, which has a MetricType that you would set to the value of an enum appropriate to the implementation of your QueryBuilder (e.g. Continent, CountryName, CityPopulation in this case), Comparator (=, >=, contains etc.) and Value (a serialized value representing the value on the right hand side of the comparator). Don't worry about MetricExternalID for now; it's for a more complex type of clause.
2. CompoundQueryClause is just a parent QueryClause, and the AnyConditionSufficient field is just a bit indicating whether its direct children are aggregated with an "And" or an "Or". Note that the ParentQueryClauseID on QueryClause is a FK to CompoundQueryClause, because a SimpleQueryClause cannot be a parent.
QueryRoot is a special case of CompoundQueryClause, being, as its name implies, the root of a query.

So anyway, this structure works swimmingly well. I have code that translates all this into Expression<> trees that can output SQL for a set of predefined filters: each filter condition is represented by an Expression<Func<T, bool>> (T is City in our example), and EF translates it all into SQL. Maybe not the most optimal SQL, I'll grant you, and I have run into limitations when chaining too many nested Expressions, but for our purposes it works like a dream.
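To make the structure concrete, here is a minimal sketch of the clause hierarchy and its translation into a predicate. It is written in Python rather than C# and is not the framework's actual code: the class names mirror the model described above, while COLUMN_SQL, the comparator strings and the to_sql helper are hypothetical stand-ins for the Expression-tree translation.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical mapping from MetricType to a SQL column. The real framework
# would map each metric to an Expression<Func<City, bool>> instead.
COLUMN_SQL = {
    "Continent": "cr.ContinentID",
    "CountryName": "cr.Name",
    "CityPopulation": "ci.Population",
}

@dataclass
class SimpleQueryClause:
    metric_type: str   # e.g. "Continent"
    comparator: str    # "=", ">" or "contains" in this sketch
    value: object      # right-hand side of the comparison

@dataclass
class CompoundQueryClause:
    any_condition_sufficient: bool  # True = "Or", False = "And"
    children: List["QueryClause"] = field(default_factory=list)

QueryClause = Union[SimpleQueryClause, CompoundQueryClause]

def to_sql(clause):
    """Render a clause tree as a SQL predicate string.
    Illustration only: values are inlined, real code should use parameters."""
    if isinstance(clause, SimpleQueryClause):
        column = COLUMN_SQL[clause.metric_type]
        if clause.comparator == "contains":
            return f"{column} like '%{clause.value}%'"
        return f"{column} {clause.comparator} {clause.value}"
    joiner = " or " if clause.any_condition_sufficient else " and "
    return "(" + joiner.join(to_sql(child) for child in clause.children) + ")"

# The example query: Continent = Europe OR (name contains 'z' AND population > 10M)
query_root = CompoundQueryClause(True, [
    SimpleQueryClause("Continent", "=", 2),
    CompoundQueryClause(False, [
        SimpleQueryClause("CountryName", "contains", "z"),
        SimpleQueryClause("CityPopulation", ">", 10000000),
    ]),
])

where_clause = to_sql(query_root)
print(where_clause)
```

For the example query this produces the same predicate as the where clause shown earlier, wrapped in one extra pair of parentheses.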

Problem: The users enjoy this framework so much, and it works so nicely that they figure it must have been easy to develop, so they just want one little extra feature. They want to know, in the records returned from the query, which SimpleQueryClauses caused each row to be returned.

So, for example, let's say the query in our example returned (among others) the following records:

Geneva, Switzerland (population 200,000)
London, England (population 15,000,000)
Rio de Janeiro, Brazil (population 11,800,000)

We would want to show the user the following satisfied clauses:

Geneva: "Continent = Europe". We do not show "Country Name contains 'z'", because we're only interested in such cases if the population is greater than 10 million, which it is not.
London: "Continent = Europe". The converse of Geneva: the population is greater than 10,000,000, but we don't show that because the country name doesn't contain a 'z'.
Rio de Janeiro: "Country Name contains 'z'", "Population > 10,000,000". Since both of those "and" conditions are satisfied, we return them both.
I can't think offhand of any European cities that satisfy all 3 conditions, but if there were any, we would display all three clauses. In other words, we don't short-circuit "or" conditions, but we can short-circuit "and" conditions.
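The reporting rule above can be sketched as a small recursive function. This is a Python illustration under stated assumptions, not the framework's code: the nested-tuple tree, the explain function and the row dictionaries are all hypothetical, and the rows are assumed to be fully loaded in memory already.

```python
def explain(clause, row):
    """Return the labels of the simple clauses that caused `row` to match,
    or None if the clause as a whole is not satisfied."""
    kind = clause[0]
    if kind == "simple":
        _, label, predicate = clause
        return [label] if predicate(row) else None
    _, children = clause
    if kind == "and":
        satisfied = []
        for child in children:
            result = explain(child, row)
            if result is None:
                return None  # one failed child fails the whole "and"
            satisfied.extend(result)
        return satisfied
    # "or": visit every child, so every satisfied branch gets reported
    satisfied = []
    matched = False
    for child in children:
        result = explain(child, row)
        if result is not None:
            matched = True
            satisfied.extend(result)
    return satisfied if matched else None

query = ("or", [
    ("simple", "Continent = Europe", lambda r: r["continent"] == "Europe"),
    ("and", [
        ("simple", "Country Name contains 'z'", lambda r: "z" in r["country"].lower()),
        ("simple", "Population > 10,000,000", lambda r: r["population"] > 10000000),
    ]),
])

rows = [
    {"city": "Geneva", "country": "Switzerland", "continent": "Europe", "population": 200000},
    {"city": "London", "country": "England", "continent": "Europe", "population": 15000000},
    {"city": "Rio de Janeiro", "country": "Brazil", "continent": "South America", "population": 11800000},
]

results = {row["city"]: explain(query, row) for row in rows}
for city, labels in results.items():
    print(city, labels)
```

Run against the three sample rows, this reports only "Continent = Europe" for Geneva and London, and both clauses of the "and" group for Rio de Janeiro.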

So I put my mind to this and came up with a more or less functioning proof of concept, by iterating through the result set, traversing the query tree according to logical and/or rules, compiling down each simple query clause to see if that individual clause is true or false, and acting accordingly.

The trouble with this approach is that it is extremely slow, because it is riddled with "select n+1" problems: my result set starts off as an IQueryable<City>, but in order to find out details about the continent, I have to load the related Country object for each City. OK, maybe not a huge penalty to slap a .Include(ci=>ci.Country) onto the IQueryable, but what if one of my possible filter clauses does an aggregation on a one-to-many relationship, such as "Number of Customers"? It would be unthinkable to include city.Customers in my record set, but I need to be able to count them.

So, can you think of any smart way of optimizing this process, either by translating it all into SQL, or else doing it in code in a way that doesn't create a "select n+1" pattern? Or maybe there's a third way that's even smarter?

Solution

If you can afford to add one column to the result per query condition, you could try changing the generated SQL into something like:

case when Condition1 then 1 else 0 end as Condition

With the example from the OP:

select ci.*,
case when cr.ContinentID = 2 then 1 else 0 end as ContinentIsEurope,
case when cr.Name like '%z%' 
    and ci.Population > 10000000 then 1 else 0 end as InventEasilyIdentifiableConditionName
from City ci
join Country cr on ci.CountryID = cr.ID
where cr.ContinentID = 2 -- 2=Europe
or (cr.Name like '%z%' 
    and ci.Population > 10000000)

I'm not sure whether this will perform well, though, and it might generate really long SQL queries.
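As a rough check of the idea, the following sketch runs a variation of the query against an in-memory SQLite database from Python. It differs from the answer in that it emits one flag column per simple clause rather than one per "and" group, so the and/or logic can be applied in memory without further queries. The table definitions, the continent ID for South America, the "Smallville" row and the flag column names are assumptions made for illustration, and SQLite stands in for the real database behind EF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Country (ID integer primary key, Name text, ContinentID integer);
    create table City (ID integer primary key, Name text, CountryID integer, Population integer);
    -- ContinentID 2 = Europe (from the question); 5 is a made-up ID for South America
    insert into Country values (1, 'Switzerland', 2), (2, 'England', 2), (3, 'Brazil', 5);
    -- 'Smallville' is a hypothetical row that should not match the query
    insert into City values (1, 'Geneva', 1, 200000),
                            (2, 'London', 2, 15000000),
                            (3, 'Rio de Janeiro', 3, 11800000),
                            (4, 'Smallville', 3, 5000);
""")

# One round trip: the where clause filters, the case columns explain.
rows = conn.execute("""
    select ci.Name,
           case when cr.ContinentID = 2 then 1 else 0 end as ContinentIsEurope,
           case when cr.Name like '%z%' then 1 else 0 end as CountryNameContainsZ,
           case when ci.Population > 10000000 then 1 else 0 end as PopulationOver10M
    from City ci
    join Country cr on ci.CountryID = cr.ID
    where cr.ContinentID = 2
       or (cr.Name like '%z%' and ci.Population > 10000000)
    order by ci.ID
""").fetchall()

satisfied = {}
for name, is_europe, has_z, is_big in rows:
    labels = []
    if is_europe:
        labels.append("Continent = Europe")
    if has_z and is_big:  # the "and" group only counts when both flags are set
        labels += ["Country Name contains 'z'", "Population > 10,000,000"]
    satisfied[name] = labels

for name, labels in satisfied.items():
    print(name, labels)
```

The and/or logic is hard-coded here for brevity. A general version would walk the clause tree and look up each simple clause's flag column instead of re-evaluating the clause against related entities, which is where the "select n+1" cost came from.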

Licensed under: CC-BY-SA with attribution
