Question

Relational division is one of Codd's primitive relational operators, colloquially known as the suppliers who supply all parts. There have been various translations into SQL e.g. Celko discusses several approaches using the example of the pilots who can fly all the planes in the hangar.

The one I prefer is "division with set operators" because it is "with remainder" (i.e. Wilson can also fly a F-17 Fighter but there isn't one in the hangar) and how it handles the case when the divisor is the empty set (i.e. when the hangar is empty then all pilots are returned):

WITH PilotSkills
     AS
     (
      SELECT * 
        FROM (
              VALUES ( 'Celko', 'Piper Cub' ), 
                     ( 'Higgins', 'B-52 Bomber' ), ( 'Higgins', 'F-14 Fighter' ),
                     ( 'Higgins', 'Piper Cub' ), 
                     ( 'Jones', 'B-52 Bomber' ), ( 'Jones', 'F-14 Fighter' ),
                     ( 'Smith', 'B-1 Bomber' ), ( 'Smith', 'B-52 Bomber' ),
                     ( 'Smith', 'F-14 Fighter' ),
                     ( 'Wilson', 'B-1 Bomber' ), ( 'Wilson', 'B-52 Bomber' ),
                     ( 'Wilson', 'F-14 Fighter' ), ( 'Wilson', 'F-17 Fighter' )
             ) AS T ( pilot_name, plane_name )
     ), 
     Hangar
     AS
     (
      SELECT * 
        FROM (
              VALUES ( 'B-1 Bomber' ), 
                     ( 'B-52 Bomber' ), 
                     ( 'F-14 Fighter' )
             ) AS T ( plane_name )
     )
SELECT DISTINCT pilot_name 
  FROM PilotSkills AS P1
 WHERE NOT EXISTS (
                   SELECT plane_name 
                     FROM Hangar
                   EXCEPT
                   SELECT plane_name
                     FROM PilotSkills AS P2
                    WHERE P1.pilot_name = P2.pilot_name
                  );

Now I need to do this in LINQ to Objects. Here's a suggested direct translation:

var hangar = new [] 
{ 
    new { PlaneName = "B-1 Bomber" },
    new { PlaneName = "F-14 Fighter" },
    new { PlaneName = "B-52 Bomber" }
}.AsEnumerable();

var pilotSkills = new [] 
{ 
    new { PilotName = "Celko", PlaneName = "Piper Cub" },
    new { PilotName = "Higgins", PlaneName = "B-52 Bomber" },
    new { PilotName = "Higgins", PlaneName = "F-14 Fighter" },
    new { PilotName = "Higgins", PlaneName = "Piper Cub" },
    new { PilotName = "Jones", PlaneName = "B-52 Bomber" },
    new { PilotName = "Jones", PlaneName = "F-14 Fighter" },
    new { PilotName = "Smith", PlaneName = "B-1 Bomber" },
    new { PilotName = "Smith", PlaneName = "B-52 Bomber" },
    new { PilotName = "Smith", PlaneName = "F-14 Fighter" },
    new { PilotName = "Wilson", PlaneName = "B-1 Bomber" },
    new { PilotName = "Wilson", PlaneName = "B-52 Bomber" },
    new { PilotName = "Wilson", PlaneName = "F-14 Fighter" },
    new { PilotName = "Wilson", PlaneName = "F-17 Fighter" }
}.AsEnumerable();

var actual = pilotSkills.Where
(
    p1 => hangar.Except
    ( 
        pilotSkills.Where( p2 => p2.PilotName == p1.PilotName )
                    .Select( p2 => new { p2.PlaneName } )
    ).Any() == false 
).Select( p1 => new { p1.PilotName } ).Distinct();

var expected = new [] 
{
    new { PilotName = "Smith" },
    new { PilotName = "Wilson" }
};

Assert.That( actual, Is.EquivalentTo( expected ) );

As LINQ is supposedly based on the relational algebra then a direct translation seems reasonable. But is there a better 'native' LINQ approach?


Reflecting on @Daniel Hilgarth's answer, in .NET Land the data is likely to be 'grouped' to begin with:

var pilotSkills = new [] 
{ 
    new { PilotName = "Celko", 
            Planes = new [] 
            { new { PlaneName = "Piper Cub" }, } },
    new { PilotName = "Higgins", 
            Planes = new [] 
            { new { PlaneName = "B-52 Bomber" }, 
              new { PlaneName = "F-14 Fighter" }, 
              new { PlaneName = "Piper Cub" }, } },
    new { PilotName = "Jones", 
            Planes = new [] 
            { new { PlaneName = "B-52 Bomber" }, 
              new { PlaneName = "F-14 Fighter" }, } },
    new { PilotName = "Smith", 
            Planes = new [] 
            { new { PlaneName = "B-1 Bomber" }, 
              new { PlaneName = "B-52 Bomber" }, 
              new { PlaneName = "F-14 Fighter" }, } },
    new { PilotName = "Wilson", 
            Planes = new [] 
            { new { PlaneName = "B-1 Bomber" }, 
              new { PlaneName = "B-52 Bomber" }, 
              new { PlaneName = "F-14 Fighter" }, 
              new { PlaneName = "F-17 Fighter" }, } },
};

...and projecting just the name is arbitrary, making the potential solutions much more straightforward:

 // Easy to understand at a glance:
var actual1 = pilotSkills.Where( x => hangar.All( y => x.Planes.Contains(y) ));

// Potentially more efficient:
var actual = pilotSkills.Where( x => !hangar.Except( x.Planes ).Any() );
Was it helpful?

Solution

The following should yield the same result:

pilotSkills.GroupBy(x => x.PilotName, x => x.PlaneName)
           .Where(g => hangar.All(y => g.Contains(y.PlaneName)))

This will return one group per pilot that can fly all planes in the hangar.
The key of the group is the pilot's name, the content of the group are all the planes the pilot can fly, even those that aren't in the hangar.

If you now just want the pilots, you can add a .Select(g => new { PilotName = g.Key }) to the end of the query.


Using the above approach with Except, making it closer to the OP's original:

pilotSkills.GroupBy(x => x.PilotName, x => new { x.PlaneName } )
           .Where(g => !hangar.Except(g).Any());

This second query is potentially better, because it iterates g only once; the first query with Contains iterates it N times with N being the number of planes in the hangar.

OTHER TIPS

Using the approach in @Daniel Hilgarth's answer but closer to my original:

var actual = pilotSkills
                 .GroupBy(x => x.PilotName, x => new { x.PlaneName })
                 .Where(g => !hangar.Except(g).Any())
                 .Select(x => new { PilotName = x.Key });
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top