Question

I've been trying for a long time to find a "clean" pattern to handle a .SelectMany with anonymous types when you don't always want to return a result. My most common use case looks like this:

  1. We have a list of customers that I want to do reporting on.
  2. Each customer's data resides in a separate database, so I do a parallel .SelectMany
  3. In each lambda expression, I gather results for the customer toward the final report.
  4. If a particular customer should be skipped, I need to return an empty list.
  5. I whip these up often for quick reporting, so I'd prefer an anonymous type.

For example, the logic might look something like this:

//c is a customer
var context = GetContextForCustomer(c);
// look up some data, myData using the context connection
if (someCondition)
  return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
  return null;

This could be implemented as a foreach statement:

var results = new List<WhatType?>();
foreach (var c in customers) {
  var context = GetContextForCustomer(c);
  if (someCondition)
    results.AddRange(myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 }));
}

Or it could be implemented with a .SelectMany that is pre-filtered with a .Where:

customers
  .Where(c => someCondition)
  .AsParallel()
  .SelectMany(c => {
     var context = GetContextForCustomer(c);
     return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
  })
  .ToList();

There are problems with both of these approaches. The foreach solution requires initializing a List to store the results, and you have to define the type. The .SelectMany with .Where is often impractical because the logic for someCondition is fairly complex and depends on some data lookups. So my ideal solution would look something like this:

customers
  .AsParallel()
  .SelectMany(c => {
     var context = GetContextForCustomer(c);
     if (someCondition)
       return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
     else
       continue?   return null?   return empty list?
  })
  .ToList();

What do I put in the else line to skip a return value? None of the solutions I can come up with work or are ideal:

  1. continue doesn't compile because the lambda is not the body of a loop
  2. return null causes a NullReferenceException (NRE) at run time
  3. return empty list requires me to initialize a list of anonymous type again.

Is there a way to accomplish the above that is clean, simple, and neat, and satisfies all my (picky) requirements?


Solution

You could return an empty IEnumerable<dynamic> by calling Enumerable.Empty<dynamic>(). Here's an example of the same general form as yours (without your customers and someCondition, because I don't know what they are):

new int[] { 1, 2, 3, 4 }
    .AsParallel()
    .SelectMany(i => {
        if (i % 2 == 0)
            return Enumerable.Repeat(new { i, squared = i * i }, i);
        else
            return Enumerable.Empty<dynamic>();
    })
    .ToList();

So, with your objects and someCondition, it would look like this:

customers
    .AsParallel()
    .SelectMany(c => {
        var context = GetContextForCustomer(c);
        if (someCondition)
            return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
        else
            return Enumerable.Empty<dynamic>();
    })
    .ToList();
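
One consequence of this approach is worth seeing in code. Because one branch returns IEnumerable<dynamic>, the lambda's return type is inferred as IEnumerable<dynamic>, so the final list is a List<dynamic> and property access on its elements is resolved at run time instead of being checked by the compiler. The sketch below uses stand-in data; the DynamicEmptyDemo class and its SumOfSquares method are hypothetical names, not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: integers stand in for the question's customers.
public static class DynamicEmptyDemo
{
    public static int SumOfSquares(IEnumerable<int> numbers)
    {
        // The explicit List<dynamic> shows what the query is inferred to return.
        List<dynamic> rows = numbers
            .AsParallel()
            .SelectMany(i =>
            {
                if (i % 2 == 0)
                    return Enumerable.Repeat(new { i, squared = i * i }, 1);
                else
                    return Enumerable.Empty<dynamic>();
            })
            .ToList();

        int total = 0;
        foreach (var row in rows)
        {
            // Late-bound member access: a misspelled property name would
            // compile and fail only when this line runs.
            total += (int)row.squared;
        }
        return total;
    }
}
```

Calling DynamicEmptyDemo.SumOfSquares(new[] { 1, 2, 3, 4 }) should return 20 (4 + 16), regardless of the order in which PLINQ yields the rows.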

OTHER TIPS

Without knowing what someCondition and myData look like...

Why don't you just Select and Where the contexts as well:

customers
.Select(c => GetContextForCustomer(c))
.Where(ctx => someCondition)
.SelectMany(ctx => 
    // note: c is no longer in scope at this point
    myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 }));

EDIT: I just realized you need to carry both the customer and context further, so you can do this:

customers
.Select(c => new { Customer = c, Context = GetContextForCustomer(c) })
.Where(x => someCondition(x.Context))
.SelectMany(x => 
    myData.Select(d => new { CustomerID = x.Customer, X1 = d.x1, X2 = d.x2 }));

You can try the following:

customers
  .AsParallel()
  .SelectMany(c => {
     var context = GetContextForCustomer(c);
     if (someCondition)
       return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
     else
       return Enumerable.Empty<int>().Select(x => new { CustomerID = 0, X1 = "defValue", X2 = "defValue" });
  })
  .ToList();

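The compiler unifies all anonymous types in an assembly that have the same property names and types, in the same order, into a single class. That's why both your Select and the one on Enumerable.Empty return the same T. The placeholder values (0, "defValue") are never produced; they exist only so the compiler can infer the property types, which must match the types of c, x.x1, and x.x2.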
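
To make this concrete, here is a minimal sketch with stand-in data. The AnonymousTypeDemo class, its methods, and the integer "customers" are hypothetical and only illustrate the technique. It shows that the empty branch and the populated branch share one anonymous type, so the lambda's return type is inferred without dynamic and the result stays statically typed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: integers stand in for the question's customers.
public static class AnonymousTypeDemo
{
    public static List<string> Collect(IEnumerable<int> customers)
    {
        var rows = customers
            .SelectMany(c =>
            {
                if (c % 2 == 0)
                    return Enumerable.Range(1, 2)
                        .Select(x => new { CustomerID = c, X1 = x.ToString() });
                else
                    // Same property names, types, and order => same anonymous type.
                    // The Select body never runs because the source is empty.
                    return Enumerable.Empty<int>()
                        .Select(x => new { CustomerID = 0, X1 = "" });
            })
            .ToList();

        // rows is a list of the anonymous type, so these accesses are compile-time checked.
        return rows.Select(r => r.CustomerID + ":" + r.X1).ToList();
    }

    // Two separate initializers with matching properties yield the same runtime type.
    public static bool SameType()
    {
        var a = new { CustomerID = 1, X1 = "a" };
        var b = new { CustomerID = 2, X1 = "b" };
        return a.GetType() == b.GetType();
    }
}
```

With the input { 1, 2, 3, 4 }, Collect should return "2:1", "2:2", "4:1", "4:2", and SameType should return true.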

You can create your own variation of the SelectMany LINQ method that tolerates nulls:

public static class EnumerableExtensions
{
    public static IEnumerable<TResult> NullableSelectMany<TSource, TResult> (
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TResult>> selector)
    {
        if (source == null) 
            throw new ArgumentNullException("source");
        if (selector == null) 
            throw new ArgumentNullException("selector");
        foreach (TSource item in source) {
            IEnumerable<TResult> results = selector(item);
            if (results != null) {
                foreach (TResult result in results)
                    yield return result;
            }
        }
    }
}

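Now you can return null in the selector lambda. Note that this method extends IEnumerable<T>, so if you call it after .AsParallel() the selector runs sequentially on the enumerating thread; a parallel version would need its own overload.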
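
A usage sketch follows, with stand-in data. The extension method is repeated in compact form under a different, hypothetical class name (NullableSelectManyExtensions) so that the snippet compiles on its own; NullableSelectManyDemo and its method are likewise hypothetical. The point it illustrates is that a null literal contributes no type to lambda return-type inference, so the anonymous type from the other branch is still inferred.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullableSelectManyExtensions
{
    // Compact copy of the extension method, without the argument checks.
    public static IEnumerable<TResult> NullableSelectMany<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TResult>> selector)
    {
        foreach (TSource item in source)
        {
            IEnumerable<TResult> results = selector(item);
            if (results == null)
                continue; // a null result means "skip this item"
            foreach (TResult result in results)
                yield return result;
        }
    }
}

// Hypothetical demo class: integers stand in for the question's customers.
public static class NullableSelectManyDemo
{
    public static List<int> SquaresOfEvenCustomers(IEnumerable<int> customers)
    {
        var rows = customers
            .NullableSelectMany(c =>
            {
                if (c % 2 == 0)
                    return Enumerable.Repeat(new { CustomerID = c, Squared = c * c }, 1);
                else
                    return null; // skipped by NullableSelectMany
            })
            .ToList();

        // rows keeps the anonymous type, so Squared is statically typed.
        return rows.Select(r => r.Squared).ToList();
    }
}
```

With the input { 1, 2, 3, 4 }, SquaresOfEvenCustomers should return 4 and 16, in that order.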

The accepted answer returns dynamic, which gives up static typing. The cleanest option would be to move the filtering logic into a Where, which reads better in a LINQ chain. Since you specifically rule that out in the question, and I'm not a fan of multi-line delegates inside a LINQ call, I would try the following instead, though one could argue it's more of a hack:

var results = new 
{ 
    customerID = default(int), //notice the casing of property names
    x1 = default(U), //whatever types they are
    x2 = default(V) 
}.GetEmptyListOfThisType();

foreach (var customerID in customers) {
  var context = GetContextForCustomer(customerID);
  if (someCondition)
    results.AddRange(myData.Select(x => new { customerID, x.x1, x.x2 }));
}

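// Extension methods must be declared in a static, non-generic, non-nested class.
public static List<T> GetEmptyListOfThisType<T>(this T item)
{
    return new List<T>(); // item is used only to let the compiler infer T
}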

Notice that the property names match the names of the other variables and members (customerID, x1, x2), so you don't have to write the property names a second time in the Select call.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow