Question

I have two arrays built while parsing a text file. The first contains the column names, the second contains the values from the current row. I need to iterate over both lists at once to build a map. Right now I have the following:

var currentValues = currentRow.Split(separatorChar);
var valueEnumerator = currentValues.GetEnumerator();

foreach (String column in columnList)
{
    valueEnumerator.MoveNext();
    valueMap.Add(column, (String)valueEnumerator.Current);
}

This works just fine, but it doesn't quite satisfy my sense of elegance, and it gets really hairy if the number of arrays is larger than two (as I have to do occasionally). Does anyone have another, terser idiom?

Was it helpful?

Solution

if there are the same number of column names as there are elements in each row, could you not use a for loop?

var currentValues = currentRow.Split(separatorChar);

for(var i=0;i<columnList.Length;i++){
   // use i to index both (or all) arrays and build your map
}

OTHER TIPS

You've got a non-obvious pseudo-bug in your initial code - IEnumerator<T> extends IDisposable so you should dispose it. This can be very important with iterator blocks! Not a problem for arrays, but would be with other IEnumerable<T> implementations.

I'd do it like this:

public static IEnumerable<TResult> PairUp<TFirst,TSecond,TResult>
    (this IEnumerable<TFirst> source, IEnumerable<TSecond> secondSequence,
     Func<TFirst,TSecond,TResult> projection)
{
    using (IEnumerator<TSecond> secondIter = secondSequence.GetEnumerator())
    {
        foreach (TFirst first in source)
        {
            if (!secondIter.MoveNext())
            {
                throw new ArgumentException
                    ("First sequence longer than second");
            }
            yield return projection(first, secondIter.Current);
        }
        if (secondIter.MoveNext())
        {
            throw new ArgumentException
                ("Second sequence longer than first");
        }
    }        
}

Then you can reuse this whenever you have the need:

foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar),
             (column, value) => new { column, value })
{
    // Do something
}

Alternatively you could create a generic Pair type, and get rid of the projection parameter in the PairUp method.

EDIT:

With the Pair type, the calling code would look like this:

foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar))
{
    // column = pair.First, value = pair.Second
}

That looks about as simple as you can get. Yes, you need to put the utility method somewhere, as reusable code. Hardly a problem in my view. Now for multiple arrays...

If the arrays are of different types, we have a problem. You can't express an arbitrary number of type parameters in a generic method/type declaration - you could write versions of PairUp for as many type parameters as you wanted, just like there are Action and Func delegates for up to 4 delegate parameters - but you can't make it arbitrary.

If the values will all be of the same type, however - and if you're happy to stick to arrays - it's easy. (Non-arrays is okay too, but you can't do the length checking ahead of time.) You could do this:

public static IEnumerable<T[]> Zip<T>(params T[][] sources)
{
    // (Insert error checking code here for null or empty sources parameter)

    int length = sources[0].Length;
    if (!sources.All(array => array.Length == length))
    {
        throw new ArgumentException("Arrays must all be of the same length");
    }

    for (int i=0; i < length; i++)
    {
        // Could do this bit with LINQ if you wanted
        T[] result = new T[sources.Length];
        for (int j=0; j < result.Length; j++)
        {
             result[j] = sources[j][i];
        }
        yield return result;
    }
}

Then the calling code would be:

foreach (var array in Zip(columns, row, whatevers))
{
    // column = array[0]
    // value = array[1]
    // whatever = array[2]
}

This involves a certain amount of copying, of course - you're creating an array each time. You could change that by introducing another type like this:

public struct Snapshot<T>
{
    readonly T[][] sources;
    readonly int index;

    public Snapshot(T[][] sources, int index)
    {
        this.sources = sources;
        this.index = index;
    }

    public T this[int element]
    {
        return sources[element][index];
    }
}

This would probably be regarded as overkill by most though ;)

I could keep coming up with all kinds of ideas, to be honest... but the basics are:

  • With a little bit of reusable work, you can make the calling code nicer
  • For arbitrary combinations of types you'll have to do each number of parameters (2, 3, 4...) separately due to the way generics works
  • If you're happy to use the same type for each part, you can do better

In a functional language you would usually find a "zip" function which will hopefully be part of a C#4.0 . Bart de Smet provides a funny implementation of zip based on existing LINQ functions:

public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
  this IEnumerable<TFirst> first, 
  IEnumerable<TSecond> second, 
  Func<TFirst, TSecond, TResult> func)
{
  return first.Select((x, i) => new { X = x, I = i })
    .Join(second.Select((x, i) => new { X = x, I = i }), 
    o => o.I, 
    i => i.I, 
    (o, i) => func(o.X, i.X));
}

Then you can do:

  int[] s1 = new [] { 1, 2, 3 };
  int[] s2 = new[] { 4, 5, 6 };
  var result = s1.Zip(s2, (i1, i2) => new {Value1 = i1, Value2 = i2});

If you're really using arrays, the best way is probably just to use the conventional for loop with indices. Not as nice, granted, but as far as I know .NET doesn't offer a better way of doing this.

You could also encapsulate your code into a method called zip – this is a common higher-order list function. However, C# lacking a suitable Tuple type, this is quite crufty. You'd end up returning an IEnumerable<KeyValuePair<T1, T2>> which isn't very nice.

By the way, are you really using IEnumerable instead of IEnumerable<T> or why do you cast the Current value?

Use IEnumerator for both would be nice

var currentValues = currentRow.Split(separatorChar);
using (IEnumerator<string> valueEnum = currentValues.GetEnumerator(), columnEnum = columnList.GetEnumerator()) {
    while (valueEnum.MoveNext() && columnEnum.MoveNext())
        valueMap.Add(columnEnum.Current, valueEnum.Current);
}

Or create an extension methods

public static IEnumerable<TResult> Zip<T1, T2, TResult>(this IEnumerable<T1> source, IEnumerable<T2> other, Func<T1, T2, TResult> selector) {
    using (IEnumerator<T1> sourceEnum = source.GetEnumerator()) {
        using (IEnumerator<T2> otherEnum = other.GetEnumerator()) {
            while (sourceEnum.MoveNext() && columnEnum.MoveNext())
                yield return selector(sourceEnum.Current, otherEnum.Current);
        }
    }
}

Usage

var currentValues = currentRow.Split(separatorChar);
foreach (var valueColumnPair in currentValues.Zip(columnList, (a, b) => new { Value = a, Column = b }) {
    valueMap.Add(valueColumnPair.Column, valueColumnPair.Value);
}

Instead of creating two seperate arrays you could make a two-dimensional array, or a dictionary (which would be better). But really, if it works I wouldn't try to change it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top