Pergunta

This method gets:

IEnumerable<object[]> - in which every array is in fixed size (it represent relational data structure).

DataEnumerable.Column[] - some metadata columns,mostly they will have the same value for all rows.

Expected outcome:

each "row" should get value for each of these columns (so the data structure remains relational).

    private IEnumerable<object[]> BindExtraColumns(IEnumerable<object[]> baseData, int dataSize, DataEnumerable.Column[] columnsToAdd)
    {
        int extraColumnsLength = columnsToAdd.Length;
        object[] row = new object[dataSize + extraColumnsLength];

        string columnName;
        int rowNumberColumnIndex = -1;

        for (int i = 0; i < extraColumnsLength; i++)
        {
            //Assign values that doesn't change between lines..
            // Assign rowNumberColumnIndex if row number column exists
        }

        //Assign values that change here, since we currently support only row number
        // i'ts not generic enough        
        if (rowNumberColumnIndex != -1)
        {
            int rowNumber = 1;

            foreach (var baseRow in baseData)
            {
                row[rowNumberColumnIndex] = rowNumber;

                Array.Copy(baseRow, 0, row, extraColumnsLength, dataSize);

                yield return row;

                rowNumber++;
            }
        }
        else
        {
            foreach (var baseRow in baseData)
            {
                Array.Copy(baseRow, 0, row, extraColumnsLength, dataSize);

                yield return row;
            }
        }
    }

this method can be called from hundreds of threads with relatively big data sets so performance here is critical, and i tried to create as minimum new objects as possible.

Please note - this is a private method, which used ONLY BY DataReader, which read each line, and passes it to another array immediately prior to reading the next line.

So - does copying arrays here be optimized here somehow, and should i use (carefully) memory to boost things here?

Thanks

Foi útil?

Solução

Your code is fundamentally broken. You're just returning a reference to the same array every time, which means that unless the caller uses the data within each item immediately, it effectively gets lost. For example, suppose I use:

List<object[]> rows = BindExtraColumns(data, size, toAdd).ToList();

Then when I iterate over the rows, I find the same data in every row. That's really not a good experience.

I think it would make much more sense to create a new array for each iteration. Yes, that's a lot of extra memory being used - but it doesn't surprise callers nearly as much.

If you really don't want to do that, I suggest you change the approach so that the caller has to pass in an Action<object[]> to be executed on each row, with the documented proviso that if the caller stashes a reference to the array, they may well be surprised by the results.

You're obviously very concerned about performance, but if your data is coming from a database I'd expect the array creation/copying performance to be insignificant. You should write the simplest (and most reliable) code that works first, and then benchmark it to see whether it performs well enough. Unless you've got evidence that you need to make this surprising design choice, it feels like you're optimizing way too early.

EDIT: Now we know that it's a private method only used in one specific place, I would still avoid this reuse. It's simply fragile. I really would change to passing in an Action<object[]> or simply copying the data to a new array every time. I certainly wouldn't keep the current approach without strong evidence that it's a bottleneck: as I said before, I'd expect the database communication to be much more important. Leaving timebombs in your code like this very rarely works out well.

If you really, really want to keep doing this, you should document it very strongly, giving severe warnings that the result is non-idiomatic.

In terms of whether there's more optimization you could do - well... one alternative would be to avoid having to work with a single array in the first place. You could create a class which held references to both arrays (the current base row and the fixed data) and exposed an indexer which returned the value from one array or the other based on which index was being requested. We don't know what you're doing with the data ("passes it to another array" doesn't really mean anything) so we don't know whether that's feasible, but it would be efficient and could be implemented without the odd behaviour.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top