Question

When retrieving values from a DataRow, is it better to use the column name or the column index?

The column name is more readable and easier to maintain:

int price = (int)dr["Price"];

While the column index is faster (I think):

int price = (int)dr[3];

Would using column names break if you decide to obfuscate the database?

Solution

I generally prefer readability and understanding over speed. Go with the name. You could (should) use string constants that can be updated in one place if you decide to change database column names.

OTHER TIPS

Accessing column/row values by column name is better for human readability and for forward compatibility (in case someone later changes the order or number of columns).

Accessing column/row values by column index is better for performance.

So if you want to change a value in one or two rows, column names are fine. But if you want to change a value in thousands of rows, you should use a column index computed from the column name:

int ndxMyColumn = table.Columns.IndexOf( "MyColumn" );
foreach(DataRow record in table.Rows ) {
    record[ndxMyColumn] = 15;
}

Completely agree with the others: go for readability and maintainability over speed. However, I had a generic method that needed to take named columns as parameters, so it made sense to work out what their column indices were.

In the benchmark below, using the column index was roughly twice as fast, so if this is a bottleneck or a performance-critical part of your code, it may be worthwhile.

The output from the code below is:

515ms with ColumnIndex

1031ms with ColumnName

    static void Main(string[] args)
    {            
        DataTable dt = GetDataTable(10000, 500);
        string[] columnNames = GetColumnNames(dt);

        DateTime start = DateTime.Now;
        TestPerformance(dt, columnNames, true);

        TimeSpan ts = DateTime.Now.Subtract(start);
        Console.Write("{0}ms with ColumnIndex\r\n", ts.TotalMilliseconds);

        start = DateTime.Now;
        TestPerformance(dt, columnNames, false);
        ts = DateTime.Now.Subtract(start);
        Console.Write("{0}ms with ColumnName\r\n", ts.TotalMilliseconds);
    }

    private static DataTable GetDataTable(int rows, int columns)
    {
        DataTable dt = new DataTable();

        for (int j = 0; j < columns; j++)
        {
            dt.Columns.Add("Column" + j.ToString(), typeof(Double));
        }

        Random random = new Random(DateTime.Now.Millisecond);
        for (int i = 0; i < rows; i++)
        {
            object[] rowValues = new object[columns];

            for (int j = 0; j < columns; j++)
            {
                rowValues[j] = random.NextDouble();
            }

            dt.Rows.Add(rowValues);
        }

        return dt;
    }

    private static void TestPerformance(DataTable dt, string[] columnNames, bool useIndex)
    {
        object obj;
        DataRow row;

        for (int i = 0; i < dt.Rows.Count; i++)
        {
            row = dt.Rows[i];

            for (int j = 0; j < dt.Columns.Count; j++)
            {
                if (useIndex)
                    obj = row[j];
                else
                    obj = row[columnNames[j]];
            }
        }
    }

    private static string[] GetColumnNames(DataTable dt)
    {
        string[] columnNames = new string[dt.Columns.Count];

        for (int j = 0; j < columnNames.Length; j++)
        {
            columnNames[j] = dt.Columns[j].ColumnName;
        }

        return columnNames;
    }
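
As a variation on the benchmark above, here is a minimal sketch that times the same two access patterns with System.Diagnostics.Stopwatch, which is generally better suited to measuring elapsed time than subtracting DateTime.Now values. The class name, the helper names and the table size are illustrative assumptions, not part of the original answer, and the timings it prints will differ from the figures quoted above.

```csharp
using System;
using System.Data;
using System.Diagnostics;

public static class StopwatchBenchmark
{
    // Builds a table of doubles where cell (i, j) holds i + j.
    // Sizes are chosen by the caller; the original used 10000 x 500.
    public static DataTable BuildTable(int rows, int columns)
    {
        var dt = new DataTable();
        for (int j = 0; j < columns; j++)
            dt.Columns.Add("Column" + j, typeof(double));

        for (int i = 0; i < rows; i++)
        {
            var values = new object[columns];
            for (int j = 0; j < columns; j++)
                values[j] = (double)(i + j);
            dt.Rows.Add(values);
        }
        return dt;
    }

    // Reads every cell, by index or by name, and returns the sum
    // so that the reads actually contribute to a result.
    public static double SumAll(DataTable dt, bool useIndex)
    {
        double total = 0;
        foreach (DataRow row in dt.Rows)
        {
            for (int j = 0; j < dt.Columns.Count; j++)
            {
                total += useIndex
                    ? (double)row[j]
                    : (double)row[dt.Columns[j].ColumnName];
            }
        }
        return total;
    }

    public static void Main()
    {
        DataTable dt = BuildTable(1000, 50);

        Stopwatch sw = Stopwatch.StartNew();
        double byIndex = SumAll(dt, true);
        sw.Stop();
        Console.WriteLine("{0}ms with ColumnIndex (sum {1})", sw.ElapsedMilliseconds, byIndex);

        sw.Restart();
        double byName = SumAll(dt, false);
        sw.Stop();
        Console.WriteLine("{0}ms with ColumnName (sum {1})", sw.ElapsedMilliseconds, byName);
    }
}
```

Both runs should report the same sum; only the elapsed time is expected to differ, and by how much depends on the runtime and the table size.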

I would think the column name is the best way to go. It is easier to see what you are pulling, and the column order is determined by the select statement, which could change sometime down the road. You could argue the column name could change too, but I would think this is much less likely.

EDIT:

Actually, if you were really bent on using column indexes, you could create constants for them, each named after its column. So:

const int PRIMARY_KEY_COLUMN_NAME_INDEX = 0;

That would at least make it readable.

It depends on what you need. In my case, I had a situation where speed was paramount as I was performing intense processing on thousands of rows in a DataSet, so I chose to write a piece of code that cached the column indexes by name. Then, in the loop code I used the cached indexes. This gave a reasonable performance increase over using the column name directly.

Your mileage may vary, of course. My situation was a rather contrived and unusual case, but in that instance it worked rather well.
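
The answer above does not show its caching code, so the following is only a sketch of one way it could look: resolve each column name to its index once, keep the result in a dictionary, and use the plain integers inside the loop. The class name, the "Price" and "Quantity" columns and the sample data are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ColumnIndexCache
{
    // Resolves each column name to its ordinal once, before any row is touched.
    public static Dictionary<string, int> Build(DataTable table, params string[] columnNames)
    {
        var cache = new Dictionary<string, int>();
        foreach (string name in columnNames)
        {
            int index = table.Columns.IndexOf(name); // -1 when the column is missing
            if (index < 0)
                throw new ArgumentException("Unknown column: " + name);
            cache[name] = index;
        }
        return cache;
    }

    // Uses the cached ordinals inside the hot loop instead of the names.
    public static int TotalCost(DataTable table)
    {
        Dictionary<string, int> cache = Build(table, "Price", "Quantity");
        int priceIndex = cache["Price"];
        int quantityIndex = cache["Quantity"];

        int total = 0;
        foreach (DataRow row in table.Rows)
            total += (int)row[priceIndex] * (int)row[quantityIndex];
        return total;
    }

    // Hypothetical sample data: two orders costing 10 * 2 and 5 * 4.
    public static DataTable SampleTable()
    {
        var table = new DataTable("Orders");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Price", typeof(int));
        table.Columns.Add("Quantity", typeof(int));
        table.Rows.Add(1, 10, 2);
        table.Rows.Add(2, 5, 4);
        return table;
    }

    public static void Main()
    {
        Console.WriteLine(TotalCost(SampleTable())); // prints 40
    }
}
```

Because the indexes are looked up from the names at run time, the code keeps working if the column order changes, which hard-coded index constants would not.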

My opinion is that you should only switch to indices if you profiled your code and it showed as the bottleneck. I don't think this will happen.

Naming stuff is good; it makes it easier for our limited brains to understand problems and build links. That's why we are given names such as Fred, Martin, and Jamie rather than Human[189333847], Human[138924342] and Human[239333546].

If you did decide to obfuscate the database by changing column names in the future, you could alias those columns in your query to keep the indexer code functional. I suggest indexing by name.

Go with the name, you get better error messages :)
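
To illustrate the point about error messages, here is a small sketch that provokes both failures and prints what the runtime reports. The table, the misspelled column name "Prise" and the out-of-range index 3 are made up for the example. A failed name lookup raises an ArgumentException whose message typically names the missing column, while a bad index raises an IndexOutOfRangeException that can only mention the number.

```csharp
using System;
using System.Data;

public static class LookupErrors
{
    // A one-column, one-row table; names and values are hypothetical.
    public static DataRow SampleRow()
    {
        var table = new DataTable("Products");
        table.Columns.Add("Price", typeof(int));
        return table.Rows.Add(42);
    }

    // Looks up a misspelled column name and returns the exception type name.
    public static string FailByName(DataRow row)
    {
        try
        {
            object value = row["Prise"];
            return "no exception";
        }
        catch (Exception ex)
        {
            Console.WriteLine("By name:  " + ex.Message);
            return ex.GetType().Name;
        }
    }

    // Looks up an index past the last column and returns the exception type name.
    public static string FailByIndex(DataRow row)
    {
        try
        {
            object value = row[3];
            return "no exception";
        }
        catch (Exception ex)
        {
            Console.WriteLine("By index: " + ex.Message);
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        DataRow row = SampleRow();
        FailByName(row);
        FailByIndex(row);
    }
}
```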

I opt for strings for ease of reading and maintainability. I use string constants to define the column names. Example:

public class ExampleDataColumns
{
    public const string ID = "example_id";
    public const string Name = "example_name";
    // ...
}

Then I can reference it later like this:

row[ExampleDataColumns.ID]

Use column names for DataRow, for the same reason that an RDBMS would not gain speed by requiring programmers to specify column indexes in SQL. But you can mimic what an RDBMS does when you issue a SELECT statement: inside the engine, it looks up the index/offset of each column named in the SELECT clause before it traverses the rows, so it can operate faster.

If you really want to gain speed, don't do it the const/enum way (column order might change in your database or ORM layer). Do it as TcKs suggested (before the actual loop):

int ndxMyColumn = table.Columns.IndexOf( "MyColumn" );
foreach(DataRow record in table.Rows ) {
    record[ndxMyColumn] = 15;
}

For me, I use reflection (I'm not sure that is the correct name for what I do) to get the <columnname>Column member from the table.

Avoiding "hardcoding" is better:

  int price = (int)dr[DatableVar.PriceColumn];
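
The `DatableVar.PriceColumn` member above looks like the kind of DataColumn property a typed DataSet generates, but that is an assumption. With a plain DataTable, a comparable sketch is to fetch the DataColumn object once and pass it to the DataRow indexer, which also accepts a DataColumn. The class name, the "Price" column and the sample data are hypothetical.

```csharp
using System;
using System.Data;

public static class ColumnReferenceLookup
{
    // Hypothetical sample data with three prices.
    public static DataTable SampleTable()
    {
        var table = new DataTable("Products");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Price", typeof(int));
        table.Rows.Add("A", 10);
        table.Rows.Add("B", 20);
        table.Rows.Add("C", 30);
        return table;
    }

    // Resolves the DataColumn once, then indexes each row with the column object.
    public static int SumPrices(DataTable table)
    {
        DataColumn priceColumn = table.Columns["Price"]; // null if the column does not exist
        if (priceColumn == null)
            throw new ArgumentException("Table has no Price column");

        int total = 0;
        foreach (DataRow row in table.Rows)
            total += (int)row[priceColumn];
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumPrices(SampleTable())); // prints 60
    }
}
```

The column name appears in exactly one place, and the loop body does not repeat a string lookup for every row.
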
Licensed under: CC-BY-SA with attribution