I am wondering about the state of connection and impact on code performance by 'yield' while iterating over data reader object

StackOverflow https://stackoverflow.com/questions/15381708

Here is my sample code that I am using to fetch data from database: on DAO layer:

public IEnumerable<IDataRecord> GetDATA(ICommonSearchCriteriaDto commonSearchCriteriaDto)
{
    using(DbContext)
    {
        DbDataReader reader = DbContext.GetReader("ABC_PACKAGE.GET_DATA", oracleParams.ToArray(), CommandType.StoredProcedure);
        while (reader.Read())
        {
            yield return reader;
        }
    }
}

On BO layer I am calling the above method like:

List<IGridDataDto> GridDataDtos = MapMultiple(_costDriversGraphDao.GetGraphData(commonSearchCriteriaDto)).ToList();

on mapper layer MapMultiple method is defined like:

public IGridDataDto MapSingle(IDataRecord dataRecord)
{
    return new GridDataDto
    {
        Code = Convert.ToString(dataRecord["Code"]),
        Name = Convert.ToString(dataRecord["Name"]),
        Type = Convert.ToString(dataRecord["Type"])     
    };
}
public IEnumerable<IGridDataDto> MapMultiple(IEnumerable<IDataRecord> dataRecords)
{
    return dataRecords.Select(MapSingle);
}

The above code is working well and good but I am wondering about two concerns with the above code.

  1. How long data reader’s connection will be opened?
  2. When I consider code performance factor only, Is this a good idea to use ‘yield return’ instead of adding record into a list and returning the whole list?
有帮助吗?

解决方案

  1. your code doesn't show where you open/close the connection; but the reader here will actually only be open while you are iterating the data. Deferred execution, etc. The only bit of your code that does this is the .ToList(), so it'll be fine. In the more general case, yes: the reader will be open for the amount of time you take to iterate it; if you do a .ToList() that will be minimal; if you do a foreach and (for every item) make an external http request and wait 20 seconds, then yes - it will be open for longer.
  2. Both have their uses; the non-buffered approach is great for huge results that you want to process as a stream, without ever having to load them into a single in-memory list (or even have all of them in memory at a time); returning a list keeps the connection closed quickly, and makes it easy to avoid accidentally using the connection while it already has an open reader, but is not ideal for large results

If you return an iterator block, the caller can decide what is sane; if you always return a list, they don't have much option. A third way (that we do in dapper) is to make the choice theirs; we have an optional bool parameter which defaults to "return a list", but which the caller can change to indicate "return an iterator block"; basically:

bool buffered = true

in the parameters, and:

var data = QueryInternal<T>(...blah...);
return buffered ? data.ToList() : data;

in the implementation. In most cases, returning a list is perfectly reasonable and avoids a lot of problems, hence we make that the default.

其他提示

How long data reader’s connection will be opened?

The connection will remain open until the reader is dismissed, which means that it would be open until the iteration is over.

When I consider code performance factor only, Is this a good idea to use yield return instead of adding record into a list and returning the whole list?

This depends on several factors:

  • If you are not planning to fetch the entire result, yield return will help you save on the amount of data transferred on the network
  • If you are not planning to convert returned data to objects, or if multiple rows are used to create a single object, yield return will help you save on the memory used at the peak usage point of your program
  • If you plan to iterate the enture result set over a short period of time, there will be no performance penalties for using yield return. If the iteration is going to last for a significant amount of time on multiple concurrent threads, the number of open cursors on the RDBMS side may become exceeded.

This answer ignores flaws in the shown implementation and covers the general idea.

It is a tradeoff - it is impossible to tell whether it is a good idea without knowing the constraints of your system - what is the amount of data you expect to get, the memory consumption you are willing to accept, expected load on the database, etc

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top