What do you think is making this C++ code slow? (It loops through an ADODB recordset, converts COM types to strings, and fills an ostringstream)

StackOverflow https://stackoverflow.com/questions/271204

06-07-2019

Question

This loop is slower than I would expect, and I'm not sure where yet. See anything?

I'm reading an Access DB, using client-side cursors. When I have 127,000 rows with 20 columns, this loop takes about 10 seconds. The 20 columns are string, int, and date types. All the types get converted to ANSI strings before they are put into the ostringstream buffer.

void LoadRecordsetIntoStream(_RecordsetPtr& pRs, ostringstream& ostrm)
{
    ADODB::FieldsPtr pFields = pRs->Fields;
    const long nColumns = pFields->Count;  // hoist the column count out of the loops
    char buf[80];
    ::SYSTEMTIME sysTime;
    _variant_t var;

    while(!pRs->EndOfFile) // loop through rows
    {
        for (long i = 0L; i < nColumns; i++)  // loop through columns
        {

            var = pFields->GetItem(i)->GetValue();

            if (V_VT(&var) == VT_BSTR)
            {
                ostrm << (const char*) (_bstr_t) var;   
            }
            else if (V_VT(&var) == VT_I4
            || V_VT(&var) == VT_UI1
            || V_VT(&var) == VT_I2
            || V_VT(&var) == VT_BOOL)
            {
                ostrm << itoa(((int)var),buf,10);
            }
            else if (V_VT(&var) == VT_DATE)
            {
                ::VariantTimeToSystemTime(var,&sysTime);
                _stprintf(buf, _T("%4d-%02d-%02d %02d:%02d:%02d"),
                sysTime.wYear, sysTime.wMonth, sysTime.wDay, 
                sysTime.wHour, sysTime.wMinute, sysTime.wSecond);

                ostrm << buf;
            }
        }

        pRs->MoveNext();
    }
}

EDIT: After more experimentation...

I know now that about half the time is used by this line:
var = pFields->GetItem(i)->GetValue();

If I bypass the Microsoft generated COM wrappers, will my code be faster? My guess is no.

The other half of the time is spent in the statements which convert data and stream it into the ostringstream.

I don't know right now as I write this whether it's the conversions or the streaming that is taking more time.

Would it be faster if I didn't use ostringstream and instead managed my own buffer, with my own logic to grow the buffer (re-alloc, copy, delete)? Would it be faster if my logic made a pessimistic guesstimate and reserved a lot of space for the ostringstream buffer up front? These might be experiments worth trying.
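One way to try the "reserve space up front" experiment portably is to append into a `std::string` with pre-reserved capacity instead of an ostringstream. `BuildBuffer` and `rowEstimate` below are hypothetical names for illustration, not part of the original code:

```cpp
#include <cstdio>
#include <string>

// Sketch: append rows into a std::string whose capacity is reserved up
// front, so the buffer never reallocates during the loop.
std::string BuildBuffer(int nRows)
{
    const std::size_t rowEstimate = 200;  // pessimistic guess of bytes per row
    std::string out;
    out.reserve(static_cast<std::size_t>(nRows) * rowEstimate);

    char buf[32];
    for (int i = 0; i < nRows; ++i)
    {
        const int len = std::snprintf(buf, sizeof buf, "%d,", i);
        out.append(buf, static_cast<std::size_t>(len));  // no reallocation here
    }
    return out;
}
```

Whether this beats a pre-seeded ostringstream is exactly the kind of thing the timings would have to show.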

Finally, the conversions themselves. None of the three stand out in my timings as being bad. One answer says that my itoa might be slower than an alternative. Worth checking out.

Was it helpful?

Solution

Try commenting out the code in the for loop and comparing the time. Once you have a reading, start uncommenting various sections until you hit the bottleneck.

OTHER TIPS

I can't tell from looking at your code, someone more familiar with COM/ATL may have a better answer.

By trial and error, I would find the slow code by commenting out inner-loop operations until you see the perf spike; then you have your culprit and should focus on that.

I assume V_VT is a function - if so, then for each date value, V_VT(&var) is called 6 times. A simple optimisation is to store the value of V_VT(&var) in a local, saving up to 5 calls to this function each time around the loop.

If you haven't already done so, re-order the if tests for the types to put the most common column types first - this reduces the number of tests required.
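Combining those two tips, a sketch of the dispatch might look like the following. `FakeVariant`, `VarType`, and `AppendField` are stand-ins invented here so the sketch compiles anywhere; in the real code the cached tag would be `const VARTYPE vt = V_VT(&var);` followed by a `switch` on `vt`:

```cpp
#include <sstream>
#include <string>

// Stand-ins for VARIANT and its type tag, for illustration only.
enum class VarType { Bstr, I4, I2, UI1, Bool, Date };

struct FakeVariant
{
    VarType     vt;   // type tag (plays the role of V_VT(&var))
    long        i4;   // integer payload
    std::string str;  // string payload
};

void AppendField(const FakeVariant& var, std::ostringstream& ostrm)
{
    const VarType vt = var.vt;  // read the type tag once...
    switch (vt)                 // ...and dispatch on the cached copy
    {
    case VarType::Bstr:
        ostrm << var.str;
        break;
    case VarType::I4:           // most common types listed first
    case VarType::I2:
    case VarType::UI1:
    case VarType::Bool:
        ostrm << var.i4;        // the stream formats integers itself
        break;
    case VarType::Date:
        // date handling elided in this sketch
        break;
    }
}
```

A `switch` also makes the "most common type first" ordering moot, since the compiler can turn it into a jump table.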

A good part of it is that Access is not a server database - all the file reads/writes, locking, cursor handling, etc. is taking place within the client application (across a network, right?), and it needs to be so if other users have the database open at the same time.

If not, you probably would be able to drop the cursor settings, and open the database read-only.

As a basic idea, try timing the code with only the VT_BSTR conversion, then with VT_DATE, and finally with the other types; see which takes the most time.

The only observation I have is that itoa is not standard C. The implementation may be very slow as you can see from this article.
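For comparison, a standard alternative since C++17 is `std::to_chars`, which is locale-free and typically fast. `IntToString` is a hypothetical wrapper written for this sketch:

```cpp
#include <charconv>
#include <string>

// Sketch: std::to_chars (C++17, <charconv>) as a standard replacement
// for the non-standard itoa.
std::string IntToString(int value)
{
    char buf[16];  // large enough for any 32-bit int plus sign
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof buf, value);
    (void)ec;      // cannot fail here: the buffer is big enough
    return std::string(buf, ptr);
}
```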

Try profiling. If you don't have a profiler, a simple alternative is to wrap each call in your loop that you think may take time with something like the following:

#define TIME_CALL(x) \
do { \
  const DWORD t1 = timeGetTime();\
  x;\
  const DWORD t2 = timeGetTime();\
  std::cout << "Call to '" << #x << "' took " << (t2 - t1) << " ms.\n";\
}while(false)

So now you can say:

TIME_CALL(var = pFields->GetItem(i)->GetValue());
TIME_CALL(ostrm << (const char*) (_bstr_t) var);

and so on...
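Note that `timeGetTime` is Windows-specific (winmm). A portable equivalent of the macro above, assuming C++11 and using `std::chrono` (the name `TIME_CALL_CHRONO` is ours), might look like:

```cpp
#include <chrono>
#include <iostream>

// Portable variant of TIME_CALL, using std::chrono::steady_clock
// instead of the Windows-only timeGetTime.
#define TIME_CALL_CHRONO(x) \
do { \
  const auto t1 = std::chrono::steady_clock::now(); \
  x; \
  const auto t2 = std::chrono::steady_clock::now(); \
  std::cout << "Call to '" << #x << "' took " \
            << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() \
            << " ms.\n"; \
} while (false)
```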

You don't need the itoa - you're writing to a stream.

To answer your new question: let the stream format your data directly, instead of formatting into a string and then passing that string to the stream, e.g.:

_stprintf(buf, _T("%4d-%02d-%02d %02d:%02d:%02d"),
                  sysTime.wYear, sysTime.wMonth, sysTime.wDay, 
                  sysTime.wHour, sysTime.wMinute, sysTime.wSecond);

ostrm << buf;

Turns into:

ostrm.fill('0');
ostrm.width(4);
ostrm << sysTime.wYear << _T("-");
ostrm.width(2);
ostrm << sysTime.wMonth;

And so on...
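Spelled out in full with `std::setw`/`std::setfill` manipulators, the whole timestamp might look like this. `SimpleTime` and `StreamDate` are portable stand-ins for SYSTEMTIME invented for this sketch:

```cpp
#include <iomanip>
#include <sstream>

// Stand-in for SYSTEMTIME so the sketch compiles anywhere.
struct SimpleTime { int year, month, day, hour, minute, second; };

void StreamDate(std::ostringstream& ostrm, const SimpleTime& t)
{
    const char prev = ostrm.fill('0');   // pad fields with zeros
    ostrm << std::setw(4) << t.year   << '-'
          << std::setw(2) << t.month  << '-'
          << std::setw(2) << t.day    << ' '
          << std::setw(2) << t.hour   << ':'
          << std::setw(2) << t.minute << ':'
          << std::setw(2) << t.second;
    ostrm.fill(prev);                    // restore the previous fill char
}
```

Note that `width` resets after every insertion, so each field needs its own `setw`; whether this beats a single `sprintf` into a stack buffer is something only the timings can settle.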

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow