Question

I need to convert several million dates stored as wide strings into boost dates.

The following code works. However, it generates a horrible compiler warning and does not seem efficient.

Is there a better way?

#include "boost/date_time/gregorian/gregorian.hpp"
using namespace boost::gregorian;

#include <algorithm>
#include <iostream>
#include <string>
using namespace std;


    wstring ws( L"2008/01/01" );

    string temp(ws.length(), '\0');
    copy(ws.begin(), ws.end(), temp.begin());
    date d1( from_simple_string( temp ) );

    cout << d1;

The better way turns out to be to use the standard C++ library locale, which is a collection of facets. A facet is a service that lets the stream operators handle one particular choice, such as the representation of dates or times, or just about anything else. All these choices, each handled by its own facet, are gathered together in a locale.
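To make the idea concrete without needing Boost, here is a minimal sketch using a standard facet, std::numpunct, which decides how numbers are punctuated. The names comma_decimal and format_with_comma are made up for this illustration; the point is only that the stream operator picks up the choice from the facet in the imbued locale.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: use ',' as the decimal point.
struct comma_decimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Format a number through a stream whose locale carries the custom facet.
std::string format_with_comma(double value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when no longer used.
    out.imbue(std::locale(std::locale::classic(), new comma_decimal));
    out << value;  // operator<< consults the numpunct facet of the stream's locale
    return out.str();
}
```

Calling format_with_comma(1.5) gives "1,5", while the same value written to a stream with the classic locale gives "1.5".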

This solution was pointed out to me by litb who gave me enough help to use facets in my production code, making it terser and faster. Thanks.

There is an excellent tutorial on locales and facets by Nathan Myers who designed facets. He has a light style which makes his tutorial easy to read, though this is advanced stuff and your brain may hurt after the first read through - mine did. I suggest you go there now. For anyone who just wants the practicalities of converting wide character strings to boost dates, the rest of this post describes the minimum necessary to make it work.


litb first offered the following simple solution that removes the compiler warning. ( The solution was edited before I got around to accepting it. ) This looks like it does the same thing, converting wide characters one by one, but it avoids mucking around with temp strings and therefore is much clearer, I think. I really like that the compiler warning is gone.

#include "boost/date_time/gregorian/gregorian.hpp"
using namespace boost::gregorian;

#include <iostream>
#include <string>
using namespace std;


    wstring ws( L"2008/01/01" );

    date d1( from_simple_string( string( ws.begin(), ws.end() ) ) );

    cout << d1;
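Both versions narrow each wchar_t to a char, which is only safe while the date strings are plain ASCII. A possible way to make that assumption explicit, and to keep the compiler quiet through an explicit cast, is sketched below. The helper narrow_ascii is a hypothetical name, not part of Boost or the standard library.

```cpp
#include <stdexcept>
#include <string>

// Narrow a wide string that is expected to hold only ASCII (digits, '/', '-').
// Throws std::invalid_argument if a character would not survive the conversion.
std::string narrow_ascii(const std::wstring& wide) {
    std::string narrow;
    narrow.reserve(wide.size());
    for (wchar_t wc : wide) {
        // The unsigned cast also catches negative values where wchar_t is signed.
        if (static_cast<unsigned long>(wc) > 127UL) {
            throw std::invalid_argument("non-ASCII character in date string");
        }
        narrow.push_back(static_cast<char>(wc));
    }
    return narrow;
}
```

The result could then be handed to from_simple_string in place of the temporary string built above.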

litb went on to suggest using "facets", which I had never heard of before. They seem to do the job, producing incredibly terse code inside the loop, at the cost of a prologue where the locale is set up.

wstring ws( L"2008/01/01" );

// construct a facet to handle greek dates - wide characters in 2008/12/31 format.
// refs = 1 tells the locale not to delete this stack object; it is declared
// first so that it outlives the locale and stream that use it
wdate_input_facet greek_date_facet(L"%Y/%m/%d", 1);
// construct a locale to collect all the particulars of the 'greek' style
locale greek_locale;
// add facet to locale
greek_locale = locale( greek_locale, &greek_date_facet );
// construct stringstream to use greek locale
std::wstringstream greek_ss; 
greek_ss.imbue( greek_locale );

date d2;

greek_ss << ws;
greek_ss >> d2;

cout << d2;

This, it turns out, is also more efficient:

clock_t start, finish;
double  duration;

start = clock();
for( int k = 0; k < 100000; k++ ) {
    string temp(ws.length(), '\0');
    copy(ws.begin(), ws.end(), temp.begin());
    date d1( from_simple_string( temp ) );
}
finish = clock();
duration = (double)(finish - start) / CLOCKS_PER_SEC;
cout << "1st method: " << duration << endl;

start = clock();
for( int k = 0; k < 100000; k++ ) {
    date d1( from_simple_string( string( ws.begin(), ws.end() ) ) );
}
finish = clock();
duration = (double)(finish - start) / CLOCKS_PER_SEC;
cout << "2nd method: " << duration << endl;

start = clock();
for( int k = 0; k < 100000; k++ ) {
    greek_ss << ws;
    greek_ss >> d2;
    greek_ss.clear();
}
finish = clock();
duration = (double)(finish - start) / CLOCKS_PER_SEC;
cout << "3rd method: " << duration << endl;

Produces the following output:

1st method: 2.453
2nd method: 2.422
3rd method: 1.968
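The three loops above repeat the same start/finish bookkeeping. One way to factor it out is sketched below. Note that it uses std::chrono::steady_clock, which measures wall-clock time, whereas clock() reports processor time, so this is a substitution and not what produced the numbers above. The name time_loop is made up for the example.

```cpp
#include <chrono>

// Run `work` `iterations` times and return the elapsed wall-clock seconds.
template <typename Work>
double time_loop(int iterations, Work work) {
    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < iterations; ++k) {
        work();
    }
    const auto finish = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(finish - start).count();
}
```

Each method would then be passed in as a lambda, for example time_loop(100000, [&] { /* body of one loop */ }).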

OK, this is now in the production code and passing regression tests. It looks like this:

  //  .. construct greek locale and stringstream 

  // ... loop over input extracting date strings

        // convert range to boost dates
        date d1;
        greek_ss << sd1;
        greek_ss >> d1;
        if( greek_ss.fail() ) {
            // input is garbled
            wcout << L"do not understand " << sl << endl;
            exit(1);
        }
        greek_ss.clear();

// finish processing and end loop
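One thing to watch with this reuse pattern: as far as I can tell, inserting with << appends to the stream's buffer, so over millions of dates the buffer keeps growing, and after a failed read the unread characters stay in it. A sketch of an alternative that replaces the buffer on every pass is shown below. It parses ints rather than dates so that it compiles without Boost, and parse_fields is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse each field as an int using one reused stream. Returns false on the
// first garbled field. Stand-in for the date extraction in the loop above.
bool parse_fields(const std::vector<std::wstring>& fields, std::vector<int>& out) {
    std::wistringstream ss;  // a locale with a date facet would be imbued here
    for (const std::wstring& field : fields) {
        ss.clear();     // reset eofbit/failbit left by the previous read
        ss.str(field);  // replace the buffer instead of appending to it
        int value = 0;
        if (!(ss >> value)) {
            return false;  // input is garbled
        }
        out.push_back(value);
    }
    return true;
}
```

The same clear() followed by str() sequence should carry over to a wide stream imbued with the date facet, though I have not timed it against the << version.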

I have one final question about this. Adding the facet to the locale seems to require two invocations of the locale copy constructor:

    // add facet to locale
greek_locale = locale( greek_locale, &greek_date_facet );

Why is there not an add( facet* ) method? ( _Addfac() is complex, undocumented and deprecated )

Solution

efotinis found a good way using from_stream.


I've looked into the manual of date_time and found it supports facets:

#include <boost/date_time/gregorian/gregorian.hpp>
#include <iostream>
#include <sstream>
#include <locale>

int main() {
    using namespace boost::gregorian;

    std::wstringstream ss;
    wdate_input_facet * fac = new wdate_input_facet(L"%Y-%m-%d");
    ss.imbue(std::locale(std::locale::classic(), fac));

    date d;
    ss << L"2004-01-01 2005-01-01 2006-06-06";
    while(ss >> d) {
        std::cout << d << std::endl;
    }
}

You could also go with that.


I've looked up how date facets work:

  • The boost::date_time::date_input_facet template implements a facet.
  • Facets are derived from std::locale::facet and every one has a unique id.
  • You can imbue a new locale into a stream, replacing its old locale. The locale of a stream will be used for all sorts of parsing and conversions.
  • When you create a new std::locale using the form I showed, you give it an existing locale and a pointer to a facet. The given facet replaces any existing facet of the same type in the given locale (so it would replace any other date_input_facet in use).
  • All facets are associated with the locale somehow, so that you can use std::has_facet<Facet>(some_locale) to check whether the given locale has some given facet type.
  • You can call into a facet held by a locale with std::use_facet<Facet>(some_locale).some_member(...).
  • date_input_facet has a function get, which can be used like this:

The following is essentially what operator>> does for boost::gregorian::date:

// assume src is a stream having the wdate_input_facet in its locale. 
// wdate_input_facet is a boost::date_time::date_input_facet<date,wchar_t> typedef.

date d;

// iterate over characters of src
std::istreambuf_iterator<wchar_t> b(src), e;

// use the facet to parse the date
std::use_facet<wdate_input_facet>(src.getloc()).get(b, e, src, d);
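The points about ids, replacement, has_facet and use_facet can be tried out with a tiny facet of your own. The sketch below uses only the standard library. format_facet and format_of are hypothetical names invented for the illustration, and the facet merely stores a format string instead of parsing anything.

```cpp
#include <cstddef>
#include <locale>
#include <string>
#include <utility>

// Hypothetical facet for illustration: carries a date format string.
class format_facet : public std::locale::facet {
public:
    static std::locale::id id;  // every facet type needs its own id
    explicit format_facet(std::string format, std::size_t refs = 0)
        : std::locale::facet(refs), format_(std::move(format)) {}
    const std::string& format() const { return format_; }
private:
    std::string format_;
};
std::locale::id format_facet::id;

// Return the format stored in `loc`, or `fallback` if no format_facet is installed.
std::string format_of(const std::locale& loc, const std::string& fallback) {
    if (std::has_facet<format_facet>(loc)) {
        return std::use_facet<format_facet>(loc).format();
    }
    return fallback;
}
```

Building std::locale first(std::locale::classic(), new format_facet("%Y/%m/%d")) and then std::locale second(first, new format_facet("%Y-%m-%d")) leaves first unchanged, while second reports the replacement format.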

OTHER TIPS

You can use the from_stream parser function:

using boost::gregorian::date;
using boost::gregorian::from_stream;

std::wstring ws( L"2008/01/01" );
date d1(from_stream(ws.begin(), ws.end()));
std::cout << d1;  // prints "2008-Jan-01"
Licensed under: CC-BY-SA with attribution