Question

I'm using Cassandra with the C++ driver and I'm seeing very slow performance when inserting thousands of rows. I'm running an Ubuntu 12.04 LTS VM with 8 GB of RAM and 4 CPUs.

I have an ASCII file containing 15,800 records, and I'm trying to read each record and insert it into my Cassandra table. The "COPY" command took around 1 min 30 s. Issuing one query per record took around 6-7 minutes, and a single batch query takes forever (it ran for 30 minutes before I gave up).

I would like to know whether there is a faster way to do this kind of insertion.

Many thanks!

Solution 2

I managed to reduce the time to 12.5 seconds by slicing the batch into chunks of 800 rows. Here is the solution in case it helps others; I would be happy if someone could provide a better one :)

int nb_lines = 0;
string create_query = "BEGIN BATCH ";
std::ifstream file("/media/sf_Shared/xfmge");
for (string line; getline(file, line);) {
    // Append the current record first, so that no line is dropped
    // when a full batch is sent.
    stringstream sstm;
    sstm << "insert into felder (id, data) values ('felder', '" << line << "') ";
    create_query += sstm.str();
    nb_lines++;

    // Once the batch holds 800 inserts, send it and start a new one.
    if (nb_lines == 800) {
        create_query += " APPLY BATCH;";
        boost::shared_ptr<cql::cql_query_t> create(
            new cql::cql_query_t(create_query, cql::CQL_CONSISTENCY_ONE));
        query_result = session->query(create);

        query_result.wait();
        if (query_result.get().error.is_err()) {
            cout << "-isbuild - ERROR for query: " << create_query << endl;
            cout << query_result.get().error.message << endl;
            return iserrno;
        } else {
            cout << "+isbuild - QUERY SUCCESSFUL: " << create_query << endl;
        }
        create_query = "BEGIN BATCH ";
        nb_lines = 0;
    }
}
// Send whatever is left in the last, partially filled batch.
if (nb_lines > 0) {
    create_query += " APPLY BATCH;";
    boost::shared_ptr<cql::cql_query_t> create(
        new cql::cql_query_t(create_query, cql::CQL_CONSISTENCY_ONE));
    query_result = session->query(create);

    query_result.wait();
    if (query_result.get().error.is_err()) {
        cout << "-isbuild - ERROR for query: " << create_query << endl;
        cout << query_result.get().error.message << endl;
        return iserrno;
    } else {
        cout << "+isbuild - QUERY SUCCESSFUL: " << create_query << endl;
    }
}
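
The slicing logic can also be separated from the driver calls, which makes it easier to check that every input line ends up in exactly one batch. The following is only a sketch under stated assumptions: `build_batches` is a hypothetical helper name (it is not part of the C++ driver), the records are assumed to be already loaded into a `std::vector<std::string>`, and they are assumed to contain no single quotes. It appends each record before testing whether the batch is full, so the record that completes a batch is not skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of any driver): groups INSERT statements
// into "BEGIN BATCH ... APPLY BATCH;" strings holding at most batch_size
// statements each.
std::vector<std::string> build_batches(const std::vector<std::string>& records,
                                       std::size_t batch_size) {
    assert(batch_size > 0);  // a batch must hold at least one statement
    std::vector<std::string> batches;
    std::string current = "BEGIN BATCH ";
    std::size_t in_batch = 0;
    for (const std::string& record : records) {
        // Same statement shape as in the answer; the record is assumed
        // to contain no single quotes.
        current += "insert into felder (id, data) values ('felder', '" +
                   record + "') ";
        ++in_batch;
        if (in_batch == batch_size) {
            // The batch is full: close it and start a new one.
            current += "APPLY BATCH;";
            batches.push_back(current);
            current = "BEGIN BATCH ";
            in_batch = 0;
        }
    }
    if (in_batch > 0) {
        // Close the last, partially filled batch.
        current += "APPLY BATCH;";
        batches.push_back(current);
    }
    return batches;
}
```

Each returned string could then be passed to `cql::cql_query_t` and `session->query` exactly as in the code above, with the same error check after every `wait()`.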

OTHER TIPS

Here is my original source code, which sends all records in a single batch:

string create_query = "BEGIN BATCH ";
std::ifstream file("/media/sf_Shared/xfmge");
for(string line; getline(file, line);){
    stringstream sstm;
    record = (char*)line.c_str();
    sstm << "insert into felder (id, data) values ('felder', '" << record << "') ";
    create_query += sstm.str();
}
create_query += " APPLY BATCH;";

boost::shared_ptr<cql::cql_query_t> create(
        new cql::cql_query_t(create_query, cql::CQL_CONSISTENCY_ONE));
query_result = session->query(create);
cout << "sending..." << endl;
query_result.wait();
if (query_result.get().error.is_err()) {
    cout << "-isbuild - ERROR in query: " << create_query << endl;
    cout << query_result.get().error.message << endl;
    return iserrno;
} else {
    cout << "+isbuild - QUERY SUCCESSFUL: " << create_query << endl;
}
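
One thing to watch for in code like the above: each record is spliced directly into the CQL text, so a record containing a single quote would end the string literal early and make the whole statement fail. In CQL, a single quote inside a string literal is written as two single quotes. A minimal sketch of that escaping follows; `escape_cql_string` and `make_insert` are hypothetical helper names introduced here for illustration, not driver functions.

```cpp
#include <string>

// Hypothetical helper: doubles every single quote so that the value can
// be placed inside a CQL string literal ('...').
std::string escape_cql_string(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        out += c;
        if (c == '\'') {
            out += '\'';  // write ' as '' inside the literal
        }
    }
    return out;
}

// Hypothetical helper: builds one INSERT in the same shape as the code
// above, with the record escaped.
std::string make_insert(const std::string& record) {
    return "insert into felder (id, data) values ('felder', '" +
           escape_cql_string(record) + "') ";
}
```

In the loop above, `create_query += make_insert(line);` would then replace the `stringstream` lines.
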
Licensed under: CC-BY-SA with attribution