Question

The setup:

  • 2-node Cassandra 1.2.6 cluster
  • replication factor of 2
  • very large CQL3 table with no secondary index
  • Row key is UUID.randomUUID().toString()
  • read consistency set to ONE
  • Using DataStax java driver 1.0

The request:

Attempting to do a table scan with "SELECT some_col FROM schema.table LIMIT nnn;"

The fail:

Once I go beyond a certain LIMIT nnn, I start getting NoHostAvailableException from the driver.

It reads like this:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.181.13.239 ([/10.181.13.239] Unexpected exception triggered))
            at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64)
            at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214)
            at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169)
            at com.jpmc.es.rtm.storage.impl.EventExtract.main(EventExtract.java:36)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:601)
            at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.181.13.239 ([/10.181.13.239] Unexpected exception triggered))
            at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:98)
            at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:165)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

Given: This is probably not the most enlightened thing to do to a large table with millions of rows, but this is how I learn what not to do, so I would really appreciate it if someone could explain how this kind of error can be debugged.

For example, when this happens, there is no indication that the nodes in the cluster ever had an issue with the request (there is nothing in the logs on either node indicating any timeout or failure). Also, I enabled tracing on the driver, which gives you some nice autotrace (à la Oracle) info as long as the query succeeds. But in this case the driver throws a NoHostAvailableException, no ExecutionInfo is available, and so tracing has provided no benefit here.
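
For reference, this is roughly how driver-side tracing is wired up in java-driver 1.0 (a minimal sketch, reusing the placeholder table and column names from above; note the trace is only reachable when execute() actually returns):

import com.datastax.driver.core.ExecutionInfo;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.SimpleStatement;

SimpleStatement stmt = new SimpleStatement("SELECT some_col FROM schema.table LIMIT 1000");
stmt.enableTracing(); // ask the coordinator to record a session trace

ResultSet rs = session.execute(stmt);

// Only reachable if execute() returned normally; on NoHostAvailableException
// there is no ResultSet, hence no ExecutionInfo and no trace to inspect.
ExecutionInfo info = rs.getExecutionInfo();
QueryTrace trace = info.getQueryTrace();
System.out.printf("trace %s took %d microseconds%n",
        trace.getTraceId(), trace.getDurationMicros());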

I also find it interesting that this does not seem to be recorded as a timeout (my JMX consoles tell me no timeouts have occurred), so I am left not understanding WHERE the failure is actually occurring. My working theory is that the driver itself is having a problem, but I don't know how to debug it (and I would really like to).

I have read several posts from folks stating that querying for result sets of more than 10,000 rows is probably not a good idea, and I am willing to accept this, but I would like to understand what is causing the exception and where it is happening.
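
For what it's worth, the exception itself usually carries more detail than its top-level message: NoHostAvailableException exposes the per-host failure strings via getErrors() (in the 1.0 driver this is, if memory serves, a Map<InetAddress, String>). A minimal sketch, reusing the placeholder query from above:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.exceptions.NoHostAvailableException;
import java.net.InetAddress;
import java.util.Map;

try {
    ResultSet rs = session.execute("SELECT some_col FROM schema.table LIMIT 100000");
} catch (NoHostAvailableException e) {
    // "All host(s) tried for query failed" hides the real cause;
    // the underlying error for each host is kept in this map.
    for (Map.Entry<InetAddress, String> entry : e.getErrors().entrySet()) {
        System.err.println(entry.getKey() + " -> " + entry.getValue());
    }
}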

FWIW, I also tried bumping the timeout properties in cassandra.yaml, but this made no difference whatsoever.

I welcome any suggestions, anecdotes, insults, or monetary contributions for my registration in the house of moron-developers.

Regards!!

Solution

My guess (and perhaps others can confirm) is that the query puts too high a load on the cluster, causing it to time out. So yes, it's a little difficult to debug, as the root cause is not obvious: was the LIMIT too large, or is the cluster actually down?

You want to avoid setting large limits on the amount of data you request in a single query; instead, set a reasonable limit and page through the results, e.g.:

SELECT * FROM messages WHERE user_id = 101 LIMIT 1000;
SELECT * FROM messages WHERE user_id = 101 AND msg_id > [Last message ID received] LIMIT 1000;
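
Since the row key in the question is a random UUID rather than a clustering column, a full table scan would page on the partitioner token instead. A rough sketch of that pattern against java-driver 1.0 follows (the table and column names are the question's placeholders, and this is an untested sketch, not a definitive implementation):

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

String lastKey = null;
while (true) {
    String cql = (lastKey == null)
        ? "SELECT rowkey, some_col FROM schema.table LIMIT 1000"
        : "SELECT rowkey, some_col FROM schema.table"
          + " WHERE token(rowkey) > token('" + lastKey + "') LIMIT 1000";

    ResultSet rs = session.execute(cql);
    boolean sawRows = false;
    for (Row row : rs) {
        sawRows = true;
        lastKey = row.getString("rowkey");
        // process row.getString("some_col") here
    }
    if (!sawRows) break; // an empty page means the scan is done
}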

The Automatic Paging functionality added in version 2.0 of the DataStax java-driver (see this document, from which the code examples in this answer are copied) is a big improvement, as it removes the need to page manually and lets you do the following (note that it requires Cassandra 2.0 or later on the server side):

Statement stmt = new SimpleStatement("SELECT * FROM images");
stmt.setFetchSize(100); // rows fetched per page, not a cap on the total
ResultSet rs = session.execute(stmt);

for (Row row : rs) {
    // the driver transparently fetches the next page as you iterate
}

While this won't necessarily solve your problem, it will minimise the possibility that the failure was simply the result of a too-big query.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow