Question

I'm trying to insert records using Hector and from time to time I get this error:

me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:59)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:264)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
    at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
    at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeBatch(AbstractColumnFamilyTemplate.java:115)
    at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeIfNotBatched(AbstractColumnFamilyTemplate.java:163)
    at me.prettyprint.cassandra.service.template.ColumnFamilyTemplate.update(ColumnFamilyTemplate.java:69)
    at ustocassandra.USToCassandraHector.consumer(USToCassandraHector.java:271)
    at ustocassandra.USToCassandraHector.access$100(USToCassandraHector.java:41)
    at ustocassandra.USToCassandraHector$2.run(USToCassandraHector.java:71)
    at java.lang.Thread.run(Thread.java:724)
Caused by: UnavailableException()
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20841)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
    at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
    at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
    at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
    ... 9 more

I know the usual explanation is that there are not enough nodes up, but that's not the case here. All my nodes are up:

./nodetool ring
Note: Ownership information does not include topology; for complete information, specify a keyspace

Datacenter: DC1
==========
Address         Rack        Status State   Load            Owns                Token
                                                                               4611686018427388000
172.16.217.222  RAC1        Up     Normal  353.36 MB       25.00%              -9223372036854775808
172.16.217.223  RAC2        Up     Normal  180.84 MB       25.00%              -4611686018427388000
172.16.217.224  RAC3        Up     Normal  260.34 MB       25.00%              -2
172.16.217.225  RAC4        Up     Normal  222.71 MB       25.00%              4611686018427388000

I'm inserting records with 20 threads (maybe I should use fewer? From what I understand, an overloaded cluster would give a different error, not Unavailable). I'm using a write consistency level of ONE, with AutoDiscoveryAtStartup and LeastActiveBalancingPolicy. The replication factor is 2.
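For context on what Unavailable means here: a write at CL ONE with RF 2 only needs one of the two replicas for the key's token range to be up, so the exception implies the coordinator believed neither replica was available at that instant. A toy sketch of that arithmetic (my own hypothetical helper, not a Cassandra or Hector API):

```java
public class ConsistencyMath {
    // Replicas that must acknowledge a write for the given consistency level.
    static int requiredReplicas(String cl, int rf) {
        switch (cl) {
            case "ONE":    return 1;
            case "QUORUM": return rf / 2 + 1;
            case "ALL":    return rf;
            default: throw new IllegalArgumentException("unsupported CL: " + cl);
        }
    }

    // True if enough replicas are alive for the coordinator to attempt the write.
    static boolean writeAvailable(String cl, int rf, int aliveReplicas) {
        return aliveReplicas >= requiredReplicas(cl, rf);
    }

    public static void main(String[] args) {
        System.out.println(writeAvailable("ONE", 2, 1));  // true: one replica suffices
        System.out.println(writeAvailable("ONE", 2, 0));  // false: UnavailableException territory
    }
}
```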

I'm using Cassandra 1.2.8 (I tried with 2.0 and it's the same).

The error doesn't occur from the beginning. I usually manage to insert about 2 million records before getting it. My code is set to retry when an error occurs, and after some dozens of retries the insert usually succeeds. It then works fine for some millions of inserts, the error appears again, and the cycle repeats.
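A minimal retry loop of the kind described above might look like this sketch in plain Java (the `Callable` being retried is a hypothetical stand-in for the Hector mutator call; the attempt count and delay are illustrative, not anything Hector prescribes):

```java
import java.util.concurrent.Callable;

public class RetryExample {
    // Retry an operation with exponential backoff between attempts.
    static <T> T retryWithBackoff(Callable<T> op, int maxAttempts, long baseDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {   // in real code, catch HUnavailableException
                last = e;
                Thread.sleep(baseDelayMs << attempt);  // e.g. 100 ms, 200 ms, 400 ms, ...
            }
        }
        throw last;  // give up after maxAttempts failures
    }

    public static void main(String[] args) throws Exception {
        // Simulated insert that fails twice, then succeeds.
        final int[] calls = {0};
        String result = retryWithBackoff(() -> {
            if (++calls[0] < 3) throw new RuntimeException("UnavailableException (simulated)");
            return "inserted";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```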

Could it be because I set gc_grace = 60? In any case, the error doesn't occur every 60 seconds, so I don't think that's the reason.

Could you give me some suggestions about the cause of this error and what I should do?

EDIT:

'nodetool tpstats' says I have some messages dropped:

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                    11
_TRACE                       0

And I see the following warnings in the log file:

 WARN [ScheduledTasks:1] 2013-09-30 09:20:16,633 GCInspector.java (line 136) Heap is 0.853986836999536 full.  You may need to reduce memtable and/or cache sizes.  Cassandra is now reducing cache sizes to free up memory.  Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-09-30 09:20:16,634 AutoSavingCache.java (line 185) Reducing KeyCache capacity from 1073741824 to 724 to reduce memory pressure
 WARN [ScheduledTasks:1] 2013-09-30 09:20:16,634 GCInspector.java (line 142) Heap is 0.853986836999536 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-09-30 09:20:16,634 StorageService.java (line 3618) Flushing CFS(Keyspace='us', ColumnFamily='my_cf') to relieve memory pressure

This happens at the exact time Hector throws the Unavailable exception, so it's probably a memory-related problem. I guess I'll try what the warning suggests: reducing the memtable size.
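For reference, the knobs those warnings mention live in cassandra.yaml (Cassandra 1.2 names; the values shown are illustrative, so check the defaults in your own file):

```yaml
# Fraction of heap at which Cassandra flushes the largest memtables
flush_largest_memtables_at: 0.75
# Fraction of heap at which Cassandra starts shrinking caches
reduce_cache_sizes_at: 0.85
# Total heap space allowed for memtables; lowering this reduces memory
# pressure (it defaults to a fraction of the heap when left unset)
memtable_total_space_in_mb: 512
```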

Was it helpful?

Solution

It's probably because your servers are overloaded and some nodes are not responding. There is no OverloadedException: to the coordinator, an overloaded node looks just like an unavailable one.

You should check your Cassandra logs: are there warnings about the heap being full? Does nodetool tpstats list dropped messages? What is the CPU load on your servers?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow