Question

I am using Cassandra 2.0.3 and I drop and recreate a simple table via cqlsh by loading a file (source command). In the same file, I insert some rows in the newly created table.

About once every 3-4 tries, I get rpc_timeout on some of the inserts. When this is the case, I always have this exception on one node of the cluster:

 WARN [Thread-63] 2014-05-07 10:52:39,658 IncomingTcpConnection.java (line 83) UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=15a8520e-bb08-3a79-82a0-f735287315bf
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:178)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:103)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserializeOneCf(RowMutation.java:304)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:284)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:312)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:254)
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
    at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)

Even if I run the INSERT directly in cqlsh, it also fails with rpc_timeout. Usually, after about one minute, the insert succeeds.

My nodes are time-synchronized (I use 3 VMs on my PC), and the LAN is of course very fast since all VMs are running locally.

I created the cluster by adding 2 nodes to an existing single-node Cassandra installation. My keyspace uses a replication factor of 1, so data is not replicated:

CREATE KEYSPACE eras
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

Here is the content of the file I use to reproduce the problem:

DROP TABLE IF EXISTS erasconfig;

CREATE TABLE erasconfig (
  name text,
  category text,
  description text,
  ismodifiablebyuser int,
  value text,
  format text,
  PRIMARY KEY (name, category)
);

INSERT INTO ErasConfig (isModifiableByUser, format, name, value, category, description) VALUES (1, '', 'RECORD_IN_BASE', 'garbage', 'Path', 'Absolute path used for RECORD INPUT files');

This INSERT goes to the 3rd node of the cluster, which is the one that sometimes fails with the exception above after the table is created.


Solution

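The issue is that schema propagation is asynchronous: the CREATE statement can return before the new schema has reached every node. A node that has not yet learned about the recreated table cannot resolve its cfId, which is why it logs UnknownColumnFamilyException and the insert routed to it times out. In a multi-node cluster you therefore need to verify that a schema change has propagated to all nodes before you use it. The nodetool describecluster command shows whether the schema versions agree. From a client, you can compare the schema_version column in system.local with the schema_version values in system.peers to verify that every node reports the same version.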
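
The client-side check could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the way any particular driver does it: schema_in_agreement, wait_for_schema_agreement and the fetch_versions callback are hypothetical names, and the timeout and polling interval are arbitrary. The callback is expected to return the local schema version plus the list of peer versions, for example by running the two queries shown at the top of the block.

```python
import time

# Queries a client could run to read the schema versions.
LOCAL_QUERY = "SELECT schema_version FROM system.local"
PEERS_QUERY = "SELECT peer, schema_version FROM system.peers"


def schema_in_agreement(local_version, peer_versions):
    """Return True when every node reports the same, non-null schema version."""
    versions = {local_version, *peer_versions}
    return None not in versions and len(versions) == 1


def wait_for_schema_agreement(fetch_versions, timeout=30.0, interval=0.5,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll until all schema versions match or the timeout expires.

    fetch_versions is a hypothetical callback returning
    (local_version, [peer_version, ...]).
    """
    deadline = clock() + timeout
    while True:
        local_version, peer_versions = fetch_versions()
        if schema_in_agreement(local_version, peer_versions):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Possible callback, assuming a connected DataStax Python driver session
# (left as a comment because it needs a live cluster):
#
# def fetch_versions():
#     local = session.execute(LOCAL_QUERY).one().schema_version
#     peers = [row.schema_version for row in session.execute(PEERS_QUERY)]
#     return local, peers
```

A script would call wait_for_schema_agreement(fetch_versions) after the CREATE TABLE statement and only run the INSERT statements once it returns True. A peer that is down keeps its old schema_version in system.peers, so the wait can time out even though all live nodes agree.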

Licensed under: CC-BY-SA with attribution