Question

I have a column family called Emails and I am saving mails into it. It takes more than 100 seconds to write 5000 mails.

I am using an i3 processor with 8 GB of RAM. My data center has 6 nodes with replication factor = 2.

Does the size of the data stored in Cassandra affect performance? What factors affect write performance, and how can I improve it?

Thanks in advance.

Solution

Some of the factors you are asking about are:

  • connection speed and latency between the client and the cluster, and between machines in the cluster (as mentioned by @omnibear)
  • the replication factor you are using: if you insert emails one after another, the replication factor may add to the latency of each single operation, which increases the total time; you may consider batching write operations, or issuing them concurrently instead of waiting for each one to finish
  • you've written that you use i3/8 GB: is that the configuration of the client or of the server machines? The configuration of the server machines, especially the amount of memory and the other processes running on them, may affect performance
  • commit log and data file location: it is recommended to place the commit log on a separate physical disk from the data files
  • compaction strategy: I bet it does not matter in your case, but in general it also affects write performance; Cassandra first writes data to the commit log and the memtable, then memtables are flushed to sstables, and finally sstables are merged (which is called compaction); the parameters of this process can be tuned to improve performance in particular use cases; you can read about the write path in C* here
  • you can also browse the DataStax documentation notes on performance: (http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_throughput_c.html), (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html) and (http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html)
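The write path mentioned in the compaction bullet can be pictured with a toy model. This is only an illustrative sketch in plain Python, not Cassandra's real implementation: the class `ToyTable` and the parameters `memtable_limit` and `compaction_threshold` are made-up names, and real flush and compaction triggers are far more involved.

```python
class ToyTable:
    """Toy model of a log-structured write path (hypothetical, for illustration)."""

    def __init__(self, memtable_limit=3, compaction_threshold=2):
        self.commit_log = []   # append-only log, kept for crash recovery
        self.memtable = {}     # in-memory writes, newest value per key
        self.sstables = []     # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit
        self.compaction_threshold = compaction_threshold

    def write(self, key, value):
        # A write is an append to the log plus an in-memory update.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable (not the commit log) becomes a sorted, immutable table.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []   # flushed data no longer needs the log
        if len(self.sstables) >= self.compaction_threshold:
            self.compact()

    def compact(self):
        # Merge all tables into one; iterating oldest first lets newer values win.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]

    def read(self, key):
        # Check the memtable first, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None


table = ToyTable(memtable_limit=2, compaction_threshold=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    table.write(key, value)
```

In this model a write never touches existing tables, which is why writes are cheap; the cost is paid later, during flush and compaction.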

As an aside, consider increasing the replication factor to 3, because rf=2 does not give you much: with consistency level = quorum, a quorum of 2 replicas is still 2, so if one node fails, quorum reads and writes for the data replicated on that node will fail. With rf=3 and cl=quorum, you still have to read from and write to 2 nodes to achieve strong consistency, but losing one node no longer makes that data unavailable.
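The arithmetic behind this can be checked with a few lines. A quorum is a majority of the replicas, floor(rf / 2) + 1; the helper names below are made up for the illustration.

```python
def quorum(replication_factor):
    """Number of replicas that must respond at consistency level quorum."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor):
    """Replicas that can be down while quorum requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (2, 3):
    print(f"rf={rf}: quorum={quorum(rf)}, "
          f"tolerated failures={tolerated_replica_failures(rf)}")
# prints:
# rf=2: quorum=2, tolerated failures=0
# rf=3: quorum=2, tolerated failures=1
```

Both settings wait for 2 replicas, so moving from rf=2 to rf=3 keeps the quorum size the same while allowing one replica to be down.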

Other tips

First, use the DataStax visual admin tool (http://www.datastax.com/products/datastax-enterprise-visual-admin) to find out how much of the time is spent in Cassandra itself.

You can also use

./nodetool cfstats

to collect statistics for each keyspace and the tables within it.

It seems to me that your writer is slow, as others have pointed out.
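One way to check the writer is to time it on the client side. The figures in the question work out to 100 s / 5000 = 20 ms per mail, which suggests each write waits for the previous one. The sketch below is only a rough illustration: `write_email` is a hypothetical stand-in that sleeps instead of calling a real driver, and the 2 ms delay and the worker count are assumed values.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def write_email(email):
    """Hypothetical stand-in for one blocking insert; swap in a real driver call."""
    time.sleep(0.002)  # simulated round trip (assumed figure)
    return True


def timed_writes(emails, workers=1):
    """Write all emails with the given number of threads; return (count, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(write_email, emails))
    return len(results), time.perf_counter() - start


emails = [f"mail-{i}" for i in range(200)]
count_seq, seq_seconds = timed_writes(emails, workers=1)
count_par, par_seconds = timed_writes(emails, workers=16)
print(f"1 worker:   {seq_seconds * 1000 / count_seq:.2f} ms per write")
print(f"16 workers: {par_seconds * 1000 / count_par:.2f} ms per write")

# Per-write latency implied by the question: 100 s for 5000 mails.
per_write_ms = 100 * 1000 / 5000
print(f"question:   {per_write_ms:.0f} ms per write")
```

If the measured per-write time with one worker is close to the network round trip, the client loop is the bottleneck and overlapping the writes should reduce the total time.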

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow