Question

I would like to know the performance difference between Cassandra's secondary indexes and DSE's Solr indexing when placed on column families (CFs).

We have a few CFs that we did not place secondary indexes on because we were under the impression that secondary indexes would (eventually) cause significant performance issues for CFs with heavy reads and writes. We are trying to turn to Solr to allow searching these CFs, but it looks like loading an index schema modifies the CFs to have secondary indexes on the columns of interest.

Is Solr indexing different from Cassandra's secondary indexing? And will it eventually cause slow queries (inserts/reads) for CFs with large data sets and heavy reads/writes? If so, would you advise custom indexing (which we wanted to avoid)? By the way, we're also using (trying to use) Solr for its spatial searching.

Thanks for any advice/links you can give.


UPDATE: To better explain why I'm asking these questions, and to check whether I am asking the right question(s), here is a description of our use case:

We're collecting sensor events, and many of them. We store them in both a time series CF (EventTL) and a skinny CF (Event). Because we write (insert and update) heavily in the Event CF, we have not placed any secondary indexes on it. Our queries right now are limited to single events via Event, or a time range of events via EventTL (unless we create additional fat CFs to allow range queries on other properties of the events).

That's where DSE (Solr + Cassandra) might help us. We thought that leveraging Solr search would let us avoid creating extra fat CFs for searches on other properties of the events AND let us search on multiple properties at once (location + text/properties). However, looking at how the definition of the Event CF changes after adding a Solr index schema for Event shows that secondary indexes were created. This raises the question of whether these indexes will (eventually) create issues for inserting/updating rows in Event. We need to be able to insert new events 'quickly', because events can potentially come in at 1000+ per second.


Solution 2

Since your use case is spatial search, I don't think Cassandra's secondary index feature will work for you. Here's a fairly concise article on secondary indexes that you may find useful: http://www.datastax.com/docs/1.1/ddl/indexes
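To illustrate the mismatch: Cassandra's built-in secondary indexes are built around equality lookups on a column value, whereas a spatial search asks "which events lie within some distance of a point", which is a distance predicate. The following is only a minimal Python sketch of that predicate using the haversine formula; the event records and field names are hypothetical, and this is not Cassandra or DSE code.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Hypothetical sensor events (ids and coordinates are made up for illustration).
events = [
    {"id": "e1", "lat": 40.7128, "lon": -74.0060},
    {"id": "e2", "lat": 40.7306, "lon": -73.9352},
    {"id": "e3", "lat": 34.0522, "lon": -118.2437},
]

def within_radius(events, lat, lon, radius_km):
    """Return ids of events whose location is within radius_km of (lat, lon)."""
    return [e["id"] for e in events
            if haversine_km(lat, lon, e["lat"], e["lon"]) <= radius_km]

nearby = within_radius(events, 40.7128, -74.0060, 10)
```

A full scan like this does not scale to large CFs, which is why a search engine with a spatial index is the usual choice for this kind of query.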

You should be able to do this with Solr.

Here's a post that should be relevant for you:

http://digbigdata.com/geospatial-search-cassandra-datastax-enterprise/
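As a rough idea of what a combined text + location query might look like, here is a small Python sketch that only builds a Solr request URL using Solr's `geofilt` filter query. The host, the `myks.event` core name, the `location` field, and the `type:temperature` query are all assumptions made for illustration; check your own DSE version and schema for the actual endpoint and field names.

```python
from urllib.parse import urlencode

def build_geo_query(base_url, text_query, field, lat, lon, radius_km, rows=10):
    """Build a Solr select URL that filters results by distance from a point.

    Uses Solr's {!geofilt} filter: sfield = spatial field, pt = "lat,lon",
    d = radius in kilometres.
    """
    params = {
        "q": text_query,
        "fq": "{!geofilt sfield=%s pt=%s,%s d=%s}" % (field, lat, lon, radius_km),
        "rows": rows,
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and field names, for illustration only.
url = build_geo_query(
    "http://localhost:8983/solr/myks.event/select",
    "type:temperature",
    "location",
    40.7128,
    -74.006,
    5,
)
```

The text query and the spatial filter are sent in one request, which is the "search on multiple properties at once" case from the question.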

Other suggestions

Would like to know if Solr indexing is different than Cassandra's secondary indexing?

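DSE Search uses the Cassandra secondary indexing API, which is why secondary indexes appear in the CF definition after you load a Solr schema. The index data itself is maintained by Solr/Lucene.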

And, will it eventually cause slow queries (inserts/reads) for CFs w/ large data sets and heavy read/writes?

It is a good idea to do Lucene and Solr capacity planning before you exceed the optimal performance threshold of your cluster.
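A back-of-the-envelope starting point for that planning, as a Python sketch: it takes the 1000 events/sec figure from the question, while the index bytes per event, node count, and replication factor are placeholder assumptions that you would replace with numbers measured on your own data.

```python
def daily_index_estimate(events_per_sec, index_bytes_per_event, nodes, replication_factor):
    """Rough daily index growth, cluster-wide and per node.

    All inputs are estimates supplied by the caller; nothing here is measured.
    """
    seconds_per_day = 24 * 60 * 60
    docs_per_day = events_per_sec * seconds_per_day
    gb_per_day = docs_per_day * index_bytes_per_event / 1e9
    # Each document is indexed on replication_factor nodes.
    gb_per_day_per_node = gb_per_day * replication_factor / nodes
    return {
        "docs_per_day": docs_per_day,
        "gb_per_day": gb_per_day,
        "gb_per_day_per_node": gb_per_day_per_node,
    }

# 1000 events/sec is from the question; 500 bytes, 6 nodes, RF=3 are placeholders.
estimate = daily_index_estimate(1000, 500, nodes=6, replication_factor=3)
```

At 1000 events/sec this is 86.4 million new documents per day, so index growth and per-node load are worth measuring early on a realistic sample.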

If so, would you advise custom indexing (which we wanted to avoid)? Btw -- we're also using (trying to use) Solr for its spatial searching.

DSE Search queries are as fast as Apache Solr queries.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow