Question

I'm developing a keyword analytics app. I wish to crawl the web using Nutch, index the output using Solr and finally store the data in Cassandra.

I should later be able to do search queries and analytics on Solr and it must fetch the relevant data from Cassandra.

Is this setup possible? If yes, is there anything that I should keep in mind?


Solution 2

I think you can, but I am not a Cassandra user, so I have never tried it.

You will have to configure gora.properties (http://svn.apache.org/repos/asf/nutch/tags/release-2.2.1/conf/gora.properties) to enable Cassandra. The Nutch 2 Tutorial (http://wiki.apache.org/nutch/Nutch2Tutorial) does the same thing for HBase.
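
A minimal sketch of that configuration step, assuming the Cassandra section of Nutch 2.2.1's gora.properties uses the keys gora.datastore.default and gora.cassandrastore.servers, and that localhost:9160 is where your Cassandra node listens (check both against your own copy of the file). The helper set_properties is hypothetical: it uncomments or sets those keys in the file's text. By analogy with the HBase steps in the tutorial, nutch-site.xml would presumably also need storage.data.store.class pointed at org.apache.gora.cassandra.store.CassandraStore.

```python
import re

# Assumed keys from the (commented-out) Cassandra section of Nutch 2.2.1's
# gora.properties; verify them, and the server address, against your copy.
CASSANDRA_SETTINGS = {
    "gora.datastore.default": "org.apache.gora.cassandra.store.CassandraStore",
    "gora.cassandrastore.servers": "localhost:9160",
}

# Matches "key=value" lines and commented-out "#key=value" lines.
KEY_LINE = re.compile(r"\s*#?\s*([\w.]+)\s*=")


def set_properties(text: str, settings: dict) -> str:
    """Hypothetical helper: set each key once, uncommenting it if needed.

    Later active duplicates of a key are commented out so they cannot
    override the new value; keys missing from the text are appended.
    """
    done = set()
    out = []
    for line in text.splitlines():
        m = KEY_LINE.match(line)
        key = m.group(1) if m else None
        if key in settings and key not in done:
            out.append(f"{key}={settings[key]}")
            done.add(key)
        elif key in settings and not line.lstrip().startswith("#"):
            out.append("#" + line)  # neutralize a conflicting active duplicate
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in settings.items() if k not in done)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "#gora.datastore.default=org.apache.gora.mock.store.MockDataStore\n"
        "gora.datastore.autocreateschema=true\n"
    )
    print(set_properties(sample, CASSANDRA_SETTINGS))
```

To apply it for real, read conf/gora.properties from your Nutch runtime, pass its contents to set_properties, and write the result back; editing the file by hand works just as well.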

To see where the data is mapped in Cassandra, take a look at the mappings in http://svn.apache.org/repos/asf/nutch/tags/release-2.2.1/conf/gora-cassandra-mapping.xml
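
One way to read that file is to list, for each persisted field, the column family and qualifier it is stored under. The sketch below assumes the mapping uses class elements containing field elements with name, family and qualifier attributes; the embedded XML is a trimmed, illustrative stand-in, and its field names and qualifiers are not copied from the real file. The function field_locations is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative excerpt only: element and attribute names are modeled on the
# shape of gora-cassandra-mapping.xml and should be checked against the real file.
SAMPLE_MAPPING = """<gora-orm>
  <keyspace name="webpage" host="localhost">
    <family name="f"/>
    <family name="p"/>
  </keyspace>
  <class name="org.apache.nutch.storage.WebPage" keyClass="java.lang.String">
    <field name="baseUrl" family="f" qualifier="bas"/>
    <field name="title" family="p" qualifier="t"/>
  </class>
</gora-orm>"""


def field_locations(mapping_xml: str) -> dict:
    """Map each persisted field name to its (column family, qualifier) pair."""
    root = ET.fromstring(mapping_xml)
    locations = {}
    for cls in root.iter("class"):
        for fld in cls.iter("field"):
            locations[fld.get("name")] = (fld.get("family"), fld.get("qualifier"))
    return locations


if __name__ == "__main__":
    for name, (family, qualifier) in field_locations(SAMPLE_MAPPING).items():
        print(f"{name}: family={family}, qualifier={qualifier}")
```

Pointing it at the real gora-cassandra-mapping.xml (read into a string first) would show which column each crawled field ends up in, which is what you need when querying Cassandra directly.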

Nutch will store the data in Cassandra. As for Solr, I don't know (I have never used it).

Other answers

If you use DataStax's distribution of Cassandra (DataStax Enterprise), indexing Cassandra tables into Solr is much easier. See http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-solr

It is possible programmatically: keep the same unique id in both Cassandra and Solr, run the search against the Solr index to get the matching ids, and then fetch the full records for those ids from Cassandra.
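
The id-based lookup described above can be sketched with in-memory stand-ins so the control flow is visible without a running cluster. The classes InMemorySearchIndex and InMemoryWideTable are hypothetical placeholders for a Solr core and a Cassandra table; using the page URL as the shared id is an assumption. The point is only the two-step pattern: the index returns ids, and the store returns the full records for those ids.

```python
from dataclasses import dataclass, field


@dataclass
class InMemorySearchIndex:
    """Hypothetical stand-in for a Solr core: holds only id -> searchable text."""
    docs: dict = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text.lower()

    def search(self, keyword: str) -> list:
        """Return the ids of documents containing the keyword (no ranking)."""
        kw = keyword.lower()
        return [doc_id for doc_id, text in self.docs.items() if kw in text]


@dataclass
class InMemoryWideTable:
    """Hypothetical stand-in for a Cassandra table: row key -> full record."""
    rows: dict = field(default_factory=dict)

    def put(self, row_key: str, record: dict) -> None:
        self.rows[row_key] = record

    def get(self, row_key: str):
        return self.rows.get(row_key)


def search_then_fetch(index, table, keyword: str) -> list:
    """Query the index for ids, then load each full record by the same id."""
    records = []
    for doc_id in index.search(keyword):
        record = table.get(doc_id)
        if record is not None:  # the two stores can briefly disagree; skip missing rows
            records.append(record)
    return records


if __name__ == "__main__":
    index, table = InMemorySearchIndex(), InMemoryWideTable()
    # The page URL serves as the shared unique id in both stores (an assumption).
    index.add("http://example.com/a", "Crawling with Nutch and Solr")
    table.put("http://example.com/a", {"url": "http://example.com/a", "title": "Page A"})
    print(search_then_fetch(index, table, "solr"))
```

In a real deployment the Solr index would only need the searchable fields plus the id, while Cassandra holds the complete crawled data; the two in-memory lookups would be replaced by calls through a Solr client and a Cassandra driver.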

Licensed under: CC-BY-SA with attribution