I'm developing a keyword analytics app. I wish to crawl the web using Nutch, index the output using Solr and finally store the data in Cassandra.

Later I should be able to run search queries and analytics against Solr, which must then fetch the relevant data from Cassandra.

Is this setup possible? If yes, is there anything that I should keep in mind?


Solution 2

I think you can, but I am not a Cassandra user, so I have never tried it.

You will have to configure gora.properties (http://svn.apache.org/repos/asf/nutch/tags/release-2.2.1/conf/gora.properties) to enable Cassandra. The Nutch 2 Tutorial (http://wiki.apache.org/nutch/Nutch2Tutorial) does this for HBase.

To see where the data is mapped in Cassandra, take a look at the mappings at http://svn.apache.org/repos/asf/nutch/tags/release-2.2.1/conf/gora-cassandra-mapping.xml

Nutch will store the data in Cassandra. As for Solr, I don't know (I have never used it).

Other tips

If you use DataStax's Cassandra, indexing Cassandra tables into Solr is much easier. See http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-solr

Programmatically it is possible: keep the same unique id in both Cassandra and Solr, get the matching ids from the Solr index, and then fetch the full records from Cassandra by those ids.
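To make the id-based lookup a bit more concrete, here is a minimal Python sketch of the two-step pattern. Everything in it is an assumption for illustration: search_then_fetch, fake_search, fake_fetch and the in-memory FAKE_INDEX / FAKE_STORE dictionaries are hypothetical names that stand in for a real Solr query and a real Cassandra read, and using the page URL as the shared id is also just an assumed choice.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical signatures: in a real setup search_ids would wrap a Solr query
# that returns only the unique id field, and fetch_row would wrap a Cassandra
# read keyed by that same id.
SearchFn = Callable[[str], Iterable[str]]
FetchFn = Callable[[str], Optional[Dict[str, str]]]


def search_then_fetch(query: str, search_ids: SearchFn, fetch_row: FetchFn) -> List[Dict[str, str]]:
    """Ask the index for matching ids, then load the full records from the store."""
    rows = []
    for doc_id in search_ids(query):
        row = fetch_row(doc_id)
        if row is not None:  # the index may be stale; skip ids missing from the store
            rows.append(row)
    return rows


# In-memory stand-ins (assumptions, not real Solr or Cassandra clients).
FAKE_INDEX = {
    "nutch": ["http://example.com/a", "http://example.com/b"],
    "solr": ["http://example.com/b", "http://example.com/gone"],
}
FAKE_STORE = {
    "http://example.com/a": {"id": "http://example.com/a", "title": "Page A"},
    "http://example.com/b": {"id": "http://example.com/b", "title": "Page B"},
}


def fake_search(query: str) -> Iterable[str]:
    # Stand-in for a Solr query returning only ids.
    return FAKE_INDEX.get(query, [])


def fake_fetch(doc_id: str) -> Optional[Dict[str, str]]:
    # Stand-in for a Cassandra lookup by primary key.
    return FAKE_STORE.get(doc_id)


if __name__ == "__main__":
    for row in search_then_fetch("solr", fake_search, fake_fetch):
        print(row["id"], row["title"])
```

Passing the search and fetch steps in as functions keeps the pattern independent of whichever Solr and Cassandra client libraries you end up using; only the shared unique id has to match on both sides.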

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow