Question

I want to stem the words in my document and have settled on KStem. I am working in Eclipse and have set up Lucene by downloading the lucene-core jar file to the lib folder and adding it to the build path. I did the same for the KStem jar file. However, I cannot find any examples or documentation on how to use the KStem library in my Java code. Do I need to set up Solr for that? I am not entirely sure what it is for.

Solution

You just need to configure KStem to be picked up by Lucene / Solr in your schema (in the field type that will hold the contents of your documents) and send the documents for indexing from your code. The rest (tokenization, stemming, stop-word removal, and so on) happens inside Lucene / Solr, depending on which processing steps you configure them to apply to your documents. See how to configure KStem in Solr: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

In short, Lucene is a powerful indexing and retrieval library that is used inside Solr, which in turn is a full-blown search server.

If you need features such as faceted search, autocomplete, sharding, or replication, install and configure Solr. If not, for example if you want a relatively small index of documents that can be rebuilt quickly and you would like to run the search from within your own app, use Lucene as a Java library.

How to use KStemmer inside one's code:

private final KStemmer stemmer = new KStemmer();
// term holds the characters of a single word; len is how many of them to stem.
// char[] term = ...
stemmer.stem(term, len);

You will have to split your document contents into a list of terms yourself.
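For the splitting step, a minimal sketch using only the Java standard library might look like the following. TermSplitter and splitIntoTerms are hypothetical names, not part of Lucene or KStem. The regex treats any run of non-letter, non-digit characters as a separator, which is an assumption you may need to adjust (for instance, it splits "don't" into "don" and "t"). Terms are lowercased because Lucene's KStemFilter documentation says its input must already be lowercased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or KStem.
final class TermSplitter {
    private TermSplitter() {}

    // Splits text on runs of characters that are neither letters nor digits
    // and lowercases every resulting term.
    static List<String> splitIntoTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            // split() can yield an empty leading piece when the text starts with a separator.
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : splitIntoTerms("The Quick, brown foxes were running!")) {
            char[] buffer = term.toCharArray();
            int len = buffer.length;
            // buffer and len are the two arguments a stem(char[], int) call expects.
            System.out.println(term + " (" + len + " chars)");
        }
    }
}
```

Each term's char[] and length can then be passed to the stemmer one at a time. If you use Lucene's analysis chain instead, its tokenizers do this splitting for you.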

This is how KStemmer is used inside Lucene: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemFilter.java

Note that the KStemmer class is part of Lucene's current trunk (package org.apache.lucene.analysis.en): http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemmer.java

Licensed under: CC-BY-SA with attribution