Question

I need to modify the Lucene analyzer so that it recognizes the word "Ben" (a Dutch stop word). How do I make the Lucene analyzer treat this word as a regular word?

Repository.xml for Server

<param name="analyzer" value="org.hippoecm.repository.query.lucene.StandardHippoAnalyzer"/>

workspace.xml

<?xml version="1.0" encoding="UTF-8"?>
<Workspace name="default">
    <!--
        virtual file system of the workspace:
        class: FQN of class implementing the FileSystem interface
    -->
    <FileSystem class="org.apache.jackrabbit.core.fs.mem.MemoryFileSystem">
    </FileSystem>
    <!--
        persistence manager of the workspace:
        class: FQN of class implementing the PersistenceManager interface
    -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.mem.InMemPersistenceManager">
    </PersistenceManager>
    <!--
        Search index and the file system it uses.
        class: FQN of class implementing the QueryHandler interface
    -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index"/>
    </SearchIndex>
</Workspace>

Solution

The simplest approach would be to:

Copy the following class into your local project:

http://svn.onehippo.org/repos/hippo/hippo-cms7/repository/tags/hippo-repository-2.24.02/engine/src/main/java/org/hippoecm/repository/query/lucene/StandardHippoAnalyzer.java

Change the Java package and file name.

Remove the stop words (see the Java code linked above) that cause your issue.

Update your repository.xml to use the analyzer with the new package and class name.

Remove your existing Lucene index and restart Hippo.

$ mvn clean package && mvn -Pcargo.run

That should do it.
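
As a rough sketch of the repository.xml step above: assuming the copied class was renamed to com.example.search.CustomHippoAnalyzer (a hypothetical name, not something from Hippo itself), the analyzer param shown in the question would change to something like this:

```xml
<!-- Hypothetical class name: substitute the package and class name
     you chose for your copy of StandardHippoAnalyzer. -->
<param name="analyzer" value="com.example.search.CustomHippoAnalyzer"/>
```

The class has to be on the repository's classpath for this to work, and the old index still needs to be removed so that content is re-indexed with the new analyzer.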

Licensed under: CC-BY-SA with attribution