Creating a Lucene index for an existing Apache Jena TDB to implement text search

https://stackoverflow.com/questions/17954399

04-06-2022

Question

I have a large Apache Jena TDB dataset and want to build a Lucene index for it using Apache Jena 2.10.2, for use with the new text search feature. I find the documentation hard to follow.

I first tried configuring it in code, but had trouble with the dependencies. Every combination of lucene-core and solr-solrj I tried resulted in either 'classNotFound' errors or a 'StandardAnalyzer overrides final method tokenStream' error. Example code:

Dataset ds1 = DatasetFactory.createMem() ;

EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label) ;

// In-memory index
Directory dir = new RAMDirectory();

// Have also tried creating the index in a file (instead of the RAMDirectory above)
// File indexDir = new File("luceneIndexes");
// Directory dir = FSDirectory.open(indexDir);

// Fails on this line
Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef) ;

I think the only solution may be to create a text dataset assembler, but if anyone has advice on setting this up in code, I would prefer to do it that way.

Solution

The example is exactly the one from Jena, which does work.

It looks like you have a mix of incompatible jar versions on the classpath. Have you tried using Maven to resolve the dependencies? Running "mvn dependency:tree" shows which versions are actually used.

jena-text is built for Lucene 4.3.1 or Solr 4.3.1.

See the POM from: https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-text/1.0.0-SNAPSHOT/
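If Maven is not an option, a quick way to see which jars are actually being picked up is to ask the JVM where each class was loaded from. The following is only a minimal diagnostic sketch, not part of Jena: the class name JarVersionCheck and the helper describe are made up for illustration, and the list of class names to probe is an assumption about which classes are relevant to the errors in the question. The reported version is whatever the jar manifest declares, so it may show as unknown.

```java
import java.security.CodeSource;

public class JarVersionCheck {

    /** Reports where a class was loaded from, or why it could not be loaded. */
    public static String describe(String className) {
        try {
            // Load without running static initializers; we only want the origin.
            Class<?> cls = Class.forName(className, false,
                    JarVersionCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "(bootstrap or unknown)"
                    : src.getLocation().toString();
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> " + location
                    + " (version: " + (version == null ? "unknown" : version) + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers problems such as "overrides final method".
            return className + " -> not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumed list of classes worth checking for this problem.
        String[] names = {
            "org.apache.jena.query.text.TextDatasetFactory",
            "org.apache.lucene.analysis.Analyzer",
            "org.apache.lucene.analysis.standard.StandardAnalyzer",
            "org.apache.solr.client.solrj.SolrServer"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Run it with the same classpath as the failing program. If Analyzer and StandardAnalyzer come from jars of different Lucene versions, that would be consistent with the 'overrides final method' error.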

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow