Question

Is there a Solr/Lucene filter for analyzing text in Latin (the language, not the script type)? They exist for many other languages (Italian, Czech, etc.) but Latin isn't included in the Solr distribution by default.

This makes sense, of course (no one speaks Latin any more...), but I'm hoping to find one. Perhaps there's a list of plugins somewhere I could see. It's difficult to search for because all of the results are just for Latin encoding blocks.

No correct solution

OTHER TIPS

Unless you need stemming features, StandardAnalyzer should be a reasonable starting point at least, though the default stop word set would not be particularly useful.
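
As a rough sketch of what that could look like in a Solr schema, assuming you put together your own Latin stop word list: the field type name text_la and the file name stopwords_la.txt are made up for illustration, and neither ships with Solr. The chain is the usual StandardTokenizer plus lowercasing plus stop filter, with the stop list swapped for your own.

```xml
<!-- Hypothetical field type: "text_la" and "stopwords_la.txt" are assumed names, not part of Solr -->
<fieldType name="text_la" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- The tokenizer that StandardAnalyzer is built on -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Your own Latin stop words, one per line, in place of the default set -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_la.txt"/>
  </analyzer>
</fieldType>
```

This does no stemming, so inflected forms of the same word will not match each other.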

If you are looking for a stemmer, there is a LatinStemFilter out there as well. You can find it at LUCENE-4229. I don't really know how effective it is at this point, though.

There is an external project that does Latin stemming and Latin number conversion.

Licensed under: CC-BY-SA with attribution