Question

I am having trouble indexing item names that contain numbers and symbols. A sample of my data is shown below:

ANGLE BARS   ORANGE - 4.0MM 2 - 1/2"
B.I SQUARE TUBING     2" X 3"
B.I. PIPE S-40   10MM 3/8"
B.I SQUARE TUBING     1" X 2"
PLYWOOD   MARINE 3/4X4X8
PLYWOOD   STA. CLARA 1/8X4X8
PLYWOOD   STA. CLARA 3/16X4X8

I want to tokenize my data on whitespace (ignoring trailing spaces) without dropping the symbols, because they are essential. That way, a search for "plywood sta. clara", "b.i square 2" X 3"", or "angle orange 2 - 1/2" should return a result. I tried WhitespaceAnalyzer, but the symbols are dropped. I also tried StandardAnalyzer, but it drops stop words as well as symbols. What is the best analyzer to use instead?

Solution

You can use PatternAnalyzer with a regular expression that defines where tokens are split, or write a custom Analyzer.

Other tips

Try using an org.apache.lucene.analysis.miscellaneous.PatternAnalyzer. You can supply a regular expression that defines the token delimiters, for example one that matches runs of whitespace so that symbols stay inside the tokens.
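
To see what a whitespace-only delimiter would do to the sample data, here is a minimal sketch in plain Java using java.util.regex. It does not call Lucene at all; the class name WhitespaceTokenSketch and the method tokenize are made up for illustration, and the lowercasing step is an assumption so that a query like "plywood sta. clara" can match upper-case item names. The same delimiter expression (\s+) is the kind of pattern you would hand to a pattern-based analyzer, but check the constructor for your Lucene version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: shows the tokens a
// whitespace-delimited, lowercasing analysis would produce.
class WhitespaceTokenSketch {
    // Delimiter: one or more whitespace characters.
    // Symbols such as '.', '/', '-' and '"' are NOT delimiters, so they are kept.
    static final Pattern DELIMITER = Pattern.compile("\\s+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // trim() removes leading and trailing spaces before splitting
        for (String part : DELIMITER.split(text.trim())) {
            if (!part.isEmpty()) {
                // Assumption: lowercase so searches are case-insensitive
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("PLYWOOD   STA. CLARA 3/16X4X8"));
        System.out.println(tokenize("B.I SQUARE TUBING     2\" X 3\""));
    }
}
```

Run the query string through the same tokenization as the indexed text, otherwise tokens such as sta. or 2" will not line up at search time.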

License: CC-BY-SA with attribution
Not affiliated with StackOverflow