Question

I have an exceptions file which breaks the functionality of the ignore_chars directive.

The example keyword I am working with is t-shirt.

t-shirt appears in the database. I need the ignore_chars directive to ignore the hyphen so that users can search for either tshirt or t-shirt and get the same results.

Here is the result of CALL KEYWORDS('tshirt t-shirt', 'catalog'):

+-----------+------------+
| tokenized | normalized |
+-----------+------------+
| tshirt    | TXRT       |
| tshirt    | TXRT       |
+-----------+------------+

To make t shirt (two words) map to the same keyword as above, I created an exceptions file that looks like this:

t shirt > tshirt

When I run CALL KEYWORDS('t shirt tshirt t-shirt', 'catalog'), this is what I get:

+-----------+------------+
| tokenized | normalized |
+-----------+------------+
| tshirt    | TXRT       |
| tshirt    | TXRT       |
| shirt     | XRT        |
+-----------+------------+

I expected the exceptions file to rewrite the two words t shirt into the single keyword tshirt, so that all three tokens would have the same normalized value.

Instead, the hyphen in t-shirt is no longer ignored, and t-shirt maps to just shirt, whose normalized form (XRT) is completely different from that of tshirt (TXRT). On top of that, searching for any of these keywords returns 0 results.

When I remove the exceptions file, ignore_chars works as expected and searching for these keywords works again.


Solution

I went down this path because I couldn't get the wordform t shirt > tshirt to work.

Wordforms are applied after tokenization, and I thought that was why it did not work.

It turns out that my min_word_len was set to 3, so the t in t shirt, being shorter than three characters, was not being picked up. I reduced min_word_len to 1, and now the wordform works properly.
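The working setup described above might look roughly like this in the index definition. This is a minimal sketch, not the author's actual configuration: the index name catalog comes from the CALL KEYWORDS calls, the wordforms file path is hypothetical, writing the hyphen as U+2D in ignore_chars is an assumption (the original ignore_chars line is not shown), and all other index settings are omitted.

```
index catalog
{
    # ... existing source, path and other settings (not shown in the question) ...

    # ignore the hyphen (U+2D) so that t-shirt is indexed like tshirt
    ignore_chars = U+2D

    # hypothetical path; the file holds the single line:  t shirt > tshirt
    wordforms    = /path/to/wordforms.txt

    # allow 1-character tokens such as the t in "t shirt"
    min_word_len = 1
}
```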

This still does not solve the conflict between ignore_chars and the exceptions file, but the search now works, so I suppose this was the workaround I needed.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow