Question

When I search for something using the Search API, I would expect the query string to be normalized (i.e. accented letters turned into their unaccented counterparts).

So, for example, if I searched for "azúcar", the Search API would actually search for "azucar".

Here's my Search handler code:

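    from google.appengine.api import search

    # search_query holds the user's raw query string, e.g. u"azúcar"
    index = search.Index(name='index', namespace='namespace')
    results = index.search(
        query=search.Query(
            query_string=search_query,
            options=search.QueryOptions(
                limit=10,
                cursor=search.Cursor(),
                sort_options=search.SortOptions(
                    match_scorer=search.RescoringMatchScorer()
                )
            )
        )
    )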

Does Search API actually do this? Am I doing something wrong?

Thanks in advance


Solution

The Search API does not do this; see the questions "Partial matching GAE search API" and "GAE Full Text Search: can only match exact word? how to search like contains(...)?" for similar discussions.
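To illustrate what a tokenization approach can look like, here is a minimal sketch assuming the common variant where every prefix of every word is precomputed and stored in an extra text field of the document. The function name `prefix_tokens` and the `min_len` parameter are hypothetical, not part of the Search API.

```python
def prefix_tokens(text, min_len=1):
    """Return every prefix of every word in `text`, space-separated.

    Hypothetical helper: the resulting string would be stored in an
    extra text field so that a query such as "azu" matches "azucar".
    """
    tokens = []
    for word in text.lower().split():
        # Emit word[:min_len], word[:min_len + 1], ..., the whole word
        for end in range(min_len, len(word) + 1):
            tokens.append(word[:end])
    return " ".join(tokens)


print(prefix_tokens("Azucar morena", min_len=3))
# azu azuc azuca azucar mor more moren morena
```

Raising `min_len` keeps the token field smaller at the cost of not matching very short partial queries.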

At my company we have implemented the tokenization approach described in those discussions, and it seems to work reasonably well. One way to solve your problem would be to normalize text to ASCII while tokenizing, and to apply the same normalization to the query string before searching, so that both sides are compared in their unaccented form. See "What is the best way to remove accents in a Python unicode string?" for some how-tos on that.
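A minimal sketch of the accent-folding step, using only the standard `unicodedata` module: NFKD normalization splits a letter like "ú" into "u" plus a combining accent, and dropping the combining marks leaves the base letter. The helper name `strip_accents` is hypothetical; how you wire it into your indexing and search code is up to you.

```python
import unicodedata


def strip_accents(text):
    """Fold accented letters to their unaccented base letters.

    NFKD decomposes e.g. u"ú" into u"u" + U+0301 (combining acute);
    filtering out combining characters keeps only the base letter.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Normalize both sides the same way (hypothetical usage): the folded text
# goes into the indexed field, the folded query into Query(query_string=...).
indexed_value = strip_accents(u"Azúcar morena")
query_value = strip_accents(u"azúcar")
print(indexed_value)  # Azucar morena
print(query_value)    # azucar
```

Note that letters with no Unicode decomposition, such as "ø", pass through unchanged; if you need strictly ASCII output you would have to map those separately.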

Licensed under: CC-BY-SA with attribution