Django Haystack : search for a term with and without accents

https://stackoverflow.com/questions/2240880

19-09-2019
|

Question

I'm implementing a search system onto my django project, using django haystack. The problem is that some fields in my models have some french accents, and I would like to find the entries which contents the query with and without accents.

I think the best Idea is to create a SearchIndex with both the fields with the accents, and the same field without the accents.

Any idea or hint on this ?

Here is some code

Imagine the following models :

Cars(models.Model):
    name = models.CharField()

and the following Haystack Index:

Cars(indexes.SearchIndex):
    name = indexes.CharField(model_attr='name')
    cleaned_name = indexes.CharField(model_attr='name')

    def prepare_cleaned_name(self, object):
        return strip_accents(object.name)

now, in my index template, I put the both fields :

{{ object.cleaned_name }}
{{ object.name }}

So, thats some pseudo code, I don't know if it works, but if you have any idea on this, let me know !

Solution 2

I find a way to index both value from the same field in my Model.

First, write a method in your model which returns the ascii value of the fields:

class Car(models.Model):
    name = model.CharField()

    def ascii_name(self):
        return strip_accents(self.name)

So that in your template used to generate the index, you could do this:

{{ object.name }}
{{ object.ascii_name }}

Then, you just have to rebuild your indexes !

OTHER TIPS

Yes, you're on the right track here. Sometimes you do want to store fields multiple times, with different transformations applied.

An example of this in my application is that I have two title fields. One for searching which gets stemmed (the process by which test ~= test ~= tester), and another for sorting which is left alone (the stemming interferes with the sort order).

This is an analogous case.

In my schema.xml this is handled by:

<field name="title" type="text" indexed="true" stored="true" multiValued="false" />
<field name="title_sort" type="string" indexed="true" stored="true" multiValued="false" />

The type "string" is responsible for storing the "as-is" version of the title.

By the way, it you're stripping accents just to make words easier to search for, this is something that might be worth looking into: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory

You must do something like follow:

Cars(indexes.SearchIndex):
    name = indexes.CharField(model_attr='name')

    def prepare(self, obj):
        self.prepared_data = super(Cars, self).prepare(obj)
        self.prepared_data['name'] += '\n' + strip_accents(self.prepared_data['name'])
        return self.prepared_data

I don't like this solution. I would like to know some way to configure my seach backend to do it for me. I use whoosh.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow