Question

I am using Haystack for search in our Django application, and search itself works well. However, I have a problem with real-time indexing. For real-time updates I am using Haystack's built-in RealtimeSignalProcessor (haystack.signals.RealtimeSignalProcessor). My model contains a many-to-many field. When only the data in this many-to-many field changes, the signal processor does not seem to update the index, and after such a change I get stale search results.

It works after I manually run the rebuild_index command. I think rebuild_index works because it clears the index first and then rebuilds it from scratch.

Can someone suggest a solution to this problem?

By the way, the relevant code follows.

Model:

class Message_forum(models.Model):
      message = models.ForeignKey(Message)
      tags = models.ManyToManyField(Tag, blank=True, null=True) #this is many to many field

search_index.py:

class Message_forumIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    message = indexes.CharField(model_attr='message', null=True)
    tags = indexes.CharField(model_attr='tags', null=True)

    def get_model(self):
        return Message_forum

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

    def prepare_tags(self, obj):
        return [tag.tag for tag in obj.tags.all()]

index template:

{{ object.tags.tag }}

settings.py:

HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

I am using the latest version of Haystack, with Whoosh as the back end.


Solution

I figured it out after digging into the Haystack source code.

Haystack's default RealtimeSignalProcessor connects to the post_save and post_delete signals of every model in the application. Its handle_save method is called on post_save (handle_delete is the counterpart for post_delete). In handle_save, Haystack looks up the search index for the sender, and in my case, for the tags (many-to-many) field, the Message_forum_tag model is passed as the sender. No index exists for this model in my search_index, because it is not one of my application's models but one generated by Django. As a result, handle_save ignored every change on this model and never updated the indexed data for the changed object.

So I came up with two different solutions to this problem.

  1. Create a custom real-time signal processor specific to my model Message_forum. In its setup method, connect the m2m_changed signal for each many-to-many field of Message_forum to handle_save, and pass Message_forum as the sender, so that Haystack finds the model's index (this is not exactly validation, Haystack is just trying to get the index object) and updates the indexed data of the changed object.

  2. Ensure that whenever a many-to-many field changes, the save method of its parent (here Message_forum.save()) is called. That always fires the post_save signal, after which Haystack updates the indexed data for the object.
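Option 1 might look roughly like the sketch below. It is written as a plain mixin so that the logic can be read without Django installed; the names M2MReindexMixin and handle_m2m_changed are hypothetical and not part of Haystack. The assumption is that the mixin is combined with a subclass of haystack.signals.RealtimeSignalProcessor whose setup() additionally calls models.signals.m2m_changed.connect(self.handle_m2m_changed, sender=Message_forum.tags.through), with a matching disconnect in teardown().

```python
class M2MReindexMixin:
    """Hypothetical mixin: turn an m2m_changed signal into a handle_save call.

    Assumes the class it is mixed into provides
    handle_save(sender, instance, **kwargs), as Haystack's signal
    processors do.
    """

    # m2m_changed also fires "pre_*" actions; only the "post_*" ones see
    # the relation in its final state.
    reindex_actions = frozenset({"post_add", "post_remove", "post_clear"})

    def handle_m2m_changed(self, sender, instance, action, reverse=False, **kwargs):
        if action not in self.reindex_actions:
            return
        if reverse:
            # The change came from the other side of the relation
            # (e.g. tag.message_forum_set.add(...)), so `instance` is the
            # Tag, not the indexed model. This sketch skips that case.
            return
        # `sender` is the auto-generated through model, which has no search
        # index, so pass the class of the changed object instead.
        self.handle_save(sender=type(instance), instance=instance)
```

Filtering on the action is a design choice rather than a requirement: without it the object would also be reindexed on the pre_* actions, before the relation has actually changed.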

I spent around 3 hours figuring this out. I hope it helps someone with the same problem.

Other tips

I had a similar issue, but I went with a hybrid of Nikhil's number 1 and 2 options.

For a model called ContentItem with an m2m field called categories, I created a custom signal processor that extended the base one.

So I implemented a setup() duplicated from the source, but added the following line:

models.signals.m2m_changed.connect(self.handle_save, sender=ContentItem.categories.through)

And did the same with teardown() but with a similar disconnect line. I also extended handle_save and changed the line:

index = self.connections[using].get_unified_index().get_index(sender)

to

index = self.connections[using].get_unified_index().get_index(instance.__class__)

This means the signal processor watches for m2m changes in the through table between ContentItem and Category, but when an m2m change is made it looks up the index for the correct class, i.e. ContentItem instead of ContentItem.categories.through.

This seems to work for the most part, but if I delete a Category, m2m_changed doesn't fire even though the relationship is removed. It looks like this might be a bug in Django itself.
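Rather than copying the whole body of handle_save, the same effect can probably be achieved by overriding it and delegating to the parent with the sender swapped. The following is a minimal sketch, not what I actually deployed: the name InstanceClassSenderMixin is hypothetical, it uses Python 3's zero-argument super(), and it assumes it is listed before a Haystack signal processor whose handle_save(sender, instance, **kwargs) looks up the index by sender.

```python
class InstanceClassSenderMixin:
    """Hypothetical mixin: always look the index up by the instance's class."""

    def handle_save(self, sender, instance, **kwargs):
        # For m2m_changed, `sender` is the through model
        # (e.g. ContentItem.categories.through), which has no index.
        # The index is registered for the class of the changed instance.
        super().handle_save(sender=type(instance), instance=instance, **kwargs)
```

It would be used as class M2MSignalProcessor(InstanceClassSenderMixin, RealtimeSignalProcessor), with setup() and teardown() connecting and disconnecting m2m_changed as described above.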

So I also added the following line to setup (and a disconnect to teardown):

models.signals.pre_delete.connect(self.handle_m2m_delete, sender=Category)

I then created handle_m2m_delete, a duplicate of handle_save, which manually removes the relationship from the through table and saves the affected ContentItems (which in turn triggers the original handle_save). This meant that at least I didn't have to remember to save the parent anywhere else in the code to update the index.

I can suggest an alternative solution, simpler than the complication of trying to watch all the right signals and ending up with a signal processor that has to know about all your m2m relationships.

It looks like this:

signals.py:

from collections import OrderedDict

from haystack.signals import RealtimeSignalProcessor


class BatchingSignalProcessor(RealtimeSignalProcessor):
    """
    RealtimeSignalProcessor connects to Django model signals
    we store them locally for processing later - must call
    ``flush_changes`` from somewhere else (eg middleware)
    """

    # Haystack instantiates this as a singleton

    _change_list = OrderedDict()

    def _add_change(self, method, sender, instance):
        key = (sender, instance.pk)
        if key in self._change_list:
            del self._change_list[key]
        self._change_list[key] = (method, instance)

    def handle_save(self, sender, instance, created, raw, **kwargs):
        method = super(BatchingSignalProcessor, self).handle_save
        self._add_change(method, sender, instance)

    def handle_delete(self, sender, instance, **kwargs):
        method = super(BatchingSignalProcessor, self).handle_delete
        self._add_change(method, sender, instance)

    def flush_changes(self):
        while True:
            try:
                (sender, pk), (method, instance) = self._change_list.popitem(last=False)
            except KeyError:
                break
            else:
                method(sender, instance)

middleware.py:

from haystack import signal_processor


class HaystackBatchFlushMiddleware(object):
    """
    for use with our BatchingSignalProcessor

    this should be placed *at the top* of MIDDLEWARE_CLASSES
    (so that it runs last)
    """
    def process_response(self, request, response):
        try:
            signal_processor.flush_changes()
        except AttributeError:
            # (in case we're not using our expected signal_processor)
            pass
        return response

settings.py:

MIDDLEWARE_CLASSES = (
    'myproject.middleware.HaystackBatchFlushMiddleware',
    ...
)

HAYSTACK_SIGNAL_PROCESSOR = 'myproject.signals.BatchingSignalProcessor'

I'm trying this out in my project, and it seems to work fine. I welcome any feedback or suggestions.
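To show why batching helps with m2m updates, here is the de-duplication idea from _add_change and flush_changes in isolation. It is only an illustration: ChangeBatch and the string placeholders are made up for the example and nothing here touches Haystack. Several changes to the same object within one request collapse into a single index update, which runs after the last change has been recorded.

```python
from collections import OrderedDict


class ChangeBatch:
    """Illustrative stand-in for the queue kept by BatchingSignalProcessor."""

    def __init__(self):
        self._changes = OrderedDict()

    def add(self, method, sender, pk, instance):
        key = (sender, pk)
        # Drop any earlier entry so the object moves to the end of the
        # queue and is processed only once, with its latest state.
        self._changes.pop(key, None)
        self._changes[key] = (method, instance)

    def flush(self):
        # Process in first-in, first-out order until the queue is empty.
        while self._changes:
            (sender, _pk), (method, instance) = self._changes.popitem(last=False)
            method(sender, instance)


processed = []


def update_index(sender, instance):
    processed.append((sender, instance))


batch = ChangeBatch()
batch.add(update_index, "ContentItem", 1, "item 1 after save()")
batch.add(update_index, "ContentItem", 2, "item 2 after save()")
batch.add(update_index, "ContentItem", 1, "item 1 after tags changed")
batch.flush()
print(processed)
# [('ContentItem', 'item 2 after save()'), ('ContentItem', 'item 1 after tags changed')]
```

Unlike the class-level _change_list above, this sketch keeps the queue on the instance; that makes no difference for a singleton, but it avoids sharing state if more than one object is ever created.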

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow