Dirty fields in django

https://stackoverflow.com/questions/110803

02-07-2019
|

Question

In my app i need to save changed values (old and new) when model gets saved. Any examples or working code?

I need this for premoderation of content. For example, if user changes something in model, then administrator can see all changes in separate table and then decide to apply them or not.

Solution

You haven't said very much about your specific use case or needs. In particular, it would be helpful to know what you need to do with the change information (how long do you need to store it?). If you only need to store it for transient purposes, @S.Lott's session solution may be best. If you want a full audit trail of all changes to your objects stored in the DB, try this AuditTrail solution.

UPDATE: The AuditTrail code I linked to above is the closest I've seen to a full solution that would work for your case, though it has some limitations (doesn't work at all for ManyToMany fields). It will store all previous versions of your objects in the DB, so the admin could roll back to any previous version. You'd have to work with it a bit if you want the change to not take effect until approved.

You could also build a custom solution based on something like @Armin Ronacher's DiffingMixin. You'd store the diff dictionary (maybe pickled?) in a table for the admin to review later and apply if desired (you'd need to write the code to take the diff dictionary and apply it to an instance).

OTHER TIPS

I've found Armin's idea very useful. Here is my variation;

class DirtyFieldsMixin(object):
    def __init__(self, *args, **kwargs):
        super(DirtyFieldsMixin, self).__init__(*args, **kwargs)
        self._original_state = self._as_dict()

    def _as_dict(self):
        return dict([(f.name, getattr(self, f.name)) for f in self._meta.local_fields if not f.rel])

    def get_dirty_fields(self):
        new_state = self._as_dict()
        return dict([(key, value) for key, value in self._original_state.iteritems() if value != new_state[key]])

Edit: I've tested this BTW.

Sorry about the long lines. The difference is (aside from the names) it only caches local non-relation fields. In other words it doesn't cache a parent model's fields if present.

And there's one more thing; you need to reset _original_state dict after saving. But I didn't want to overwrite save() method since most of the times we discard model instances after saving.

def save(self, *args, **kwargs):
    super(Klass, self).save(*args, **kwargs)
    self._original_state = self._as_dict()

Django is currently sending all columns to the database, even if you just changed one. To change this, some changes in the database system would be necessary. This could be easily implemented on the existing code by adding a set of dirty fields to the model and adding column names to it, each time you __set__ a column value.

If you need that feature, I would suggest you look at the Django ORM, implement it and put a patch into the Django trac. It should be very easy to add that and it would help other users too. When you do that, add a hook that is called each time a column is set.

If you don't want to hack on Django itself, you could copy the dict on object creation and diff it.

Maybe with a mixin like this:

class DiffingMixin(object):

    def __init__(self, *args, **kwargs):
        super(DiffingMixin, self).__init__(*args, **kwargs)
        self._original_state = dict(self.__dict__)

    def get_changed_columns(self):
        missing = object()
        result = {}
        for key, value in self._original_state.iteritems():
            if key != self.__dict__.get(key, missing):
                result[key] = value
        return result

 class MyModel(DiffingMixin, models.Model):
     pass

This code is untested but should work. When you call model.get_changed_columns() you get a dict of all changed values. This of course won't work for mutable objects in columns because the original state is a flat copy of the dict.

I extended Trey Hunner's solution to support m2m relationships. Hopefully this will help others looking for a similar solution.

from django.db.models.signals import post_save

DirtyFieldsMixin(object):
    def __init__(self, *args, **kwargs):
        super(DirtyFieldsMixin, self).__init__(*args, **kwargs)
        post_save.connect(self._reset_state, sender=self.__class__,
            dispatch_uid='%s._reset_state' % self.__class__.__name__)
        self._reset_state()

    def _as_dict(self):
        fields =  dict([
            (f.attname, getattr(self, f.attname))
            for f in self._meta.local_fields
        ])
        m2m_fields = dict([
            (f.attname, set([
                obj.id for obj in getattr(self, f.attname).all()
            ]))
            for f in self._meta.local_many_to_many
        ])
        return fields, m2m_fields

    def _reset_state(self, *args, **kwargs):
        self._original_state, self._original_m2m_state = self._as_dict()

    def get_dirty_fields(self):
        new_state, new_m2m_state = self._as_dict()
        changed_fields = dict([
            (key, value)
            for key, value in self._original_state.iteritems()
            if value != new_state[key]
        ])
        changed_m2m_fields = dict([
            (key, value)
            for key, value in self._original_m2m_state.iteritems()
            if sorted(value) != sorted(new_m2m_state[key])
        ])
        return changed_fields, changed_m2m_fields

One may also wish to merge the two field lists. For that, replace the last line

return changed_fields, changed_m2m_fields

with

changed_fields.update(changed_m2m_fields)
return changed_fields

Adding a second answer because a lot has changed since the time this questions was originally posted.

There are a number of apps in the Django world that solve this problem now. You can find a full list of model auditing and history apps on the Django Packages site.

I wrote a blog post comparing a few of these apps. This post is now 4 years old and it's a little dated. The different approaches for solving this problem seem to be the same though.

The approaches:

Store all historical changes in a serialized format (JSON?) in a single table
Store all historical changes in a table mirroring the original for each model
Store all historical changes in the same table as the original model (I don't recommend this)

The django-reversion package still seems to be the most popular solution to this problem. It takes the first approach: serialize changes instead of mirroring tables.

I revived django-simple-history a few years back. It takes the second approach: mirror each table.

So I would recommend using an app to solve this problem. There's a couple of popular ones that work pretty well at this point.

Oh and if you're just looking for dirty field checking and not storing all historical changes, check out FieldTracker from django-model-utils.

Continuing on Muhuk's suggestion & adding Django's signals and a unique dispatch_uid you could reset the state on save without overriding save():

from django.db.models.signals import post_save

class DirtyFieldsMixin(object):
    def __init__(self, *args, **kwargs):
        super(DirtyFieldsMixin, self).__init__(*args, **kwargs)
        post_save.connect(self._reset_state, sender=self.__class__, 
                            dispatch_uid='%s-DirtyFieldsMixin-sweeper' % self.__class__.__name__)
        self._reset_state()

    def _reset_state(self, *args, **kwargs):
        self._original_state = self._as_dict()

    def _as_dict(self):
        return dict([(f.name, getattr(self, f.name)) for f in self._meta.local_fields if not f.rel])

    def get_dirty_fields(self):
        new_state = self._as_dict()
        return dict([(key, value) for key, value in self._original_state.iteritems() if value != new_state[key]])

Which would clean the original state once saved without having to override save(). The code works but not sure what the performance penalty is of connecting signals at __init__

I extended muhuk and smn's solutions to include difference checking on the primary keys for foreign key and one-to-one fields:

from django.db.models.signals import post_save

class DirtyFieldsMixin(object):
    def __init__(self, *args, **kwargs):
        super(DirtyFieldsMixin, self).__init__(*args, **kwargs)
        post_save.connect(self._reset_state, sender=self.__class__,
                            dispatch_uid='%s-DirtyFieldsMixin-sweeper' % self.__class__.__name__)
        self._reset_state()

    def _reset_state(self, *args, **kwargs):
        self._original_state = self._as_dict()

    def _as_dict(self):
        return dict([(f.attname, getattr(self, f.attname)) for f in self._meta.local_fields])

    def get_dirty_fields(self):
        new_state = self._as_dict()
        return dict([(key, value) for key, value in self._original_state.iteritems() if value != new_state[key]])

The only difference is in _as_dict I changed the last line from

return dict([
    (f.name, getattr(self, f.name)) for f in self._meta.local_fields
    if not f.rel
])

return dict([
    (f.attname, getattr(self, f.attname)) for f in self._meta.local_fields
])

This mixin, like the ones above, can be used like so:

class MyModel(DirtyFieldsMixin, models.Model):
    ....

If you're using your own transactions (not the default admin application), you can save the before and after versions of your object. You can save the before version in the session, or you can put it in "hidden" fields in the form. Hidden fields is a security nightmare. Therefore, use the session to retain history of what's happening with this user.

Additionally, of course, you do have to fetch the previous object so you can make changes to it. So you have several ways to monitor the differences.

def updateSomething( request, object_id ):
    object= Model.objects.get( id=object_id )
    if request.method == "GET":
        request.session['before']= object
        form= SomethingForm( instance=object )
    else request.method == "POST"
        form= SomethingForm( request.POST )
        if form.is_valid():
            # You have before in the session
            # You have the old object
            # You have after in the form.cleaned_data
            # Log the changes
            # Apply the changes to the object
            object.save()

An updated solution with m2m support (using updated dirtyfields and new _meta API and some bug fixes), based on @Trey and @Tony's above. This has passed some basic light testing for me.

from dirtyfields import DirtyFieldsMixin
class M2MDirtyFieldsMixin(DirtyFieldsMixin):
    def __init__(self, *args, **kwargs):
        super(M2MDirtyFieldsMixin, self).__init__(*args, **kwargs)
        post_save.connect(
            reset_state, sender=self.__class__,
            dispatch_uid='{name}-DirtyFieldsMixin-sweeper'.format(
                name=self.__class__.__name__))
        reset_state(sender=self.__class__, instance=self)

    def _as_dict_m2m(self):
        if self.pk:
            m2m_fields = dict([
                (f.attname, set([
                    obj.id for obj in getattr(self, f.attname).all()
                ]))
                for f,model in self._meta.get_m2m_with_model()
            ])
            return m2m_fields
        return {}

    def get_dirty_fields(self, check_relationship=False):
        changed_fields = super(M2MDirtyFieldsMixin, self).get_dirty_fields(check_relationship)
        new_m2m_state = self._as_dict_m2m()
        changed_m2m_fields = dict([
            (key, value)
            for key, value in self._original_m2m_state.iteritems()
            if sorted(value) != sorted(new_m2m_state[key])
        ])
        changed_fields.update(changed_m2m_fields)
        return changed_fields

def reset_state(sender, instance, **kwargs):
    # original state should hold all possible dirty fields to avoid
    # getting a `KeyError` when checking if a field is dirty or not
    instance._original_state = instance._as_dict(check_relationship=True)
    instance._original_m2m_state = instance._as_dict_m2m()

for everyone's information, muhuk's solution fails under python2.6 as it raises an exception stating 'object.__ init __()' accepts no argument...

edit: ho! apparently it might've been me misusing the the mixin... I didnt pay attention and declared it as the last parent and because of that the call to init ended up in the object parent rather than the next parent as it noramlly would with diamond diagram inheritance! so please disregard my comment :)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow