Question

I have a central Django server containing all of my information in a database. I want to have a second Django server that contains a subset of that information in a second database. I need a bulletproof way to selectively sync data between the two.

  • The secondary Django will need to pull its subset of data from the primary at certain times. The subset will have to be filtered by certain fields.
  • The secondary Django will have to occasionally push its data to the primary.
  • Ideally, the two-way sync would keep the most recently modified objects for each model.

I was thinking of something along the lines of using TimeStampedModel (from django-extensions) or adding my own DateTimeField(auto_now=True) so that every object stores its last-modified time. Then, maybe a mechanism to dump the data from one DB and load it into the other such that only the more recently modified objects are kept.
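For example, something like this abstract base (the class name is just illustrative):

from django.db import models

class LastModifiedModel(models.Model):
    # auto_now updates the timestamp on every save()
    modified = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True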

Possibilities I am considering are Django's dumpdata, dumpscript from django-extensions, makefixture from django-test-utils, or maybe django-fixture-magic. There's a lot to think about, so I'm not sure which road to proceed down.


Solution

Here is my solution, which fits all of my requirements:

  1. Implement natural keys and unique constraints on all models
    • Allows for a unique way to refer to each object without using primary key IDs
  2. Subclass each model from TimeStampedModel in django-extensions
    • Adds automatically updated created and modified fields
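    For example, a minimal sketch of a model covering steps 1 and 2 (the slug field and manager are illustrative, not part of the original answer):

    from django.db import models
    from django_extensions.db.models import TimeStampedModel

    class BazManager(models.Manager):
        def get_by_natural_key(self, slug):
            return self.get(slug=slug)

    class Baz(TimeStampedModel):
        # created/modified come from TimeStampedModel;
        # the unique slug serves as the natural key
        slug = models.SlugField(unique=True)

        objects = BazManager()

        def natural_key(self):
            return (self.slug,)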
  3. Create a Django management command for exporting, which filters a subset of data and serializes it with natural keys

    import itertools

    from django.core import serializers

    # Filter the subset to export (Baz/Yaz and foo=bar are placeholders)
    baz = Baz.objects.filter(foo=bar)
    yaz = Yaz.objects.filter(foo=bar)

    # Flatten the querysets into one list of objects
    objects = [baz, yaz]
    flat_objects = list(itertools.chain.from_iterable(objects))

    # Serialize with natural keys instead of primary-key IDs
    # (on Django 1.7+ use use_natural_foreign_keys/use_natural_primary_keys)
    data = serializers.serialize("json", flat_objects, indent=3, use_natural_keys=True)
    print(data)
    
  4. Create a Django management command for importing, which reads in the serialized file and iterates through the objects as follows:

    • If the object does not exist in the database (by natural key), create it
    • If the object exists, check the modified timestamps
    • If the imported object is newer, update the fields
    • If the imported object is older, do not update (but print a warning)

Code sample:

from django.core import serializers
from django.core.exceptions import ObjectDoesNotExist

# Open the file
with open(args[0]) as data_file:
    json_str = data_file.read()

# Deserialize and iterate (natural keys are resolved automatically on
# deserialization; options like indent apply only when serializing)
for obj in serializers.deserialize("json", json_str):

    # Get model info
    model_class = obj.object.__class__
    natural_key = obj.object.natural_key()
    manager = model_class._default_manager

    # Delete the PK value so a newly created object gets a fresh ID
    obj.object.pk = None

    try:
        # Get the existing object by natural key
        existing_obj = manager.get_by_natural_key(*natural_key)

        # Check the timestamps
        date_existing = existing_obj.modified
        date_imported = obj.object.modified
        if date_imported > date_existing:

            # Copy the newer values onto the existing object and save
            for field in obj.object._meta.fields:
                if field.editable and not field.primary_key:
                    imported_val = getattr(obj.object, field.name)
                    existing_val = getattr(existing_obj, field.name)
                    if existing_val != imported_val:
                        setattr(existing_obj, field.name, imported_val)
            existing_obj.save()
        else:
            print("Warning: skipping %s %s; the imported copy is not newer"
                  % (model_class.__name__, natural_key))

    except ObjectDoesNotExist:
        # No object with this natural key yet: create it
        obj.save()

The workflow is to first call python manage.py exportTool > data.json, then, on another Django instance (or the same one), call python manage.py importTool data.json.
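For reference, a minimal sketch of how the export side could be wrapped as a management command (the file location follows the usual Django convention; Baz/Yaz and foo=bar remain the placeholders from step 3):

# myapp/management/commands/exportTool.py (hypothetical location)
import itertools

from django.core import serializers
from django.core.management.base import BaseCommand

from myapp.models import Baz, Yaz  # placeholder models from step 3

class Command(BaseCommand):
    help = "Serialize a filtered subset of data using natural keys"

    def handle(self, *args, **options):
        baz = Baz.objects.filter(foo=bar)  # placeholder filter
        yaz = Yaz.objects.filter(foo=bar)
        flat_objects = list(itertools.chain.from_iterable([baz, yaz]))
        self.stdout.write(serializers.serialize(
            "json", flat_objects, indent=3, use_natural_keys=True))

The import command would be wrapped the same way, with its handle() body being the deserialization loop shown above.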

Licensed under: CC-BY-SA with attribution