What is the correct way to backup ZODB blobs?

https://stackoverflow.com/questions/451952

19-08-2019
|

Question

I am using plone.app.blob to store large ZODB objects in a blobstorage directory. This reduces size pressure on Data.fs but I have not been able to find any advice on backing up this data.

I am already backing up Data.fs by pointing a network backup tool at a directory of repozo backups. Should I simply point that tool at the blobstorage directory to backup my blobs?

What if the database is being repacked or blobs are being added and deleted while the copy is taking place? Are there files in the blobstorage directory that must be copied over in a certain order?

Solution

Backing up "blobstorage" will do it. No need for a special order or anything else, it's very simple.

All operations in Plone are fully transactional, so hitting the backup in the middle of a transaction should work just fine. This is why you can do live backups of the ZODB. Without knowing what file system you're on, I'd guess that it should work as intended.

OTHER TIPS

It should be safe to do a repozo backup of the Data.fs followed by an rsync of the blobstorage directory, as long as the database doesn't get packed while those two operations are happening.

This is because, at least when using blobs with FileStorage, modifications to a blob always results in the creation of a new file named based on the object id and transaction id. So if new or updated blobs are written after the Data.fs is backed up, it shouldn't be a problem, as the files that are referenced by the Data.fs should still be around. Deletion of a blob doesn't result in the file being removed until the database is packed, so that should be okay too.

Performing a backup in a different order, or with packing during the backup, may result in a backup Data.fs that references blobs that are not included in the backup.

I have an script that copies for a month the blobs using hard links ( so you have and historial of the blobs as the Data.fs ):

backup.sh

#!/bin/sh

# per a fer un full : ./cron_nocturn.sh full

ZEO_FOLDER=/var/plone/ZEO

# Zeo port
ZEO_PORT = 8023

# Name of the DB
ZEO_DB = zodb1

BACKUP_FOLDER=/backup/plone

LOGBACKUP=/var/plone/ZEO/backup.log

BACKUPDIR=`date +%d`

echo "INICI BACKUP" >> $LOGBACKUP
echo `date` >> $LOGBACKUP

# Fem el packing

if [ "$1" = "full" ]; then
  $ZEO_FOLDER/bin/zeopack -S $ZEO_DB -p $ZEO_PORT -h 127.0.0.1


echo "   Comprovant folders"
#mirem si existeix el folder de backup
if ! [ -x $BACKUP_FOLDER/$ZEO_DB ]; then
   mkdir $BACKUP_FOLDER/$ZEO_DB
fi

#mirem si existeix el backup folder del dia
if ! [ -x $BACKUP_FOLDER/blobs/$BACKUPDIR/ ] ; then
   mkdir $BACKUP_FOLDER/blobs/$BACKUPDIR/
fi

echo "   Backup Data.fs"
# backup de Data.fs
if  [ "$1" = "full" ]; then
   echo "   Copiant Data.fs"
   $ZEO_FOLDER/bin/repozo -B -F -r $BACKUP_FOLDER/$ZEO_DB/ -f $ZEO_FOLDER/var/filestorage/Data_$ZEO_DB.fs
   echo "   Purgant backups antics"
   $ZEO_FOLDER/neteja.py -l $BACKUP_FOLDER/$ZEO_DB -k 2
else
   $ZEO_FOLDER/bin/repozo -B -r $BACKUP_FOLDER/$ZEO_DB/ -f $ZEO_FOLDER/var/filestorage/Data_$ZEO_DB.fs
fi

echo "   Copiant blobs"
# backup blobs
rm -rf $BACKUP_FOLDER/blobs/$BACKUPDIR
cd $BACKUP_FOLDER/current-blobs && find . -print | cpio -dplm $BACKUP_FOLDER/blobs/$BACKUPDIR
rsync --force --ignore-errors --delete --update -a $ZEO_FOLDER/var/blobs/ $BACKUP_FOLDER/current-blobs/


echo "FI BACKUP" >> $LOGBACKUP
echo `date` >> $LOGBACKUP

neteja.py

#!/usr/bin/python2.4

# neteja.py -l [directori_desti] -k [numero_fulls_a_mantenir]
# Script que neteja un directori amb backups i guarda nomes els ultims fulls que li especifiquis
# Es basa en la utilitzacio de collective.recipe.backup
# Author: Victor Fernandez de Alba <sneridagh@gmail.com>

import sys, getopt

sys.path[0:0] = [
  '/var/plone/genwebupcZEO/produccio/eggs/collective.recipe.backup-1.3-py2.4.egg',
  '/var/plone/genwebupcZEO/produccio/eggs/zc.buildout-1.4.2-py2.4.egg',
  '/var/plone/genwebupcZEO/produccio/eggs/zc.recipe.egg-1.2.2-py2.4.egg',
  '/var/plone/genwebupcZEO/produccio/eggs/setuptools-0.6c11-py2.4.egg',
  ]

import collective.recipe.backup.repozorunner

argv = sys.argv[1:]
try:
    opts, args = getopt.getopt(argv, "l:k:", ["location=", "keep="])
except getopt.GetoptError:
    print "neteja.py -l [directori_desti] -k [numero_fulls_a_mantenir]"
    sys.exit(2)

for opt, arg in opts:
    if opt in ("-l", "--location"):
        location = arg
    elif opt in ("-k", "--keep"):
        keep = arg

if len(opts)<2:
    print "neteja.py -l [directori_desti] -k [numero_fulls_a_mantenir]"
    sys.exit(2)

collective.recipe.backup.repozorunner.cleanup(location, keep)

Your backup strategy for the FileStorage is fine. However, making a backup of any database that stores data in multiple files never is easy as your copy has to happen with no writes to the various files. For the FileStorage a blind stupid copy is fine as it's just a single file. (Using repozo is even better.)

In this case (with BlobStorage combined with FileStorage) I have to point to the regular backup advice:

take the db offline while making a file-system copy
use snapshot tools like LVM to freeze the disk at a given point
do a transactional export (not feasable in practice)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow