Question

I feel like this is quite a delicate task.

I have various folders with projects that I would like to back up into a zip/tar file, but I would like to avoid backing up files such as .pyc files and temporary files.

I also have a Postgres db I need to backup.


Any tips for running this operation as a Python script?

Also, is there any way to stop the process from hogging resources while it runs?


Help would be very much appreciated.


Solution

If you're on Linux (or any other form of Unix, such as MacOSX), a simple way to reduce a process's priority -- and therefore, indirectly, its consumption of CPU if other processes want some -- is the nice command. In Python (same OSs), os.nice lets your program "make itself nicer" (reduce priority &c).
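For instance, a script can lower its own priority right at startup (a minimal sketch; the increment of 10 is just an illustration, and this only works on Unix-like systems):

```python
import os

# Raise this process's niceness by 10 (higher niceness = lower priority;
# 19 is the "nicest"). os.nice returns the resulting niceness level,
# which applies to everything the script does from here on.
new_niceness = os.nice(10)
```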

For backing up a PostgreSQL DB, I recommend PostgreSQL's own tools; for zipping up a folder except the pyc files (and temporary files -- however it is you identify those), Python is quite suitable. For example:

>>> import os, zipfile
>>> os.chdir('/tmp/az')
>>> f = open('/tmp/a.zip', 'wb')
>>> z = zipfile.ZipFile(f, 'w')
>>> for root, dirs, files in os.walk('.'):
...   for fn in files:
...     if fn.endswith('.pyc'): continue
...     fp = os.path.join(root, fn)
...     z.write(fp)
... 
>>> z.close()
>>> f.close()
>>> 

This zips all files in said subtree except those ending in .pyc (without compression -- if you want compression, pass zipfile.ZIP_DEFLATED as a third argument to the zipfile.ZipFile call). Could hardly be easier.
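The same idea, wrapped in a reusable function with compression turned on. The suffix list is only a guess at what counts as "temporary" for your projects (here .pyc and .tmp); adjust it to taste:

```python
import os
import zipfile

def zip_tree(src_dir, zip_path, skip_suffixes=('.pyc', '.tmp')):
    """Zip every file under src_dir, skipping files with unwanted suffixes."""
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as z:
        for root, dirs, files in os.walk(src_dir):
            for fn in files:
                if fn.endswith(skip_suffixes):  # str.endswith accepts a tuple
                    continue
                full = os.path.join(root, fn)
                # Store paths relative to src_dir so the archive is portable.
                z.write(full, os.path.relpath(full, src_dir))
```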

OTHER TIPS

On Linux, you can use tar with the --exclude option. For example, to exclude your .pyc files and temporary files (in this example, .tmp):

$ tar zcvf backup.tar.gz --exclude "*.tmp" --exclude "*.pyc" yourproject/

The z option gzip-compresses the archive as well.
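If you would rather stay in Python than shell out to tar, the standard-library tarfile module can do the same exclusion via a filter callable (a sketch; the suffix list is an assumption, as above):

```python
import tarfile

def make_backup(src_dir, out_path, skip_suffixes=('.pyc', '.tmp')):
    """Create a gzip-compressed tarball of src_dir, skipping unwanted files."""
    def exclude(tarinfo):
        # Returning None drops the member from the archive.
        if tarinfo.name.endswith(skip_suffixes):
            return None
        return tarinfo
    with tarfile.open(out_path, 'w:gz') as tar:
        tar.add(src_dir, arcname='.', filter=exclude)
```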

With today's multicore CPUs, you may find that the CPU is not the bottleneck; it is now far more likely to be the disk I/O that needs to be shared better.

Linux has the ionice command to let you control this:

ionice(1)

NAME

   ionice - get/set program io scheduling class and priority

SYNOPSIS

   ionice [[-c class] [-n classdata ] [-t]] -p PID [PID ...]

   ionice [-c class] [-n classdata ] [-t] COMMAND [ARG ...]

DESCRIPTION

   This program sets or gets the io scheduling class and priority for a program. If no arguments or just -p is given, ionice will query the current io scheduling class and priority for that process.
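From Python, a simple way to use it is to prefix your backup command's argument list with an ionice invocation before handing it to subprocess (a sketch; class 3 is "idle", meaning the command only gets disk time when no other process wants it, and it requires util-linux's ionice on Linux):

```python
import subprocess

def io_nice_cmd(cmd):
    """Prefix cmd so it runs in ionice's idle class (-c 3),
    yielding disk I/O to every other process on the system."""
    return ['ionice', '-c', '3'] + list(cmd)

# Example usage (uncomment on a Linux box with ionice installed):
# subprocess.call(io_nice_cmd(['tar', 'zcf', 'backup.tar.gz', 'yourproject']))
```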

Backup is at least as much about recovery: a backup is only worth as much as your ability to restore from it.

The right way to back up source code is to keep source files in a VCS (version control system), and back up the VCS repository. Exclude any auto-generated easily-replaced files (like those *.pyc files, etc.) from the VCS repository. I recommend Bazaar for very efficient storage and user-friendliness, but your team will likely already have a VCS they prefer.

For backup of a PostgreSQL database, it's best to use pg_dump to regularly dump the database to a text file, compress that, and back up the result. This is because the backup then becomes restorable on any machine, by re-playing the database dump into another PostgreSQL server.
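That pipeline can be sketched from Python as follows. The database name and output path are placeholders, and it assumes pg_dump is on PATH with authentication already configured (e.g. via ~/.pgpass); the pg_dump parameter is a hypothetical hook so the command can be swapped out:

```python
import gzip
import subprocess

def dump_database(dbname, out_path, pg_dump='pg_dump'):
    """Run pg_dump on dbname and stream its output into a gzip file."""
    proc = subprocess.Popen([pg_dump, dbname], stdout=subprocess.PIPE)
    with gzip.open(out_path, 'wb') as out:
        # Stream in chunks so a large dump never sits wholly in memory.
        for chunk in iter(lambda: proc.stdout.read(64 * 1024), b''):
            out.write(chunk)
    if proc.wait() != 0:
        raise RuntimeError('pg_dump failed for %s' % dbname)
```

Restoring is then a matter of gunzipping the file and replaying it with psql into another PostgreSQL server.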

As for how to automate it: you would be best off using a shell script for the purpose, since it is just a matter of connecting some commands to files, which is what the shell excels at.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow