Question

I have a Django management command that should run at a specified time every day, and I am trying to achieve this with cron. The command is supposed to dump the database, compress it with gzip and upload it to Bitbucket.

The following is the relevant part of my crontab file:

00 4    * * *   root    python /my_django_project_path/manage.py update_locations
47 16   * * *   root    python /my_django_project_path/manage.py database_bu

When I execute python /my_django_project_path/manage.py database_bu manually, it works perfectly fine. However, cron either does not execute it or something goes wrong along the way. Even weirder, the first crontab entry (update_locations) runs perfectly fine.

Reading this question, I have tried the following, without success:

Changing the command to:

47 16   * * *   root    (cd /my_django_project_path/ && python manage.py database_bu)

Changing the command to:

47 16   * * *   root    /usr/bin/python /my_django_project_path/manage.py database_bu

Adding the following to my script (even though the other one works fine without it):

#!/usr/bin/python

from django.core.management import setup_environ
import settings
setup_environ(settings)

Running everything through a script that exports the django project settings:

/my_django_project_path/cron_command_executor.sh:

export DJANGO_SETTINGS_MODULE=my_django_project.settings 
python manage.py ${*}

The following in crontab:

47 16   * * *   root    ./my_django_project_path/cron_command_executor.sh database_bu

Changing the user to both my user and the Apache user (www-data).

I have a newline at the end of my crontab file.

UPDATE:

When I switch to root with sudo su, running the command manually no longer works either: it hangs and doesn't do anything.

The output of tail -f /var/log/syslog is:

Mar 3 18:26:01 my-ip-address cron[726]: (system) RELOAD (/etc/crontab) 
Mar 3 18:26:01 my-ip-address CRON[1184]: (root) CMD (python /my_django_project_path/manage.py database_bu)

UPDATE:

I am using the following .netrc file to prevent git asking for credentials:

machine bitbucket.org
    login myusername
    password mypassword

The actual code for the backup script is:

import subprocess
import sh
import datetime
import gzip
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    def handle(self, *args, **options):
        execute_backup()

FILE_NAME = 'some_file_name.sql'
ARCHIVE_NAME = 'some_archive_name.gz'
REPO_NAME    = 'some_repo_name'
GIT_USER = 'some_git_username' # You'll need to change this in .netrc as well.
MYSQL_USER   = 'some_mysql_user'
MYSQL_PASS   = 'some_mysql_pass'
DATABASE_TO_DUMP = 'SomeDatabase' # You can use --all-databases, but be careful with it: it will dump everything!

def dump_dbs_to_gzip():
    # Dump arguments.
    args = [
        'mysqldump', '-u', MYSQL_USER, '-p%s' % (MYSQL_PASS),
        '--add-drop-database',
        DATABASE_TO_DUMP,
    ]
    # Dump to file (binary mode: mysqldump output is written as raw bytes).
    with open(FILE_NAME, 'wb') as dump_file:
        retcode = subprocess.call(args, stdout=dump_file)
    if retcode != 0:
        print('Back-up error: mysqldump exited with code %d' % retcode)
    # Compress.
    with open(FILE_NAME, 'rb') as sql_file:
        with gzip.open(ARCHIVE_NAME, 'wb') as gz_file:
            gz_file.writelines(sql_file)
    # Delete the original file.
    sh.rm('-f', FILE_NAME)

def clone_repo():
    # Set the repository location.
    repo_origin = 'https://%s@bitbucket.org/%s/%s.git' % (GIT_USER, GIT_USER, REPO_NAME)

    # Clone the repository in the /tmp folder.
    sh.cd('/tmp')
    sh.rm('-rf', REPO_NAME)
    sh.git.clone(repo_origin)
    sh.cd(REPO_NAME)

def commit_and_push():
    # Commit and push.
    sh.git.add('.')
    sh.git.commit(m=datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    sh.git.push('origin', 'master')
    sh.cd('..')
    sh.rm('-rf', REPO_NAME)

def execute_backup():
    clone_repo()
    dump_dbs_to_gzip()
    commit_and_push()

if __name__ == "__main__":
    execute_backup()

UPDATE:

I managed to fix it using Chris Clark's suggestion of calling the script directly rather than through manage.py. However, I am still interested in what is causing this issue so the bounty is still available.

UPDATE [SOLVED]:

Adding the following line to /etc/environment and running it as my user account rather than root fixed it:

PWD=/my_django_project_path/helpers/management/commands

I still wonder why only my user can run it so if anyone has the solution to that, please contribute.


Solution

Since python /my_django_project_path/manage.py database_bu works when you run it yourself, the problem is with your cron environment or with the way you have set up the cron job, not with the script itself (for example, neither the size of the file being uploaded nor network connectivity is causing the issue).

Firstly, you are running the script as

47 16 * * * root python /my_django_project_path/manage.py database_bu

You are running it as root, which is not the user you tested with; the shell command worked for your own user. The fact that the same command also hangs when you switch to root with sudo su suggests that the root account is not set up the way your own account is. FWIW, scheduling jobs as root should almost always be avoided, because it can lead to weird file permission issues.

So try scheduling the cron job from your own user's crontab (edit it with crontab -e; per-user crontabs have no user field):

47 16 * * * cd /my_django_project_path/ && python manage.py database_bu

This may still not make the job run to completion. In that case, the problem could be in two places: your shell environment has some variables that are missing from the cron environment, or your .netrc file is not being read for credentials, or both.

In my experience, the PATH variable causes the most trouble, so run echo $PATH in your shell, and if the value you get is /some/path:/some/other/path:/more/path/values, run your cron job like this:

47 16 * * * export PATH="/some/path:/some/other/path:/more/path/values" && cd /my_django_project_path/ && python manage.py database_bu
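If you are not sure which PATH entry matters, one way to check is to let cron itself report where (or whether) it finds the programs the backup relies on. The following is a minimal Python 3 sketch, not part of the original setup; the tool list is an assumption based on the script in the question (python, mysqldump and git). You would schedule it from the same crontab as the real job and redirect its output to a file, the same way as the printenv entry further down:

```python
import os
import shutil

# Assumption: the executables the backup command shells out to, per the question.
REQUIRED_TOOLS = ["python", "mysqldump", "git"]


def resolve_tools(tools, path=None):
    """Map each tool name to the absolute path found on PATH, or None if missing.

    `path` defaults to the current process's PATH, so running this from a
    cron job shows what cron actually sees.
    """
    search_path = path if path is not None else os.environ.get("PATH", "")
    return {tool: shutil.which(tool, path=search_path) for tool in tools}


if __name__ == "__main__":
    print("PATH =", os.environ.get("PATH", "<unset>"))
    for tool, location in resolve_tools(REQUIRED_TOOLS).items():
        print(f"{tool:10} -> {location or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND from cron but found from your shell points to the directory that needs to go into the cron entry's PATH.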

If this doesn't work out, check all the environment variables next.

Use printenv > ~/environment.txt from a normal shell to get all the environment variables set in the shell. Then use the cron entry * * * * * printenv > ~/cron_environment.txt to capture the variables available in the cron environment. Alternatively, you can use the following snippet inside a script to get the environment from within the script:

import os
os.system("printenv")

Compare the two, find any other relevant variables that differ (like HOME), and set the same values in the script or cron entry to check whether that fixes it.
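Comparing the two dumps by eye is error-prone, so here is a small Python 3 sketch that does the comparison. It assumes both files were written by printenv to ~/environment.txt and ~/cron_environment.txt as described above, and that both were created by the same user (otherwise ~ points to different home directories):

```python
import os


def parse_env_lines(lines):
    """Parse `printenv` output (NAME=value per line) into a dict.

    Lines without '=' (e.g. continuation lines of multi-line values) are skipped.
    """
    env = {}
    for line in lines:
        name, sep, value = line.rstrip("\n").partition("=")
        if sep and name:
            env[name] = value
    return env


def load_env_dump(path):
    with open(path, encoding="utf-8", errors="replace") as fh:
        return parse_env_lines(fh)


def diff_envs(shell_env, cron_env):
    """Return (names only in the shell, names present in both with different values)."""
    missing = sorted(set(shell_env) - set(cron_env))
    differing = sorted(
        name for name in set(shell_env) & set(cron_env)
        if shell_env[name] != cron_env[name]
    )
    return missing, differing


if __name__ == "__main__":
    home = os.path.expanduser("~")
    shell_env = load_env_dump(os.path.join(home, "environment.txt"))
    cron_env = load_env_dump(os.path.join(home, "cron_environment.txt"))
    missing, differing = diff_envs(shell_env, cron_env)
    print("Only in shell:", ", ".join(missing) or "(none)")
    for name in differing:
        print(f"{name}:\n  shell: {shell_env[name]}\n  cron:  {cron_env[name]}")
```

Variables listed as only in the shell, and differing ones such as PATH or HOME, are the candidates to set explicitly in the cron entry.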

If things still don't work out, then I think the remaining problem is the Bitbucket username and password saved in your .netrc: the contents of .netrc might not be available in the cron environment.
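One way to test this theory is to ask Python's standard netrc module, from inside the cron job, whether it can see the Bitbucket entry. This is a diagnostic sketch under assumptions: it reads $HOME/.netrc, which is where git's HTTPS transport usually looks, and it prints only the login, never the password:

```python
import netrc
import os


def bitbucket_credentials(netrc_path=None, host="bitbucket.org"):
    """Return (login, path) for `host` from a .netrc file, or (None, path).

    With no explicit path this reads $HOME/.netrc, so the result depends on
    which user (and which HOME) the process runs as.
    """
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    if not os.path.exists(path):
        return None, path
    auth = netrc.netrc(path).authenticators(host)
    # authenticators() returns (login, account, password), or None if no entry.
    return (auth[0] if auth else None), path


if __name__ == "__main__":
    print("HOME =", os.environ.get("HOME", "<unset>"))
    login, path = bitbucket_credentials()
    print(f"{path}: login for bitbucket.org = {login or 'NOT FOUND'}")
```

Run it once from your shell and once from cron (or under sudo su); if the cron or root run prints NOT FOUND, the credentials are not reaching that environment.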

Instead, create an SSH key pair for your account and let the backup happen over SSH instead of HTTPS (it's better to generate the SSH key without a passphrase in this step, so that the unattended job never has to prompt for one).

Once you have set up the SSH keys, you will have to change the existing origin URL in the .git/config file of your repository accordingly (or add a new remote, origin_ssh, using git remote add origin_ssh <url> with the SSH URL).

Note that the HTTPS URL for the repo looks like https://user@bitbucket.org/user/repo.git, while the SSH one looks like git@bitbucket.org:user/repo.git.
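If you prefer to derive the SSH URL from the HTTPS one programmatically (for example inside the backup script), a small helper along these lines would do it. It is a hypothetical helper, not part of git or of the original script, and it only handles the https://[user@]host/owner/repo[.git] shape shown above:

```python
import re

# Matches https://[user@]host/<owner>/<repo>[.git][/]
_HTTPS_RE = re.compile(
    r"^https://(?:[^@/]+@)?(?P<host>[^/]+)/(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def https_to_ssh_url(url):
    """Convert an HTTPS git remote URL to the scp-like SSH form git@host:owner/repo.git."""
    match = _HTTPS_RE.match(url)
    if match is None:
        raise ValueError(f"not an https://host/owner/repo URL: {url!r}")
    return "git@{host}:{owner}/{repo}.git".format(**match.groupdict())


if __name__ == "__main__":
    print(https_to_ssh_url("https://user@bitbucket.org/user/repo.git"))
```

Since the question's clone_repo() clones a fresh copy into /tmp on every run, the equivalent hard-coded change there would be repo_origin = 'git@bitbucket.org:%s/%s.git' % (GIT_USER, REPO_NAME).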

PS: Bitbucket, or rather git, is not an ideal backup solution; there are plenty of threads discussing better backup strategies. Also, while debugging, schedule your cron jobs to run every minute (* * * * *) or at a similarly short interval, so you can iterate faster.

Edit

OP says in the comments that adding the following PWD setting to /etc/environment worked for him:

PWD=/my_django_project_path/helpers/management/commands

This is what I had suggested earlier: an environment variable that is available in the shell was not present in the cron environment.

In general, cron runs jobs with a reduced set of environment variables, and setting the right variables will make the cron job work.

Also, since you are using a .netrc file for credentials, it is specific to your account (it lives in your home directory), so it won't work for any other account, including root via sudo, unless you configure the same file for that account as well.

OTHER TIPS

That reminds me of a very frustrating gotcha. Do you have a newline at the end of your crontab file? From man crontab:

...cron requires that each entry in a crontab end in a newline character. If the last entry in a crontab is missing the newline, cron will consider the crontab (at least partially) broken and refuse to install it.
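Checking for that trailing newline is easy to script. A minimal Python 3 sketch; the /etc/crontab default matches the system crontab used in the question, and you can pass another path for other crontab files you are allowed to read:

```python
def ends_with_newline(data):
    """True if `data` (bytes) is empty or its last byte is a newline."""
    return not data or data.endswith(b"\n")


def crontab_ends_with_newline(path="/etc/crontab"):
    """Read a crontab file and report whether its last entry is newline-terminated."""
    with open(path, "rb") as fh:
        return ends_with_newline(fh.read())


if __name__ == "__main__":
    print("ends with newline:", crontab_ends_with_newline())
```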

This is also a shot in the dark - our team has had issues running management commands through cron. We never bothered to track down why they were flaky, but after much hair-pulling we reverted to invoking the python functions directly rather than going through manage.py and things have been humming along fine ever since.

I’m not very good at reading strace output, but I think the one you posted indicates that your program has invoked git and is awaiting its termination. You mention uploading to BitBucket, so here’s a shot in the dark: git tries to push to an ssh remote; when you run it as yourself, ssh-agent authenticates you transparently; but when you run it as root, there’s no ssh-agent, thus git prompts for ssh password and awaits your input.

Try running the git command manually under sudo su and check whether it prompts for anything.

If this does not help, you need to get at the output of git (or whatever it is you’re actually invoking there). Check the documentation for the sh package for details on how to redirect the standard output and standard error.
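If you want to rule out a hidden prompt, the git calls can be made non-interactive and their output captured. The sketch below swaps the sh package for the standard library's subprocess module (a substitution, not what the question uses) and needs Python 3.6+. It relies on two git features available since git 2.3: GIT_TERMINAL_PROMPT=0, which makes git fail instead of asking for an HTTPS username or password, and GIT_SSH_COMMAND, used here to pass -o BatchMode=yes so ssh fails instead of asking for a password or passphrase. Note that it overrides any GIT_SSH_COMMAND you already set:

```python
import os
import subprocess


def non_interactive_env(base=None):
    """Copy of the environment that stops git and ssh from waiting for typed input."""
    env = dict(os.environ if base is None else base)
    # Git 2.3+: fail instead of prompting for an HTTPS username/password.
    env["GIT_TERMINAL_PROMPT"] = "0"
    # BatchMode makes ssh fail instead of asking for a password or passphrase.
    env["GIT_SSH_COMMAND"] = "ssh -o BatchMode=yes"
    return env


def run_logged(cmd, cwd=None, log=print):
    """Run cmd non-interactively, log its stdout/stderr, and return the exit code."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        env=non_interactive_env(),
        stdin=subprocess.DEVNULL,  # no stdin; the env vars above cover /dev/tty prompts
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    log(f"$ {' '.join(cmd)}  (exit {result.returncode})")
    if result.stdout:
        log(result.stdout.rstrip())
    if result.stderr:
        log(result.stderr.rstrip())
    return result.returncode
```

For example, run_logged(["git", "push", "origin", "master"], cwd="/tmp/some_repo_name") would log git's error message and return a non-zero exit code instead of hanging.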

Licensed under: CC-BY-SA with attribution