Question

I've got a Django-based website using a PostgreSQL database hosted on Webfaction. I normally manually collect the data for my database (copy-paste into a text file) from another website which lists all of the data on a single web page in an HTML table.

As far as automatically gathering that data with Python, I'm guessing that I should use something like html5lib or Scrapy to write a script that loads the web page, finds the HTML table I want, extracts the data from it, formats it into JSON, and then uses

manage.py loaddata fixturename.json

to load my data into my database. My question, though, is how do I get this script to run automatically once a day on Webfaction's server?
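A minimal sketch of the scraping half, using only Python's standard library (html.parser and urllib.request) rather than html5lib or Scrapy. It assumes the table is well-formed (explicit closing td/th and tr tags) and that its first row holds column names matching the model's field names. It collects rows from every table on the page, so a page with several tables would need extra filtering. The model label myapp.record is a hypothetical placeholder for your own app and model.

```python
import json
import urllib.request
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it per cell.
        if self._cell is not None:
            self._cell.append(data)


def rows_to_fixture(rows, model="myapp.record"):
    """Turn [header, *data_rows] into Django fixture dicts (hypothetical model label)."""
    header, *data = rows
    return [
        {"model": model, "pk": i, "fields": dict(zip(header, row))}
        for i, row in enumerate(data, start=1)
    ]


def scrape_to_fixture(url, out_path, model="myapp.record"):
    """Fetch url, parse its table rows, and write a fixture file for loaddata."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8")  # assumes the page is UTF-8
    parser = TableParser()
    parser.feed(html)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(rows_to_fixture(parser.rows, model), f, indent=2)
```

Calling scrape_to_fixture with the page's URL and "fixturename.json" writes a file that manage.py loaddata fixturename.json can load. Because each row gets a fixed pk, loading the next day's fixture updates the existing rows instead of adding duplicates.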


Solution

You can use cron to schedule tasks.

Your crontab file could look something like this, which runs the command at 1:00 AM every day:

# Replace /path/to/project with the directory that contains manage.py
# Minute   Hour   Day of Month       Month          Day of Week        Command
# (0-59)  (0-23)     (1-31)    (1-12 or Jan-Dec)  (0-6 or Sun-Sat)
    0        1          *             *               *           cd /path/to/project && /usr/bin/python manage.py loaddata fixturename.json

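(Or, to run it at midnight every night, use @daily in place of the five time fields: @daily cd /path/to/project && /usr/bin/python manage.py loaddata fixturename.json, where /path/to/project is the directory containing manage.py.)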
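Cron starts jobs from your home directory with a minimal environment, so a relative manage.py only works if the job first changes into the project directory. One way to keep the crontab line short is to have the scraping script itself run loaddata at the end. A minimal sketch, assuming PROJECT_DIR is a hypothetical path to the directory containing manage.py and that the script runs under the same Python that has Django installed:

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location of manage.py on the server; adjust to your project.
PROJECT_DIR = Path.home() / "webapps" / "myproject"


def loaddata_command(fixture, python=sys.executable):
    """Build the argv list for: python manage.py loaddata <fixture>."""
    return [python, "manage.py", "loaddata", str(fixture)]


def run_loaddata(fixture, project_dir=PROJECT_DIR):
    """Run loaddata from the project directory, raising if it fails."""
    # cwd= makes the relative "manage.py" (and fixture path) resolve correctly
    # even though cron starts the job in the home directory.
    subprocess.run(loaddata_command(fixture), cwd=project_dir, check=True)
```

Calling run_loaddata("fixturename.json") as the last step of the scraper means cron only needs to start one script. Inside a process where Django settings are already configured, django.core.management.call_command("loaddata", "fixturename.json") does the same job without a subprocess.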

See the Webfaction documentation: http://docs.webfaction.com/software/general.html#scheduling-tasks-with-cron

Other tips

You could use YQL to scrape websites for you and return the results in JSON format. I use YQL extensively to get data for my apps. It's fast, and your server doesn't have to take the load for it.

http://developer.yahoo.com/yql/

To run the script once a day, you can add it as a cron job:

http://docs.webfaction.com/software/general.html#scheduling-tasks-with-cron

http://garrett.im/django/sysadmin/2011/10/03/cron-django-webfaction.html

You want to run a cron job. It's a simple way to get a server to run a job once or repeatedly on any schedule you set.

Also make sure you have permission to screen scrape someone else's content.

Cron or Celery beat are good options. Cron is easier; Celery gives you more control.

http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
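With Celery, periodic tasks are declared in a beat schedule. A minimal sketch, assuming Celery 4 or later (where the setting is beat_schedule; older releases called it CELERYBEAT_SCHEDULE) and a hypothetical task named myapp.tasks.scrape_and_load that scrapes the table and loads the fixture:

```python
from datetime import timedelta

beat_schedule = {
    "scrape-table-daily": {
        # Hypothetical task name; it must match a task registered with your app.
        "task": "myapp.tasks.scrape_and_load",
        # Run every 24 hours; celery.schedules.crontab(hour=1, minute=0)
        # would pin it to 1:00 AM instead.
        "schedule": timedelta(days=1),
    },
}
```

Assign it with app.conf.beat_schedule = beat_schedule on your Celery app, then run the celery beat scheduler alongside a worker so the scheduled task actually gets executed.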

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow