Question

We are developing an online school diary application using Django. The prototype is ready and the project will go live next year with about 500 students. Initially we used SQLite and hoped that it would perform well enough for the initial implementation. The data tables are such that obtaining the details of a school day (periods, classes, teachers, classrooms) requires many tables, and the database access takes 67ms on a reasonably fast PC. Most of the data is static once the year starts, with perhaps minor changes to classrooms. I thought of extracting the timetable for each student for each term day so no table joins would be needed. I put this data into a text file for one student; the file is 100K in size. The time taken to read this data and process it into a day's timetable is about 8ms. If I pre-load the data on login and store it in the session, it takes 7ms at login and 2ms for each query. With 500 students, what would be the impact on the web server using this approach, and what other options are there (putting the student text files into a sort of memory cache rather than the session, for example)? There will not be a great deal of data entry (students adding notes, teachers likewise), so usage will mostly be checking the timetable status and looking to see what events exist for that day or week.


Solution

What is your expected response time, and what is your expected number of requests per minute? About a fifteenth of a second for the database access (which is likely to be the slow part) of a request doesn't sound like a problem to me. SQLite should perform fine in a read-mostly situation like this. So I'm not convinced you even have a performance problem.

If you want faster response you could consider:

  1. First, ensuring that you have the best response time by checking your indexes and profiling individual retrievals to look for performance bottlenecks.
  2. Pre-computing the static parts of the system and storing the HTML. You can put the HTML right back into the database or store it as disk files.
  3. Using the database as a backing store only (to preserve state of the system when the server is down) and reading the entire thing into in-memory structures at system start-up. This eliminates disk access for the data, although it limits you to one physical server.
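The third option above can be sketched in a few lines. This is a minimal, hypothetical example (the flat `timetable` table and its columns are assumptions; a real schema would join periods, classes, teachers and classrooms into this shape once at start-up): the whole database is read into a plain dict keyed by `(student_id, day)`, so each request becomes a RAM lookup.

```python
import sqlite3

def load_all(conn):
    """Read every timetable row into a dict at start-up."""
    cache = {}
    rows = conn.execute(
        "SELECT student_id, day, period, subject, room FROM timetable"
    )
    for student_id, day, period, subject, room in rows:
        cache.setdefault((student_id, day), []).append((period, subject, room))
    return cache

# Demo: an in-memory database stands in for the real SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timetable (student_id, day, period, subject, room)")
conn.executemany(
    "INSERT INTO timetable VALUES (?, ?, ?, ?, ?)",
    [(1, "Mon", 1, "Maths", "A1"), (1, "Mon", 2, "History", "B2")],
)
cache = load_all(conn)
print(cache[(1, "Mon")])  # [(1, 'Maths', 'A1'), (2, 'History', 'B2')]
```

At 100K per student and 500 students this is on the order of 50MB, which fits comfortably in RAM on one server.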

OTHER TIPS

This sounds like premature optimization. 67ms is scarcely longer than the roughly 50ms threshold at which humans start to notice a delay.

SQLite's representation of your data is going to be more efficient than a text format, and unlike a text file that you have to parse, the operating system can efficiently cache just the portions of your database that you're actually using in RAM.

You can lock down ~50MB of RAM to cache a parsed representation of the data for all the students, but you'll probably get better performance using that RAM for something else, like the OS disk cache.

I agree with some of the other answers which suggest using MySQL or PostgreSQL instead of SQLite. SQLite is not designed to be used as a production database for a multi-user web application. It is great for storing data for single-user applications such as mobile apps or even desktop applications, but it falls short very quickly in server applications. With Django it is trivial to switch to any other full-fledged database backend.

If you switch to one of those, you should not really have any performance issues, especially if you will do all the necessary joins using select_related and prefetch_related.
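To see what `select_related` buys you, here is a sketch in raw sqlite3 (table and column names are hypothetical, not from the question's schema): without it, Django would issue one query for the lessons plus one extra query per related teacher; with it, the ORM emits a single JOIN like the one below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lesson  (id INTEGER PRIMARY KEY, period INTEGER,
                          subject TEXT, teacher_id INTEGER);
    INSERT INTO teacher VALUES (1, 'Ms Jones');
    INSERT INTO lesson  VALUES (1, 1, 'Maths', 1), (2, 2, 'Physics', 1);
""")

# One round-trip instead of N+1: the JOIN pulls teacher details along
# with each lesson, which is what select_related does under the hood.
rows = conn.execute("""
    SELECT lesson.period, lesson.subject, teacher.name
    FROM lesson JOIN teacher ON teacher.id = lesson.teacher_id
    ORDER BY lesson.period
""").fetchall()
print(rows)  # [(1, 'Maths', 'Ms Jones'), (2, 'Physics', 'Ms Jones')]
```

In Django itself the equivalent would be along the lines of `Lesson.objects.select_related("teacher")`, assuming a `Lesson` model with a foreign key to `Teacher`.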

If you still need more performance, considering that "most of the data is static", you might want to convert the Django site into a static site (a collection of HTML files) and then serve those using nginx or something similar. The simplest way I can think of doing that is to write a cron job which loops over all the needed URL configs, requests each page from Django, and saves the response as an HTML file. If you want to go in that direction, you might also want to take a look at Python's static site generators, such as Hyde and Pelican.
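The cron-job idea can be sketched as follows. This is a hypothetical skeleton: the `render` stub stands in for an HTTP request to the running Django site (e.g. via `urllib.request`), and the URL-to-filename mapping is an assumption.

```python
import pathlib
import tempfile

def render(url):
    """Stub for fetching the rendered page from the live Django site."""
    return f"<html><body>Timetable for {url}</body></html>"

def generate(urls, out_dir):
    """Write one static HTML file per URL for nginx to serve directly."""
    out = pathlib.Path(out_dir)
    for url in urls:
        name = url.strip("/").replace("/", "_") + ".html"
        (out / name).write_text(render(url))
    return sorted(p.name for p in out.iterdir())

files = generate(["/timetable/mon/", "/timetable/tue/"], tempfile.mkdtemp())
print(files)  # ['timetable_mon.html', 'timetable_tue.html']
```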

This approach will certainly be much faster than any caching system; however, you will lose any dynamic components of the site. If you need them, then caching seems like the best and fastest solution.

You should use MySQL or PostgreSQL for your production database; sqlite3 isn't a good idea here.

You should also avoid pre-loading data on login. Since your records can be inserted in advance, write Django management commands and run the import into your chosen database beforehand, and design your models so that when a user logs in, they can already access and view/edit their related data (pre-inserted before the application even goes live). Hard-coding data operations at login does not smell right at all from an application-design point of view.

https://docs.djangoproject.com/en/dev/howto/custom-management-commands/

The benefit of designing your Django models and using custom management commands to insert the records ahead of your application going live is that you can use the Django ORM to create the appropriate relationships between users and their records.
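The import logic such a command's `handle()` method might run can be sketched in plain Python. The CSV layout and field names below are assumptions for illustration; in the real command each parsed row would become an ORM `create()` call.

```python
import csv
import io

def parse_timetable_csv(text):
    """Parse a timetable export into rows ready for bulk insertion."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append((rec["student"], rec["day"], int(rec["period"]), rec["subject"]))
    return rows

sample = "student,day,period,subject\nalice,Mon,1,Maths\nalice,Mon,2,Art\n"
rows = parse_timetable_csv(sample)
print(rows)  # [('alice', 'Mon', 1, 'Maths'), ('alice', 'Mon', 2, 'Art')]
```

Wrapped in a `BaseCommand` subclass under `management/commands/`, this runs once with `python manage.py <commandname>` before go-live, exactly as the linked how-to describes.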

I suspect, based on your description of what you need above, that you should rethink the approach you are taking to build this application.

With 500 students, we shouldn't even be talking about caching. If you want response speed, deal with the following issues in priority order:

  1. Use a production-quality database.
  2. Design your application's use cases correctly and get the application model right.
  3. Pre-load any data you need into the production database.
  4. Do front-end optimization before caching (CSS/JS compression, etc.).
  5. Use the Django Debug Toolbar to find any slow SQL and optimize specifically those queries.
  6. Implement caching (memcached etc.) as needed.

As a general guideline.
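When you do reach the caching step, the idea is simply to memoize the expensive lookup. In production this would be Django's cache framework backed by memcached; the hypothetical in-process sketch below uses `functools.lru_cache` to show the same principle, with a counter standing in for real database work.

```python
from functools import lru_cache

CALLS = 0  # counts how often we actually hit the "database"

@lru_cache(maxsize=1024)
def day_timetable(student_id, day):
    """Pretend database query; cached per (student_id, day)."""
    global CALLS
    CALLS += 1
    return f"timetable for student {student_id} on {day}"

day_timetable(1, "Mon")
day_timetable(1, "Mon")  # served from the cache, no second "query"
print(CALLS)  # 1
```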

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow