Domanda

I'm trying to optimize my site and found this nice little Django doc: Database Access Optimization, which suggests profiling followed by indexing and the selection of proper fields as the starting point for database optimization.

Normally, the django docs explain things pretty well, even things that more experienced programmers might consider "obvious". Not so in this case. After no explanation of indexing, the doc goes on to say:

We will assume you have done the obvious things above.

Uhhh. Wait! What the heck is indexing?

Obviously I can figure out what indexing is via google, my question is: what is it that I need to know as far as database stuff goes in order to create a scalable website? What should I be aware of about the Django framework specifically? What other "obvious" things ought I know? Where can I learn them?

I'm looking to get pointed in a direction here. I don't need to learn anything and everything about SQL, I just want to be informed enough to build my app the right way.

Thanks in advance!

È stato utile?

Soluzione

I encourage you to read all that the other answers suggest and whatever else you can find on the subject, because it's all good information to know and will make you a better programmer.

That said, one of the nice things about Django and other similar frameworks is that for the most part you don't have to know what's going on behind the scenes in the DB. Django adds indexes automatically for fields that need them. The encouragement to add more is based on the use cases of your app. If you continually query based on one particular field, you should ensure that that field is indexed. It might be already (if it's a foreign key, primary key, etc.), but other random fields typically aren't.

There's also various optimizations that are database client-specific. Django can't do much here because it's goal is to remain database independent. So, if you're using PostgreSQL, MySQL, whatever, read about optimizations and best practices concerning those particular clients.

Altri suggerimenti

Wikipedia database design, and database normalization http://en.wikipedia.org/wiki/Database_design, and http://en.wikipedia.org/wiki/Database_normalization are two very important concepts, in addition to indexing.

In addition to these, having a basic understanding of your database of choice is necessary. Being able to add users, set permissions, and create a database are key things that you should know.

Learning how to backup your data is also a crucial thing.

The list keeps getting longer, one should also be aware of the db relationships that django handles for you, OneToOne, ManyToMany, ManyToOne. https://docs.djangoproject.com/en/dev/topics/db/models/

The performance impact of JOINs shouldn't be ignored. Access model properties in django is so easy, but understanding that some of Foreign Key relationships could have huge performance impacts is something to consider too.

Once you have a basic understanding of these things you should be at a pretty good starting point for creating a non-trivial django app!

Wikipedia has a nice article about database indexes, they are similar(ish) to an index in a book i.e. lets you (the computer) find things faster because you just look at the index (probably a very bad example :-)

As for performance there are many things you can do and presumably as it is a very detailed subject in itself, and is something that is particular to each RDBMS then it would be distracting / irrelevant for them (django) to go into great detail. Best thing is really to google performance tips for your particular RDBMS. There are some general tips such as indexing, limiting queries to only return the required data etc.

I think one of the main things is a good design, sticking as much as possible to Normal Form and in general actually taking your database into consideration before programming your models etc (which clearly you seem to be doing). Naming conventions are also a big plus, remembering explicit is better then implicit :-)

To summarise:

  • Learn/understand the fundamentals such as the relational model
  • Decide on a naming convention
  • Design your database perhaps using an ERM tool
  • Prefer surrogate ID's
  • Use the correct data type of minimum possible size
  • Use indexes appropriately and don't over index
  • Avoid unecessary/over querying
  • Prioritise security and stability over raw performance
  • Once you have an up and running database 'tune' the database analysing/profiling settings, queries, design etc
  • Backup and archive regularly - cron
  • Hang out here :-)
  • If required advance into replication (master/slave - django supports this quite well too)
  • Consider upgrading your hardware
  • Don't get too hung up about it
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top