Question

Keep in mind that I am a rookie in the world of SQL and databases.

I am inserting/updating thousands of objects every second, and those objects are actively queried every few seconds.

What are some basic things I should do to performance tune my (postgres) database?


Solution

It's a broad topic, so here's lots of stuff for you to read up on.

  • EXPLAIN and EXPLAIN ANALYZE are extremely useful for understanding what's going on in your db-engine (a short sketch follows after this list)
  • Make sure relevant columns are indexed
  • Make sure irrelevant columns are not indexed (insert/update performance can go down the drain if too many indexes must be updated)
  • Make sure your postgresql.conf is tuned properly
  • Know what work_mem is and how it affects your queries (mostly useful for larger queries)
  • Make sure your database is properly normalized
  • VACUUM for clearing out old data
  • ANALYZE for updating planner statistics (default_statistics_target controls how much is collected)
  • Persistent connections (you could use a connection pooler like pgpool or pgbouncer)
  • Understand how queries are constructed (joins, sub-selects, cursors)
  • Caching of data (e.g. memcached) is an option
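
To make a few of those points concrete, here is a minimal sketch in SQL; the table and column names (events, created_at) are made up for illustration, and the work_mem value is just a placeholder:

    -- See the actual plan and timings for a hot query
    EXPLAIN ANALYZE
    SELECT * FROM events WHERE created_at > now() - interval '5 minutes';

    -- Index a frequently filtered column (hypothetical table/column)
    CREATE INDEX idx_events_created_at ON events (created_at);

    -- Reclaim dead rows and refresh planner statistics in one go
    VACUUM ANALYZE events;

    -- Give a large query more sort/hash memory, for this session only
    SET work_mem = '64MB';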

And when you've exhausted those options: add more memory, a faster disk subsystem, etc. Hardware matters, especially on larger datasets.

And of course, read all the other threads on postgres/databases. :)

OTHER TIPS

First and foremost, read the official manual's Performance Tips.

Running EXPLAIN on all your queries and understanding its output will let you know if your queries are as fast as they could be, and if you should be adding indexes.

Once you've done that, I'd suggest reading over the Server Configuration part of the manual. There are many options which can be fine-tuned to further enhance performance. Make sure to understand the options you're setting though, since they could just as easily hinder performance if they're set incorrectly.
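
As a hedged example: on PostgreSQL 9.4 and later you can change server settings from SQL with ALTER SYSTEM instead of editing postgresql.conf by hand. The value below is illustrative, not a recommendation:

    -- Check the current value first
    SHOW shared_buffers;

    -- Persist a new value to postgresql.auto.conf (takes effect after
    -- a reload, or after a restart for parameters like shared_buffers)
    ALTER SYSTEM SET shared_buffers = '2GB';

    -- Reload the configuration for parameters that allow it
    SELECT pg_reload_conf();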

Remember that every time you change a query or an option, test and benchmark so that you know the effects of each change.
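
In psql, one simple way to benchmark is to turn on client-side timing and rerun the same query before and after each change (the query here is a placeholder):

    -- psql meta-command: print elapsed time after every statement
    \timing on
    SELECT count(*) FROM events WHERE created_at > now() - interval '1 hour';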

Actually, there are some simple rules that will in most cases get you enough performance:

  1. Indexes come first. Primary keys are indexed automatically. I recommend putting indexes on all foreign keys. Beyond that, index all columns that are frequently queried; if a table has heavily used queries that filter on more than one column, put a single index on those columns together (see the sketch after this list).

  2. Memory settings in your PostgreSQL installation. Raise the following parameters:


shared_buffers, work_mem, maintenance_work_mem, temp_buffers

If it is a dedicated database machine, you can easily set the first three of these to half the RAM (just be careful under Linux with shared_buffers; you may have to adjust the shmmax kernel parameter). In any other case it depends on how much RAM you would like to give to PostgreSQL.

http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html
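
A short sketch of both rules, with hypothetical table and column names; the memory values are placeholders to adapt to your RAM:

    -- Rule 1: index foreign keys, and create a multi-column index
    -- for a heavily used query that filters on both columns
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);
    CREATE INDEX idx_orders_status_created ON orders (status, created_at);

    -- Rule 2 (PostgreSQL 9.4+): raise the memory settings without
    -- editing postgresql.conf; shared_buffers needs a server restart
    ALTER SYSTEM SET shared_buffers = '4GB';
    ALTER SYSTEM SET work_mem = '64MB';
    ALTER SYSTEM SET maintenance_work_mem = '512MB';
    ALTER SYSTEM SET temp_buffers = '32MB';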

The absolute minimum I'd recommend is the EXPLAIN ANALYZE command. It shows a breakdown of subqueries, joins, and so on, along with the actual time each operation consumed. It will also alert you to sequential scans and other nasty trouble.

It is the best way to start.
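
For instance, output like the following (illustrative, not from a real database) is the red flag to look for: a sequential scan over a large table inside a hot query usually means a missing index.

    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- Hypothetical plan: the Seq Scan line is the problem indicator
    -- Seq Scan on orders  (cost=0.00..1693.00 rows=98 width=72)
    --                     (actual time=0.020..9.421 rows=98 loops=1)
    --   Filter: (customer_id = 42)
    --   Rows Removed by Filter: 99902
    -- Execution Time: 9.500 ms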

Put fsync = off in your postgresql.conf if you trust your filesystem; otherwise each PostgreSQL operation is immediately written to disk (with the fsync system call). We have had this option turned off on many production servers for almost 10 years, and we have never had data corruption.
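
If you do go this route, a minimal sketch on PostgreSQL 9.4+ (fsync can be changed with a configuration reload; the same line can instead go straight into postgresql.conf as described above):

    -- Disable fsync: faster writes, but risk of corruption on a crash
    ALTER SYSTEM SET fsync = off;
    SELECT pg_reload_conf();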
