Problem

Please note that the example below is just an example; my real scenario is much more complex, and the way I'm trying to model it really makes sense.

Let's say I'm creating a table for audit events in one of my apps, so all the "event_created", "user_created", etc. kinds of things. The table contains several columns, some of which are foreign keys to other tables. Over time, this single table can grow to several million records.

From a performance perspective, is it faster and more performant to use a single table for all of them, or to use a separate table for each kind of event and operate on those separate tables? Or does it not make much difference? It might sound silly to create a separate table for each kind of event, but you need to trust me that in my real-world scenario it really makes sense.
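For concreteness, a minimal sketch of what the single-table variant could look like; every table, column, and index name here is hypothetical rather than taken from my actual schema:

    -- Hypothetical single-table design for all kinds of audit events.
    CREATE TABLE app_user (
        id   bigint PRIMARY KEY,
        name text NOT NULL
    );

    CREATE TABLE audit_event (
        audit_event_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        event_kind     text        NOT NULL,              -- e.g. 'event_created', 'user_created'
        occurred_at    timestamptz NOT NULL DEFAULT now(),
        user_id        bigint REFERENCES app_user (id),   -- one of the foreign keys mentioned above
        details        jsonb                              -- kind-specific payload
    );

    -- Supports "events for a given user, newest first".
    CREATE INDEX audit_event_user_time_idx ON audit_event (user_id, occurred_at DESC);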


Solution

Denormalize only as a last resort

Do not denormalize your table design for imaginary performance problems. Avoid falling into premature optimization.

Design a proper structure. Generate fake data to populate the tables. Run tests under circumstances akin to your deployment scenario (illustrative sketches follow the list below). If a significant performance problem is proven:

  • Research the nature of the problem using EXPLAIN and EXPLAIN ANALYZE (a data-generation and EXPLAIN sketch follows this list).
    • Examine the use of indexes. Verify that existing indexes are used by your queries as expected; if not, rewrite the query with a different approach, or add an index where needed.
  • Study how to tune Postgres.
  • Try moving some logic from your app to the database server by writing functions in a language such as PL/pgSQL, where all the data is local to the executing code (a PL/pgSQL sketch follows this list).
  • Install a larger cache module on your RAID controller. Consider tuning the cache to allocate more to reading versus writing, as appropriate to your situation; they often default to 50-50.
    (By the way, be certain your RAID’s write-cache is battery-backed to avoid ruining your database or other files.)
    • Or, if using ZFS instead of RAID, learn to tune that to prioritize your database on faster drives.
  • Consider partitioning to physically segregate rows in storage (a partitioning sketch follows this list). Consider adding faster storage, such as enterprise-quality solid-state drives, for your most frequently accessed data or for the data you most want to serve quickly, such as data used by the Big Cheese.
    • Partitioning comes with restrictions and trade-offs, so be sure you have no better way to solve your performance problem.
    • If you do go with partitioning, be aware that recent versions of Postgres (10, 11, and 12, as I vaguely recall) brought dramatic improvements to declarative partitioning.
  • Hire a Postgres expert for consultation in your testing and tuning.
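As a rough illustration of the fake-data and EXPLAIN ANALYZE steps, here is a sketch against the hypothetical audit_event table from the question; every name and number is illustrative only:

    -- Seed the referenced table so the foreign key can be satisfied.
    INSERT INTO app_user (id, name)
    SELECT g, 'user_' || g
    FROM generate_series(1, 1000) AS g;

    -- Generate a few million fake audit rows spread over recent history.
    INSERT INTO audit_event (event_kind, occurred_at, user_id, details)
    SELECT (ARRAY['event_created', 'user_created', 'user_deleted'])[1 + (g % 3)],
           now() - g * interval '1 second',
           1 + (g % 1000),
           jsonb_build_object('seq', g)
    FROM generate_series(1, 5000000) AS g;

    ANALYZE audit_event;

    -- See how a typical query is actually executed.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM audit_event
    WHERE event_kind = 'user_created'
      AND occurred_at >= now() - interval '1 day'
    ORDER BY occurred_at DESC
    LIMIT 100;

    -- If the plan shows a sequential scan where an index scan was expected,
    -- add an index matching the predicate and the sort order, then re-run EXPLAIN.
    CREATE INDEX audit_event_kind_time_idx ON audit_event (event_kind, occurred_at DESC);

Keep the new index only if the plan and timings actually improve; otherwise drop it, since every index slows down writes.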
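The "move some logic into the database" point could look like the PL/pgSQL sketch below; the routine and its retention rule are invented purely for illustration:

    -- Hypothetical server-side routine: purge audit rows older than a cutoff,
    -- so millions of rows never have to travel to the application.
    CREATE OR REPLACE FUNCTION purge_old_audit_events(p_keep interval)
    RETURNS bigint
    LANGUAGE plpgsql
    AS $$
    DECLARE
        v_deleted bigint;
    BEGIN
        DELETE FROM audit_event
        WHERE occurred_at < now() - p_keep;

        GET DIAGNOSTICS v_deleted = ROW_COUNT;
        RETURN v_deleted;
    END;
    $$;

    -- Example call: keep only the last 90 days of audit events.
    SELECT purge_old_audit_events(interval '90 days');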
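And if partitioning does prove necessary, declarative partitioning (Postgres 10 and later) looks roughly like the sketch below; the monthly ranges and names are made up:

    -- Hypothetical range-partitioned variant of the audit table, split by month.
    -- (A primary key or unique constraint on a partitioned table must include the
    -- partition key, so it is omitted here to keep the sketch short.)
    CREATE TABLE audit_event_partitioned (
        audit_event_id bigserial,
        event_kind     text        NOT NULL,
        occurred_at    timestamptz NOT NULL,
        user_id        bigint,
        details        jsonb
    ) PARTITION BY RANGE (occurred_at);

    CREATE TABLE audit_event_2024_01 PARTITION OF audit_event_partitioned
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

    CREATE TABLE audit_event_2024_02 PARTITION OF audit_event_partitioned
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

    -- Queries that filter on occurred_at should only touch the relevant partitions.
    EXPLAIN
    SELECT count(*)
    FROM audit_event_partitioned
    WHERE occurred_at >= '2024-02-01' AND occurred_at < '2024-02-15';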

Only after exhausting all avenues for fixing a proven performance problem should you consider denormalizing.

Postgres is a powerful enterprise-quality database system. Several million rows on modern hardware with sufficient RAM and wise indexing should be no problem at all.

On the other hand, if your different types of events represent genuinely different entities, then they should be kept in separate tables. How do we know whether similar kinds of rows are different entities or not? Clues might be found by asking: Do they have mostly the same columns with the same semantics? Do your users ever want to display or report them together? Might you ever want to aggregate them together (calculate counts, averages, medians, etc.)? See the sketch below.
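One way to read that last clue: with a single shared table, cross-kind reporting is a plain GROUP BY, whereas separate per-kind tables (the names below are invented) force a UNION ALL branch per table for the same report:

    -- Single shared table: one aggregate covers every kind of event.
    SELECT event_kind, count(*) AS events, max(occurred_at) AS latest
    FROM audit_event
    GROUP BY event_kind;

    -- Hypothetical per-kind tables: the same report needs one branch per table.
    SELECT 'user_created' AS event_kind, count(*) AS events, max(occurred_at) AS latest
    FROM user_created_event
    UNION ALL
    SELECT 'event_created', count(*), max(occurred_at)
    FROM event_created_event;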

Be aware that, as a product with a long history dating back to days when computer hardware was far more limited in capability and configuration than today's hardware, Postgres ships with quite conservative default settings. For example, a default installation of Postgres can run on an older Raspberry Pi! So anyone running a larger database on more capable hardware should do some tuning, as sketched below.
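As an illustration of what such tuning touches, here are a few of the settings commonly raised on capable hardware; the values are placeholders to show the knobs, not recommendations, so derive real numbers from your own hardware and workload:

    ALTER SYSTEM SET shared_buffers = '8GB';          -- default is a modest 128MB; change requires a restart
    ALTER SYSTEM SET effective_cache_size = '24GB';   -- planner hint about memory available for caching
    ALTER SYSTEM SET work_mem = '64MB';               -- per sort/hash operation, per query node
    ALTER SYSTEM SET maintenance_work_mem = '1GB';    -- used by VACUUM, CREATE INDEX, etc.
    ALTER SYSTEM SET random_page_cost = 1.1;          -- lower on SSD storage
    SELECT pg_reload_conf();                          -- applies the settings that do not require a restart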

License: CC BY-SA with attribution