Partitioning a table by aggregate ID for efficient event sourcing in an RDBMS

https://softwareengineering.stackexchange.com/questions/373609

06-02-2021

Question

My events are stored in MySQL. The table has grown to tens of millions of rows, and loading an aggregate is starting to get sluggish.

I have read about "table partitioning" and was wondering whether it is a good option for storing my aggregate events. Since an aggregate is just a stream of events, each aggregate would have its own partition.

My question is: do I still need an index on the aggregate ID if I'm also going to partition by the aggregate ID? Has anyone used "table partitioning" to solve this issue?

Solution

MySQL limits the number of partitions a table can have, which I believe is currently 8192. It is therefore likely to be impractical to give each Aggregate ID its own partition: with tens of millions of rows, you would probably run out of partitions very quickly.

When altering the table, you'll need to specify a partitioning scheme that tells the database how to distribute the data into partitions. If you want to use a substring of the Aggregate ID to determine the partition, you can use PARTITION BY RANGE, where you define a fixed set of partitions based on ranges of values, as described in this answer. Personally, I would probably use either PARTITION BY HASH or PARTITION BY KEY, so that the data is distributed evenly across the partitions and you can choose an arbitrary number of partitions (up to the limit above). This also makes it easy to add more partitions later. You can see an example of this technique described here.
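As a rough illustration of the PARTITION BY KEY approach, here is a minimal sketch. The table name, column names, column types, and the partition count of 64 are all assumptions made for the example and do not come from the question. Note that MySQL requires every unique key, including the primary key, to contain all columns used for partitioning, so the primary key here leads with the aggregate ID.

```sql
-- Hypothetical schema: all names, types, and the partition count are illustrative.
CREATE TABLE events (
    aggregate_id BINARY(16)   NOT NULL,  -- e.g. a UUID stored as 16 bytes
    version      INT UNSIGNED NOT NULL,  -- position of the event within its aggregate
    event_type   VARCHAR(100) NOT NULL,
    payload      TEXT         NOT NULL,  -- serialized event body
    created_at   DATETIME(6)  NOT NULL,
    -- The primary key must include the partitioning column (aggregate_id).
    -- It also acts as the index used to locate one aggregate inside a partition.
    PRIMARY KEY (aggregate_id, version)
)
PARTITION BY KEY (aggregate_id)
PARTITIONS 64;

-- Loading one aggregate: the equality on aggregate_id lets MySQL prune the
-- search to a single partition. The hex literal is a made-up example ID.
SELECT version, event_type, payload
FROM events
WHERE aggregate_id = UNHEX('0123456789ABCDEF0123456789ABCDEF')
ORDER BY version;

-- Growing the table later: adds 16 more partitions (64 -> 80).
-- MySQL redistributes the existing rows, which can take a while on a large table.
ALTER TABLE events ADD PARTITION PARTITIONS 16;
```

Because each partition still holds the events of many aggregates, an index that starts with the aggregate ID remains useful. In this sketch the primary key plays that role. Running EXPLAIN on the SELECT (EXPLAIN PARTITIONS on older MySQL versions) should show whether only one partition is being read.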

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange