Question

I have a fairly big PostgreSQL database (with the TimescaleDB extension). Right now it is consuming about 500 GB on an SSD. Most of the data is time series, and in most cases data older than a few months isn't really interesting.

My idea was to move that older data to a cheap SATA hard drive instead of buying more expensive SSDs. Is that a good idea, and is there an established practice for implementing it?

My naive implementation would be:
Keep two databases (or create a tablespace on the cheap HDD). Every few hours, copy data from the "fast" (SSD) database to the "slow" (HDD) database. Every few days, delete data from the slow database. Is this a good idea? I'm happy to hear feedback and better suggestions.
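If you do end up with that kind of scheduled job, the copy-and-delete step can at least be done in a single statement. A rough sketch, assuming hypothetical tables metrics_fast and metrics_slow with a ts timestamp column:

    -- Move rows older than three months from the fast table to the slow one.
    -- Table and column names are placeholders; run this from a cron job or similar.
    WITH moved AS (
        DELETE FROM metrics_fast
        WHERE ts < now() - interval '3 months'
        RETURNING *
    )
    INSERT INTO metrics_slow
    SELECT * FROM moved;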


Solution

Here is a better architecture (a DDL sketch of these steps follows below):

  • Create a new tablespace on the slow drive.

  • Set the tablespace parameters seq_page_cost and random_page_cost higher on that new tablespace so that the PostgreSQL optimizer knows that those disks are slower.

  • Partition the big time series tables by time ranges (use the same boundaries for all affected tables) so that you end up with a couple of dozen partitions for each.

  • Move the old partitions to the slow tablespace.

That way you still have all the data accessible.

Use PostgreSQL v11 or better for partitioning.
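A minimal DDL sketch of the steps above, with made-up names (a slow_hdd tablespace at /mnt/slow_hdd/pgdata, a measurements table, monthly partitions); the cost settings are illustrative and should be tuned to your hardware:

    -- 1. Tablespace on the slow drive
    CREATE TABLESPACE slow_hdd LOCATION '/mnt/slow_hdd/pgdata';

    -- 2. Tell the planner that I/O on this tablespace is more expensive
    ALTER TABLESPACE slow_hdd SET (seq_page_cost = 2.0, random_page_cost = 8.0);

    -- 3. Range-partitioned time series table
    CREATE TABLE measurements (
        ts        timestamptz      NOT NULL,
        device_id bigint           NOT NULL,
        value     double precision
    ) PARTITION BY RANGE (ts);

    -- The current month stays on the default (SSD) tablespace
    CREATE TABLE measurements_2019_06 PARTITION OF measurements
        FOR VALUES FROM ('2019-06-01') TO ('2019-07-01');

    -- Older months can be created directly on the slow tablespace
    CREATE TABLE measurements_2019_01 PARTITION OF measurements
        FOR VALUES FROM ('2019-01-01') TO ('2019-02-01')
        TABLESPACE slow_hdd;

    -- 4. Move a partition once its data has aged out
    --    (this rewrites the partition and holds a lock while it runs)
    ALTER TABLE measurements_2019_06 SET TABLESPACE slow_hdd;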

OTHER TIPS

PM from Timescale here. Our next release is scheduled to introduce compression, which should help here. Tablespaces are a good idea, and we are planning to eventually allow you to move older chunks to different tablespaces. Feel free to reach out to us for further info on that functionality, since it's still being developed.
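Until that lands, TimescaleDB already lets you attach additional tablespaces to a hypertable so that newly created chunks are spread across them (this does not move existing chunks). A sketch with a hypothetical hypertable called conditions:

    -- Attach the slow tablespace to a hypertable; new chunks will be
    -- distributed across the attached tablespaces (existing chunks stay put)
    SELECT attach_tablespace('slow_hdd', 'conditions');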

PS: make sure to use drop_chunks when you delete data, so you don't create a lot of dead tuples that then have to be vacuumed.
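With a hypothetical hypertable conditions, a retention job on TimescaleDB 1.x looks roughly like this (the argument order changed in later versions, so check the docs for the release you run):

    -- Drop whole chunks older than three months instead of DELETEing rows;
    -- dropping a chunk removes its files outright, so there is nothing for VACUUM to clean up
    SELECT drop_chunks(interval '3 months', 'conditions');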

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange