Question

We're currently using Postgres 9 on Amazon EC2 and are very satisfied with the performance. Now we're looking at adding ~2TB of data to Postgres, which is more than our EC2 small instance can hold.

I found S3QL and am considering using it to mount an S3 bucket and move the Postgres data directory onto that mount. Has anyone had experience doing this? I'm mainly concerned about performance (frequent reads, less frequent writes). Any advice is welcome, thanks.
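
To make the performance concern concrete, here is a rough sketch (paths, block size, and sample count are just placeholders) of how I was planning to compare random-read latency on the S3QL mount against local disk before committing to the move:

    import os, random, time

    def sample_read_latency(path, block_size=8192, samples=200):
        """Time random block reads from an existing file, e.g. a large test
        file placed on the S3QL mount vs. one on local storage."""
        size = os.path.getsize(path)
        latencies = []
        with open(path, "rb", buffering=0) as f:
            for _ in range(samples):
                offset = random.randrange(0, max(size - block_size, 1))
                start = time.perf_counter()
                f.seek(offset)
                f.read(block_size)
                latencies.append(time.perf_counter() - start)
        latencies.sort()
        return {"median": latencies[len(latencies) // 2],
                "p95": latencies[int(len(latencies) * 0.95)]}

    # Placeholder paths: one test file on the S3QL mount, one on local EBS/instance storage.
    for label, path in [("s3ql", "/mnt/s3ql/testfile"), ("local", "/var/tmp/testfile")]:
        print(label, sample_read_latency(path))

The 8 KB block size matches Postgres's default page size, so the numbers should roughly reflect index-lookup-style reads.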


Solution

My advice is "don't do that." I don't know the context of your problem, but my guess is that the solution doesn't have to involve pushing bulk data processing through PostgreSQL; the whole reason grid processing systems were invented was to analyze large data sets. Consider building a system that follows standard BI practice: extract your dimensional data, then take that normalized data and, assuming it's still quite large, load it into Hadoop/Pig. Do your analysis and aggregation there, dump the resulting aggregates to a file, and load that file into your PG database alongside the dimensions.
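
If you go that route, the final step (loading the aggregate file back into Postgres next to the dimension tables) can be a plain bulk COPY. Here is a minimal sketch using psycopg2, assuming the Hadoop/Pig job wrote a tab-separated file and that a daily_aggregates table already exists; the connection string, file path, table, and columns are placeholders:

    import psycopg2

    # Placeholder connection string; adjust to your environment.
    conn = psycopg2.connect("dbname=warehouse user=etl host=localhost")

    with conn, conn.cursor() as cur, open("/data/pig_output/aggregates.tsv") as f:
        # Bulk-load the aggregate rows produced by the Hadoop/Pig job.
        # FORMAT text is Postgres's tab-delimited default, which matches
        # Pig's default output delimiter.
        cur.copy_expert(
            "COPY daily_aggregates (day, customer_id, total_amount) "
            "FROM STDIN WITH (FORMAT text)",
            f,
        )

    conn.close()

COPY is far faster than row-by-row INSERTs for this kind of load, and because only the aggregates and dimensions land in Postgres, the EC2 instance never has to hold the full ~2TB of detail data.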

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow