Question

I'm currently working on a project that involves collecting and storing some huge datasets (huge by the standards of what I'm used to working with). The data essentially consists of meta information plus the actual values, which are trended over time.

The meta information itself is relatively large, but nothing huge; I would say it's going to grow to the 10-50 million row range over the next couple of years. This seems manageable to me, and a single beefy SQL Server should be enough to provide quick access to this data if it is decently indexed (and the data is very easy to index, with very well-defined boundaries).

However, the trending data is a completely different story. Within a year, we are VERY easily going to be pulling in 40-50 million rows every day, and that could realistically double yearly for the next 3 or 4 years.

This trending data also has very well-defined boundaries that would split it into MUCH more manageably sized chunks. I'm hoping I can set up some sort of partitioning mechanism that would spread this data across multiple physical database nodes. The data is essentially all contained in a single table. I looked into SQL Server table partitioning, but couldn't find a way to spread the data over multiple servers.

My question is whether there is some "relatively simple" way of implementing table partitioning over multiple physical nodes. I've also spent some time looking at SQL Server PDW, but it's difficult to find information online, and I don't want to pursue that until I've established that there is no simple way of implementing this sort of solution using features built into SQL Server.

Any advice would be greatly appreciated...


Solution

I'm no expert on this but I believe what you may be looking for is database 'sharding'. There's an interesting analysis of the problems and benefits of sharding here.

Ultimately, implementation of a 'sharded' design is likely to be very costly but if your data is going to be unmanageable in a single database then this could be a good solution.

There is also a small amount of information on the Wikipedia page, which includes a list of software that supports shards (e.g. the Hibernate ORM).
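To give a rough idea of what application-level sharding involves, here is a minimal sketch in Python. It is not a SQL Server feature and not a recommendation of a specific design: the node names, the `series_id` key, and both function names are assumptions made up for illustration. The idea is only that each row's boundary key is hashed to pick the node that stores it.

```python
import hashlib

# Hypothetical node names; in a real setup these would be
# connection strings for the individual database servers.
SHARDS = ["node-0", "node-1", "node-2", "node-3"]


def shard_for(series_id, shards=SHARDS):
    """Pick a shard for a key using a stable hash.

    hashlib is used instead of the built-in hash() because the
    built-in is randomized per process for strings.
    """
    digest = hashlib.sha256(series_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(rows, shards=SHARDS):
    """Split a batch of rows into one insert batch per shard.

    Each row is assumed to be a dict with a 'series_id' entry
    (a made-up name for the boundary key mentioned in the question).
    """
    batches = {name: [] for name in shards}
    for row in rows:
        batches[shard_for(row["series_id"], shards)].append(row)
    return batches
```

Every read and write then has to go through something like `shard_for`, which is part of why a sharded design gets expensive. Note also that with plain modulo hashing, changing the number of nodes remaps most keys; a lookup table or consistent hashing is the usual way around that.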

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow