Question

I did just come up with the following idea, but I lack the knowledge to say if it is applicable to a production application.

We have a web application, built on PHP/MySQL to make it easy. One table in the database is prone to growing large, easily a couple of million records, so table sharding might be an option here.

Here's how I have imagined the process to work:

A cached file contains a list of the available tables in the database. Each table contains a maximum of a million rows, and when that is reached, the cached list is recreated after a new table has been constructed.

Obviously it wouldn't be a good idea to check the number of rows on every write to the table, so this could be done at a set interval, like weekly or daily, depending on how quickly each million rows is created.
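For illustration, the periodic check might look something like this in SQL; the million-row cap comes from the scheme above, but the shard table names are made up:

    -- Count rows in the newest shard (table names are hypothetical).
    SELECT COUNT(*) FROM events_0003;

    -- If the count has reached the cap (1,000,000 here), create the next
    -- shard with the same structure, then rebuild the cached table list.
    CREATE TABLE events_0004 LIKE events_0003;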

Would this be a good way to deal with large amounts of data and to keep index sizes fairly low?

Thanks

Solution

If you are planning ahead for the possibility of enormous growth (game gone viral, for instance), you can follow in the footsteps of those before you and go NoSQL.

Couchbase / powers Zynga (and is a personal favorite)
Apache Cassandra / powers Twitter
MongoDB / powers Craigslist

But you're building a site in PHP/MySQL to "make it easy", so don't reinvent the wheel on an extremely big problem.

Don't mess with the data. Go for a proven solution. MySQL included.

OTHER TIPS

You should use horizontal partitioning: partition the table by number of records, let's say a million records per partition. That way MySQL handles the partitioning internally, and instead of one big index, the indexes are partitioned as well. See the sketch after the link below.

Read more here: http://dev.mysql.com/tech-resources/articles/performance-partitioning.html
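As a minimal sketch, range partitioning on an auto-increment key might look like this (the table and column names are hypothetical):

    -- Roughly one million rows per partition, split on the primary key.
    CREATE TABLE big_table (
        id INT NOT NULL AUTO_INCREMENT,
        created_at DATETIME NOT NULL,
        payload VARCHAR(255),
        PRIMARY KEY (id)
    )
    PARTITION BY RANGE (id) (
        PARTITION p0 VALUES LESS THAN (1000000),
        PARTITION p1 VALUES LESS THAN (2000000),
        PARTITION p2 VALUES LESS THAN (3000000),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );

    -- Later, the catch-all partition can be split to add more ranges:
    -- ALTER TABLE big_table REORGANIZE PARTITION pmax INTO (
    --     PARTITION p3 VALUES LESS THAN (4000000),
    --     PARTITION pmax VALUES LESS THAN MAXVALUE
    -- );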

In all honesty, I don't think that would be a great idea. You should look into possibly archiving old data or going to a NoSQL solution like Mongo.

The performance of indexes does not degrade linearly with the size of the table. Tables have to be seriously massive before that becomes an issue. If you are seeing performance problems, I'd start running MySQL EXPLAINs and making sure all your queries are doing the least number of row scans they can. You might be surprised at what the actual bottleneck ends up being.
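For example, prefixing a slow query with EXPLAIN shows the estimated row scans and which index, if any, is used (the query and column names here are hypothetical):

    -- If `rows` is huge and `key` is NULL, the query is doing a full scan.
    EXPLAIN SELECT *
    FROM big_table
    WHERE user_id = 42
    ORDER BY created_at DESC;

    -- An index on the filtered and sorted columns often fixes that scan:
    -- ALTER TABLE big_table ADD INDEX idx_user_created (user_id, created_at);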

So, basically, if you need the data, I wouldn't go messing around with it. On the other hand, if it's something like session data, just delete the rows that are too old.
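A purge like that can run from cron; assuming a hypothetical sessions table with a last_activity column, it is a one-liner:

    -- Delete session rows not touched in the last 30 days.
    DELETE FROM sessions
    WHERE last_activity < NOW() - INTERVAL 30 DAY;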

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow