MySQL performance with large number of records - partitioning?

Question

The best optimization is determined by the queries you run, not by your tables' structure.

Partitioning can be a great optimization, but only if the partitioning scheme supports the queries you need to optimize. For instance, you could partition by US state, and that would help queries against data for a specific state. MySQL supports "partition pruning" so that the query would only run against the relevant partition -- but only if the WHERE clause of your query constrains the column you used as the partition key, for example by comparing it to a specific value.
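As a rough sketch of what partitioning by state could look like: the column names and types below are assumptions for illustration (only MyTable, state and date appear in the queries in this answer), and only three of the states are written out.

```sql
-- Hypothetical table definition; columns other than state and date are
-- invented for this example.
CREATE TABLE MyTable (
  id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  state  CHAR(2)         NOT NULL,
  date   DATE            NOT NULL,
  amount DECIMAL(10,2)   NOT NULL,
  -- MySQL requires every unique key, including the primary key, to
  -- contain all columns used in the partitioning expression.
  PRIMARY KEY (id, state)
)
PARTITION BY LIST COLUMNS (state) (
  PARTITION p_ny VALUES IN ('NY'),
  PARTITION p_ca VALUES IN ('CA'),
  PARTITION p_tx VALUES IN ('TX')
  -- ...and so on, one partition per remaining state
);
```

With LIST partitioning, an insert for a state that has no matching partition fails, so a real definition would need to cover every value you expect to store.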

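You can always check whether partition pruning is effective by using EXPLAIN PARTITIONS (in MySQL 5.7 and later, plain EXPLAIN already includes the partitions column, and the PARTITIONS keyword was removed in 8.0):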

EXPLAIN PARTITIONS
SELECT ... FROM MyTable WHERE state = 'NY';

That should report that the query uses a single partition.

By contrast, if you need to run queries by date, for example, then partitioning by state wouldn't help; MySQL would have to run the query against all 50 partitions.

EXPLAIN PARTITIONS
SELECT ... FROM MyTable WHERE date > '2013-05-01';

That would list all partitions. There's a bit of overhead to query all partitions, so if this is your typical query, you should probably use range partitioning by date.
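For comparison, here is a minimal sketch of range partitioning by date. The table name MyTableByDate, the monthly boundaries and the partition names are assumptions chosen for illustration, not a recommendation for your data.

```sql
-- Hypothetical date-partitioned variant of the table.
CREATE TABLE MyTableByDate (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  state CHAR(2)         NOT NULL,
  date  DATE            NOT NULL,
  -- The partitioning column must be part of every unique key.
  PRIMARY KEY (id, date)
)
PARTITION BY RANGE COLUMNS (date) (
  PARTITION p2013_03 VALUES LESS THAN ('2013-04-01'),
  PARTITION p2013_04 VALUES LESS THAN ('2013-05-01'),
  PARTITION p2013_05 VALUES LESS THAN ('2013-06-01'),
  PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- On MySQL 5.6 and earlier, write EXPLAIN PARTITIONS instead.
EXPLAIN
SELECT * FROM MyTableByDate WHERE date > '2013-05-01';
```

With this layout the partitions column should name only the partitions that can hold dates after 2013-05-01, rather than every partition in the table.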

So choose your partition key with the queries in mind.

Any other optimization technique follows a similar pattern -- it helps some queries, possibly to the disadvantage of other queries. So be sure you know which queries you need to optimize for, before you decide on the optimization method.

Re your comment:

Certainly there are many databases that have 40 million rows or more and still perform well. They use different methods, including (in no particular order):

Indexing
Partitioning
Caching
Tuning MySQL configuration variables
Archiving
Increasing hardware capacity (e.g. more RAM, solid state drives, RAID)
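To make two of these concrete, here is a hedged sketch of indexing and archiving against the example table. The index name, the archive table name and the cutoff date are all hypothetical.

```sql
-- Indexing: a secondary index on date lets date-range queries avoid
-- scanning every row, whether or not the table is partitioned.
ALTER TABLE MyTable ADD INDEX idx_date (date);

-- Archiving: move old rows into a separate table so the main table
-- stays small. MyTable_archive is an assumed name.
CREATE TABLE MyTable_archive LIKE MyTable;

INSERT INTO MyTable_archive
SELECT * FROM MyTable WHERE date < '2012-01-01';

DELETE FROM MyTable WHERE date < '2012-01-01';
```

On a table with tens of millions of rows you would probably run the copy and delete in smaller batches, and whether either technique pays off still depends on the queries you actually run.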

My point above is that you can't choose the best optimization method until you know the queries you need to optimize. Furthermore, the best choice may be different for different queries, and may even change over time as data or traffic grows. Optimization is a continual process, because you won't know where your bottlenecks are until after you see how your data grows and the query traffic your database receives.