Question

  1. I have a table that now contains over 43 million records. When I run a SELECT, I usually select records with the same value of a field, say A. Would it be more efficient to split the table into several tables, one per value of A, and store those in the database? How much could I gain?

  2. I have one table named entry: {entryid (PK), B}, containing 6 thousand records, and several other tables with a similar structure, T1: {id (PK), entryid, C, ...}, each containing millions of records. Do the following two processes have the same efficiency?

    SELECT id FROM T1, entry WHERE T1.entryid = entry.entryid AND entry.B = XXX

and

SELECT entryid FROM entry WHERE B = XXX
//format a string S as (entryid1, entryid2, ... )
//then run
SELECT id FROM T1 WHERE entryid IN S

Solution

In this instance, I will answer your second question first.

There is a way to blend the queries to behave as one and do it efficiently.

Your first method is a query that behaves as follows:

  • JOIN of T1 and entry by entryid forming a giant temp table
  • Traverse the temp table to process the WHERE clause

Your second method is essentially two queries:

  • Lookup entryid where B is some value XXX
  • Compile all entryid values in a string
  • Execute query using WHERE entryid IN
  • The concatenated list is treated as an unindexed, in-place temp table
  • A Cartesian-style JOIN back to T1 determines which values match

In both cases, you must still form a temp table of entryid values.

What you need to do is reorganize the query's execution, a.k.a. refactoring.

Here is your first query totally refactored:

SELECT
    T1.id
FROM
    (SELECT entryid FROM entry WHERE B = XXX) A
    LEFT JOIN T1 USING (entryid)
;

This is equivalent to your query, but it does two things:

  1. It assembles the list of entryid values first, using the WHERE clause
  2. It performs the JOIN against only the rows returned by subquery A

This reorganization should speed up the processing without additional table changes.

However, since subquery A gets entryid values based on the value of B, you should have an index that helps retrieve those values quickly. Please create this index:

ALTER TABLE entry ADD INDEX B_entryid_ndx (B,entryid);

With that refactored query and that additional index, it is as fast as possible, since the refactoring forces the WHERE clause to be processed before the JOIN.
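To sanity-check the plan, you can run EXPLAIN against the refactored query. This is just a verification sketch; XXX remains a placeholder for your actual B value:

EXPLAIN
SELECT
    T1.id
FROM
    (SELECT entryid FROM entry WHERE B = XXX) A
    LEFT JOIN T1 USING (entryid)
;
-- The row for entry should show the B_entryid_ndx index with "Using index",
-- meaning the entryid list is gathered from the index alone before the join.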

With reference to your first question, the refactored query should retrieve just what it needs whether the table is partitioned or not. Partitioning would just be an exercise in storage engine selection.

MySQL supports two paradigms for partitioning.

With the MERGE storage engine, there is no long migration path; the mapping takes place in 2 seconds. However, maintenance of each individual table could affect any query against the MERGE table if there is no primary key to uniquely distinguish one underlying MyISAM table's rows from another's.
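As a rough illustration only (t_a1, t_a2, and the column list are hypothetical stand-ins for however you split the big table), a MERGE table over identically structured MyISAM tables might look like this:

-- MERGE only works over MyISAM tables with identical structure;
-- each underlying table would hold the rows for one value of A.
CREATE TABLE t1_merged (
    id INT NOT NULL,
    entryid INT NOT NULL,
    C VARCHAR(64),
    KEY entryid_ndx (entryid)
) ENGINE=MERGE UNION=(t_a1, t_a2) INSERT_METHOD=LAST;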

With native table partitioning, the table has a partition map built in. Mapping may include a migration path. Maintenance is the same mixed bag it would be with any other table.
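A minimal sketch of the native approach, assuming A is an integer column (the column list and partition count are illustrative, not taken from your schema):

-- Every unique key, including the primary key, must contain the partitioning
-- column, hence PRIMARY KEY (id, A).
CREATE TABLE T1_partitioned (
    id INT NOT NULL,
    entryid INT NOT NULL,
    A INT NOT NULL,
    C VARCHAR(64),
    PRIMARY KEY (id, A),
    KEY entryid_ndx (entryid)
) ENGINE=InnoDB
PARTITION BY HASH (A)
PARTITIONS 8;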

In either case, a well-designed indexing scheme needs to be in place. Why? The query's WHERE, ORDER BY and GROUP BY clauses should dictate what indexes are really needed to support the query.
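For instance, the refactored query above joins T1 on entryid; if T1 does not already have an index on that column, something along these lines (the index name is arbitrary) would support the join:

ALTER TABLE T1 ADD INDEX entryid_ndx (entryid);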

Other tips

I would consider using table partitioning. You don't mention your MySQL version or storage engine. Here is the doc link:

http://dev.mysql.com/doc/refman/5.6/en/partitioning.html (for 5.6)
