Question

In MySQL, I am attempting to use INFORMATION_SCHEMA to gather data about my tables. My goal is to select the data in one query and insert it directly into another table for future analysis and processing.

Here is a query that I have written. Unfortunately, most of the time the pk_length (the number of columns in the primary key) is incorrect, and sometimes the count of keys on a table is wrong too.

  SELECT
    t.table_schema,
    t.table_name,
    SUM(k.constraint_name="PRIMARY") as pk_length,
    count(distinct s.index_name) as key_count,
    t.table_rows
  FROM tables t, statistics s, key_column_usage k
    WHERE  t.table_name = s.table_name 
    AND t.table_schema = s.table_schema
    AND t.table_name = k.table_name
    AND t.table_schema = k.table_schema
  GROUP BY t.table_schema, t.table_name;

What am I doing wrong that is resulting in incorrect data for those two fields?

EDIT: Here is the fixed query, using a subquery.

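  SELECT
    t.table_schema,
    t.table_name,
    k.pk_length,
    COUNT(DISTINCT s.index_name) AS key_count,
    t.table_rows
  FROM information_schema.tables t
    JOIN information_schema.statistics s
      ON  s.table_schema = t.table_schema
      AND s.table_name = t.table_name
    -- Aggregate key_column_usage first so the join sees one row per table.
    -- Only PRIMARY rows are counted; without this filter, UNIQUE and
    -- FOREIGN KEY columns are added to pk_length as well.
    -- Tables without a primary key drop out of this inner join.
    JOIN (SELECT table_schema, table_name, COUNT(*) AS pk_length
            FROM information_schema.key_column_usage
           WHERE constraint_name = 'PRIMARY'
           GROUP BY table_schema, table_name) AS k
      ON  k.table_schema = t.table_schema
      AND k.table_name = t.table_name
  GROUP BY t.table_schema, t.table_name, k.pk_length, t.table_rows
  LIMIT 20;  -- preview only; remove before using this in INSERT ... SELECT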

Solution

Remember that:

  • When you join two tables on a non-unique value, you end up with all possible combinations of the rows containing the matching fields.

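  • (table_schema, table_name) is not unique in key_column_usage: there is one row per column of every PRIMARY KEY, UNIQUE, and FOREIGN KEY constraint. The same goes for statistics, which has one row per column of every index.

As such, whenever a table has more than one row in key_column_usage, every matching row from statistics is repeated once per key_column_usage row (and vice versa). SUM(k.constraint_name="PRIMARY") then counts each primary-key column once for every statistics row of that table, which is what inflates your pk_length.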
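The duplication described above can be reproduced with two tiny temporary tables standing in for statistics and key_column_usage. This is a minimal sketch: demo_statistics, demo_key_column_usage, and the users table are hypothetical names, and only the columns that matter are modeled. One table has a one-column primary key and a one-column unique key:

```sql
-- Hypothetical stand-ins for information_schema.statistics and
-- information_schema.key_column_usage, reduced to the relevant columns.
CREATE TEMPORARY TABLE demo_statistics (
  table_name  VARCHAR(64),
  index_name  VARCHAR(64),
  column_name VARCHAR(64)
);
CREATE TEMPORARY TABLE demo_key_column_usage (
  table_name      VARCHAR(64),
  constraint_name VARCHAR(64),
  column_name     VARCHAR(64)
);

-- Table "users": PRIMARY KEY (id) and UNIQUE KEY email_uq (email).
-- Both views list one row per key column.
INSERT INTO demo_statistics VALUES
  ('users', 'PRIMARY',  'id'),
  ('users', 'email_uq', 'email');
INSERT INTO demo_key_column_usage VALUES
  ('users', 'PRIMARY',  'id'),
  ('users', 'email_uq', 'email');

-- Joining on table_name pairs every statistics row with every
-- key_column_usage row of the same table: 2 x 2 = 4 joined rows.
SELECT
  s.table_name,
  COUNT(*)                           AS joined_rows,  -- 4
  SUM(k.constraint_name = 'PRIMARY') AS pk_length,    -- 2, although the PK has 1 column
  COUNT(DISTINCT s.index_name)       AS key_count     -- 2, DISTINCT hides the duplicates
FROM demo_statistics s
JOIN demo_key_column_usage k ON k.table_name = s.table_name
GROUP BY s.table_name;
```

The SUM is multiplied by the number of statistics rows for the table, so tables with more indexed columns are over-counted by more.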

You will most likely need to split this up into two separate queries: one for pk_length and the other for key_count.
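If you want to keep a single statement (handy for the INSERT ... SELECT goal in the question), one option is to aggregate each view in its own derived table, so each side has at most one row per table before the join. A sketch, assuming MySQL 5.7 or later; the LEFT JOINs, the COALESCE to 0, and the table_type filter that skips views are my own assumptions about what you want reported:

```sql
SELECT
  t.table_schema,
  t.table_name,
  COALESCE(pk.pk_length, 0) AS pk_length,  -- 0 when the table has no primary key
  COALESCE(ix.key_count, 0) AS key_count,  -- 0 when the table has no indexes
  t.table_rows
FROM information_schema.tables AS t
-- At most one row per table: number of columns in the PRIMARY KEY.
LEFT JOIN (
  SELECT table_schema, table_name, COUNT(*) AS pk_length
  FROM information_schema.key_column_usage
  WHERE constraint_name = 'PRIMARY'
  GROUP BY table_schema, table_name
) AS pk
  ON  pk.table_schema = t.table_schema
  AND pk.table_name   = t.table_name
-- At most one row per table: number of distinct indexes, PRIMARY included.
LEFT JOIN (
  SELECT table_schema, table_name, COUNT(DISTINCT index_name) AS key_count
  FROM information_schema.statistics
  GROUP BY table_schema, table_name
) AS ix
  ON  ix.table_schema = t.table_schema
  AND ix.table_name   = t.table_name
WHERE t.table_type = 'BASE TABLE';  -- assumption: skip views
```

Because neither derived table can contribute more than one row per table, the joins cannot multiply rows and no outer GROUP BY is needed.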

OTHER TIPS

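Updated: both values can be taken from statistics alone. It has one row per column of each index, so the PRIMARY rows give the number of primary-key columns, and COUNT(DISTINCT s.index_name) counts each index once (PRIMARY included).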

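  SELECT
    t.table_schema,
    t.table_name,
    t.table_rows,
    SUM(s.index_name = 'PRIMARY') AS pk_length,  -- one row per PK column
    COUNT(DISTINCT s.index_name) AS key_count    -- one per index, PRIMARY included
  FROM
    information_schema.tables t
    JOIN information_schema.statistics s
      ON  s.table_schema = t.table_schema
      AND s.table_name = t.table_name
  GROUP BY
    -- table_rows is listed so the query also runs under ONLY_FULL_GROUP_BY
    t.table_schema, t.table_name, t.table_rows;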
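To cover the goal from the question (storing the results for later analysis), the query above can be wrapped in INSERT ... SELECT. This is a sketch only: table_key_stats, its column types, the collected_at timestamp, and the filter that skips MySQL's system schemas are all hypothetical choices, not something from the original post.

```sql
-- Hypothetical target table; adjust names and types to your needs.
CREATE TABLE IF NOT EXISTS table_key_stats (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  table_schema VARCHAR(64)     NOT NULL,
  table_name   VARCHAR(64)     NOT NULL,
  table_rows   BIGINT UNSIGNED NULL,      -- an estimate for InnoDB tables
  pk_length    INT UNSIGNED    NOT NULL,
  key_count    INT UNSIGNED    NOT NULL,
  collected_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO table_key_stats (table_schema, table_name, table_rows, pk_length, key_count)
SELECT
  t.table_schema,
  t.table_name,
  t.table_rows,
  SUM(s.index_name = 'PRIMARY'),   -- number of primary-key columns
  COUNT(DISTINCT s.index_name)     -- number of indexes, PRIMARY included
FROM information_schema.tables t
JOIN information_schema.statistics s
  ON  s.table_schema = t.table_schema
  AND s.table_name   = t.table_name
-- Optional assumption: leave out MySQL's own schemas.
WHERE t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY t.table_schema, t.table_name, t.table_rows;
```

Each run appends one row per table, so collected_at lets you compare snapshots over time.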

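First approach (wrong answer: statistics has one row per column of each index, so SUM(s.index_name != 'PRIMARY') counts the columns of the non-primary indexes, not the indexes themselves)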

What about this?

  SELECT
    t.table_schema,
    t.table_name,
    t.table_rows,
    SUM(s.index_name = 'PRIMARY') AS pk_length,
    SUM(s.index_name != 'PRIMARY') AS key_count
  FROM
    tables t, statistics s
  WHERE
    t.table_name = s.table_name 
    AND t.table_schema = s.table_schema
  GROUP BY
    t.table_schema, t.table_name;
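Why this first approach miscounts is easy to see by listing what statistics holds for a single table. A sketch; my_db and my_table are placeholder names to replace with a real schema and table:

```sql
-- my_db / my_table are placeholders.
-- statistics returns one row per column of each index, so an index on
-- (a, b) contributes two rows, and MAX(seq_in_index) is its column count.
SELECT
  index_name,
  COUNT(*)          AS column_rows,
  MAX(seq_in_index) AS columns_in_index
FROM information_schema.statistics
WHERE table_schema = 'my_db'
  AND table_name   = 'my_table'
GROUP BY index_name
ORDER BY index_name;
```

Adding up column_rows for the non-PRIMARY indexes gives what SUM(s.index_name != 'PRIMARY') returns, while the number of non-PRIMARY result rows is the actual number of secondary indexes.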
Licensed under: CC-BY-SA with attribution