LEFT JOIN not working as intended
05-03-2021
Question
I'm using MySQL 8.0.21. I'm trying to join two tables so that I can find the index names of foreign keys. This is my query:
SELECT KEY_COLUMN_USAGE.CONSTRAINT_SCHEMA,
KEY_COLUMN_USAGE.TABLE_SCHEMA,
KEY_COLUMN_USAGE.REFERENCED_TABLE_SCHEMA,
KEY_COLUMN_USAGE.TABLE_NAME,
KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME,
KEY_COLUMN_USAGE.COLUMN_NAME,
KEY_COLUMN_USAGE.REFERENCED_COLUMN_NAME,
KEY_COLUMN_USAGE.CONSTRAINT_NAME,
STATISTICS.INDEX_NAME
FROM information_schema.KEY_COLUMN_USAGE
LEFT JOIN information_schema.STATISTICS
ON STATISTICS.TABLE_NAME = KEY_COLUMN_USAGE.TABLE_NAME
AND STATISTICS.COLUMN_NAME = KEY_COLUMN_USAGE.COLUMN_NAME
WHERE KEY_COLUMN_USAGE.CONSTRAINT_NAME <> 'PRIMARY'
AND KEY_COLUMN_USAGE.REFERENCED_TABLE_SCHEMA IS NOT NULL
AND KEY_COLUMN_USAGE.TABLE_SCHEMA NOT IN('mysql','performance_schema','sys')
AND STATISTICS.INDEX_NAME <> 'PRIMARY';
Output:
+-------------------+-------------------+-------------------------+------------+-----------------------+-------------+------------------------+-------------------+------------+
| CONSTRAINT_SCHEMA | TABLE_SCHEMA      | REFERENCED_TABLE_SCHEMA | TABLE_NAME | REFERENCED_TABLE_NAME | COLUMN_NAME | REFERENCED_COLUMN_NAME | CONSTRAINT_NAME   | INDEX_NAME |
+-------------------+-------------------+-------------------------+------------+-----------------------+-------------+------------------------+-------------------+------------+
| StudentAttendance | StudentAttendance | StudentAttendance       | ATTENDANCE | STUDENT               | RollNumber  | RollNumber             | ATTENDANCE_ibfk_1 | RollNumber |
| StudentAttendance | StudentAttendance | StudentAttendance       | STUDENT    | GUARDIAN              | GUID        | GUID                   | STUDENT_ibfk_1    | GUID       |
| technastic        | technastic        | technastic              | branch     | employee              | mgr_id      | emp_id                 | branch_ibfk_1     | mgr_id     |
| technastic        | technastic        | technastic              | employee   | branch                | branch_id   | branch_id              | employee_ibfk_1   | branch_id  |
| technastic        | technastic        | technastic              | employee   | employee              | super_id    | emp_id                 | employee_ibfk_2   | super_id   |
| technastic        | technastic        | technastic              | client     | branch                | branch_id   | branch_id              | client_ibfk_1     | branch_id  |
| technastic        | technastic        | technastic              | works_with | client                | client_id   | client_id              | works_with_ibfk_2 | client_id  |
| OFFICE            | OFFICE            | OFFICE                  | EMPLOYEE   | DEPARTMENT            | DeptId      | DeptId                 | EMPLOYEE_ibfk_1   | DeptId     |
+-------------------+-------------------+-------------------------+------------+-----------------------+-------------+------------------------+-------------------+------------+
8 rows in set (0.02 sec)
But this is not the output I'm expecting. Let me explain my situation.
The query I used for finding all the foreign keys is this:
SELECT KEY_COLUMN_USAGE.CONSTRAINT_SCHEMA,
KEY_COLUMN_USAGE.TABLE_SCHEMA,
KEY_COLUMN_USAGE.REFERENCED_TABLE_SCHEMA,
KEY_COLUMN_USAGE.TABLE_NAME,
KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME,
KEY_COLUMN_USAGE.COLUMN_NAME,
KEY_COLUMN_USAGE.REFERENCED_COLUMN_NAME,
KEY_COLUMN_USAGE.CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE KEY_COLUMN_USAGE.CONSTRAINT_NAME <> 'PRIMARY'
AND KEY_COLUMN_USAGE.REFERENCED_TABLE_SCHEMA IS NOT NULL
AND KEY_COLUMN_USAGE.TABLE_SCHEMA NOT IN('mysql','performance_schema','sys')
ORDER BY KEY_COLUMN_USAGE.TABLE_NAME ASC,
KEY_COLUMN_USAGE.COLUMN_NAME ASC;
Output:
+-------------------+-------------------+-------------------------+-----------------+-----------------------+-------------+------------------------+------------------------+
| CONSTRAINT_SCHEMA | TABLE_SCHEMA      | REFERENCED_TABLE_SCHEMA | TABLE_NAME      | REFERENCED_TABLE_NAME | COLUMN_NAME | REFERENCED_COLUMN_NAME | CONSTRAINT_NAME        |
+-------------------+-------------------+-------------------------+-----------------+-----------------------+-------------+------------------------+------------------------+
| StudentAttendance | StudentAttendance | StudentAttendance       | ATTENDANCE      | STUDENT               | RollNumber  | RollNumber             | ATTENDANCE_ibfk_1      |
| OFFICE            | OFFICE            | OFFICE                  | EMPLOYEE        | DEPARTMENT            | DeptId      | DeptId                 | EMPLOYEE_ibfk_1        |
| StudentAttendance | StudentAttendance | StudentAttendance       | STUDENT         | GUARDIAN              | GUID        | GUID                   | STUDENT_ibfk_1         |
| technastic        | technastic        | technastic              | branch          | employee              | mgr_id      | emp_id                 | branch_ibfk_1          |
| technastic        | technastic        | technastic              | branch_supplier | branch                | branch_id   | branch_id              | branch_supplier_ibfk_1 |
| technastic        | technastic        | technastic              | client          | branch                | branch_id   | branch_id              | client_ibfk_1          |
| technastic        | technastic        | technastic              | employee        | branch                | branch_id   | branch_id              | employee_ibfk_1        |
| technastic        | technastic        | technastic              | employee        | employee              | super_id    | emp_id                 | employee_ibfk_2        |
| technastic        | technastic        | technastic              | works_with      | client                | client_id   | client_id              | works_with_ibfk_2      |
| technastic        | technastic        | technastic              | works_with      | employee              | emp_id      | emp_id                 | works_with_ibfk_1      |
+-------------------+-------------------+-------------------------+-----------------+-----------------------+-------------+------------------------+------------------------+
10 rows in set (0.01 sec)
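The REFERENCED_TABLE_SCHEMA IS NOT NULL condition is what limits KEY_COLUMN_USAGE to foreign-key columns, because primary and unique key rows carry NULL there. Writing it with IS NOT NULL matters: under SQL's three-valued logic, comparing NULL with anything (including the string 'NULL') yields unknown, which a WHERE clause treats as false. A small sketch with Python's built-in sqlite3 module; the table t and its three rows are hypothetical, and SQLite applies the same NULL rules as MySQL for these predicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical column holding a real NULL, the string 'NULL', and a schema name
    CREATE TABLE t (ref_schema TEXT);
    INSERT INTO t VALUES (NULL), ('NULL'), ('technastic');
""")


def count_where(predicate):
    """Count rows of t for which the predicate evaluates to TRUE."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {predicate}").fetchone()[0]


# NULL <> 'NULL' is unknown and 'NULL' <> 'NULL' is false: only 'technastic' passes.
ne_string = count_where("ref_schema <> 'NULL'")
# IS NOT NULL tests for a missing value, so the string 'NULL' passes too.
not_null = count_where("ref_schema IS NOT NULL")
# Comparing with NULL via = or <> is unknown for every row, so nothing passes.
eq_null = count_where("ref_schema = NULL")
ne_null = count_where("ref_schema <> NULL")

print(ne_string, not_null, eq_null, ne_null)  # 1 2 0 0
```

So `<> 'NULL'` only happens to drop the real NULLs as a side effect of unknown comparisons, while `= NULL` and `<> NULL` match nothing at all; IS NOT NULL says what is meant.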
I also verified that this output lists all the foreign keys by cross-checking it with another query:
select * from referential_constraints;
Output:
+--------------------+-------------------+------------------------+---------------------------+--------------------------+------------------------+--------------+-------------+-------------+-----------------+-----------------------+
| CONSTRAINT_CATALOG | CONSTRAINT_SCHEMA | CONSTRAINT_NAME        | UNIQUE_CONSTRAINT_CATALOG | UNIQUE_CONSTRAINT_SCHEMA | UNIQUE_CONSTRAINT_NAME | MATCH_OPTION | UPDATE_RULE | DELETE_RULE | TABLE_NAME      | REFERENCED_TABLE_NAME |
+--------------------+-------------------+------------------------+---------------------------+--------------------------+------------------------+--------------+-------------+-------------+-----------------+-----------------------+
| def                | StudentAttendance | ATTENDANCE_ibfk_1      | def                       | StudentAttendance        | PRIMARY                | NONE         | NO ACTION   | NO ACTION   | ATTENDANCE      | STUDENT               |
| def                | StudentAttendance | STUDENT_ibfk_1         | def                       | StudentAttendance        | PRIMARY                | NONE         | NO ACTION   | NO ACTION   | STUDENT         | GUARDIAN              |
| def                | technastic        | branch_ibfk_1          | def                       | technastic               | PRIMARY                | NONE         | NO ACTION   | SET NULL    | branch          | employee              |
| def                | technastic        | employee_ibfk_1        | def                       | technastic               | PRIMARY                | NONE         | NO ACTION   | SET NULL    | employee        | branch                |
| def                | technastic        | employee_ibfk_2        | def                       | technastic               | PRIMARY                | NONE         | NO ACTION   | SET NULL    | employee        | employee              |
| def                | technastic        | client_ibfk_1          | def                       | technastic               | PRIMARY                | NONE         | NO ACTION   | SET NULL    | client          | branch                |
| def                | technastic        | works_with_ibfk_1      | def                       | technastic               | PRIMARY                | NONE         | NO ACTION   | CASCADE     | works_with      | employee              |
| def                | technastic        | works_with_ibfk_2      | def                       | technastic               | PRIMARY                | NONE         | NO ACTION   | CASCADE     | works_with      | client                |
| def                | technastic        | branch_supplier_ibfk_1 | def                       | technastic               | PRIMARY                | NONE         | NO ACTION   | CASCADE     | branch_supplier | branch                |
| def                | OFFICE            | EMPLOYEE_ibfk_1        | def                       | OFFICE                   | PRIMARY                | NONE         | NO ACTION   | NO ACTION   | EMPLOYEE        | DEPARTMENT            |
+--------------------+-------------------+------------------------+---------------------------+--------------------------+------------------------+--------------+-------------+-------------+-----------------+-----------------------+
10 rows in set (0.00 sec)
Since every constraint name in the referential_constraints table also appears in the key_column_usage output, we can go ahead.
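Instead of comparing the listings by eye, an anti-join (NOT EXISTS) lists the names present in one result but absent from the other. A minimal sketch with Python's built-in sqlite3 module, assuming three hypothetical stand-in tables (rc, kcu, joined) filled with the constraint names shown in the outputs above; joined holds the eight names returned by the LEFT JOIN query at the top. On MySQL the same NOT EXISTS pattern would run directly against information_schema.

```python
import sqlite3

# Constraint names copied from the outputs above; the tables are hypothetical stand-ins.
RC = ["ATTENDANCE_ibfk_1", "STUDENT_ibfk_1", "branch_ibfk_1",
      "employee_ibfk_1", "employee_ibfk_2", "client_ibfk_1",
      "works_with_ibfk_1", "works_with_ibfk_2", "branch_supplier_ibfk_1",
      "EMPLOYEE_ibfk_1"]
KCU = list(RC)  # the KEY_COLUMN_USAGE query returned the same ten names
JOINED = ["ATTENDANCE_ibfk_1", "STUDENT_ibfk_1", "branch_ibfk_1",
          "employee_ibfk_1", "employee_ibfk_2", "client_ibfk_1",
          "works_with_ibfk_2", "EMPLOYEE_ibfk_1"]  # the 8-row LEFT JOIN result

conn = sqlite3.connect(":memory:")
for table, names in (("rc", RC), ("kcu", KCU), ("joined", JOINED)):
    conn.execute(f"CREATE TABLE {table} (constraint_name TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [(n,) for n in names])


def missing(left, right):
    """Anti-join: constraint names in `left` that have no match in `right`."""
    sql = f"""
        SELECT l.constraint_name
        FROM {left} l
        WHERE NOT EXISTS (
            SELECT 1 FROM {right} r
            WHERE r.constraint_name = l.constraint_name
        )
        ORDER BY l.constraint_name
    """
    return [row[0] for row in conn.execute(sql)]


print(missing("rc", "kcu"))     # []
print(missing("rc", "joined"))  # ['branch_supplier_ibfk_1', 'works_with_ibfk_1']
```

The first call confirms the check above (nothing is missing from KEY_COLUMN_USAGE); the second shows which two foreign keys the LEFT JOIN query drops.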
Next, I wanted to find the index names of all the foreign keys. Since I couldn't find a ready-made query that returns only the index names of foreign keys, I came up with this one:
SELECT STATISTICS.TABLE_SCHEMA,
STATISTICS.INDEX_SCHEMA,
STATISTICS.TABLE_NAME,
STATISTICS.COLUMN_NAME,
STATISTICS.INDEX_NAME
FROM information_schema.STATISTICS
WHERE STATISTICS.INDEX_NAME <> 'PRIMARY'
AND STATISTICS.TABLE_SCHEMA NOT IN('mysql','performance_schema','sys')
ORDER BY STATISTICS.TABLE_NAME ASC,
STATISTICS.COLUMN_NAME ASC;
Output:
+-------------------+-------------------+------------+-------------+------------+
| TABLE_SCHEMA      | INDEX_SCHEMA      | TABLE_NAME | COLUMN_NAME | INDEX_NAME |
+-------------------+-------------------+------------+-------------+------------+
| StudentAttendance | StudentAttendance | ATTENDANCE | RollNumber  | RollNumber |
| OFFICE            | OFFICE            | EMPLOYEE   | DeptId      | DeptId     |
| StudentAttendance | StudentAttendance | GUARDIAN   | GPhone      | GPhone     |
| StudentAttendance | StudentAttendance | STUDENT    | GUID        | GUID       |
| technastic        | technastic        | branch     | mgr_id      | mgr_id     |
| technastic        | technastic        | client     | branch_id   | branch_id  |
| technastic        | technastic        | employee   | branch_id   | branch_id  |
| technastic        | technastic        | employee   | super_id    | super_id   |
| technastic        | technastic        | works_with | client_id   | client_id  |
+-------------------+-------------------+------------+-------------+------------+
9 rows in set (0.01 sec)
The problem with this query is that it doesn't return index names for foreign keys only. For example, the GPhone column in the output is not a foreign key; it's a unique key.
Now I wanted to join these two tables (LEFT OUTER JOIN to be specific) to easily identify the index names of the respective foreign keys. The output that I expected was this:
Please let me know where I went wrong and what the correct query is.
Solution
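Classic SQL Gotcha!
"LEFT JOIN" + WHERE condition(s) on "right" table => "INNER JOIN"
A WHERE clause is applied after the join. Rows from KEY_COLUMN_USAGE with no matching STATISTICS row get NULL in every STATISTICS column, and NULL <> 'PRIMARY' is never true, so those rows are discarded. The same condition also discards foreign-key columns whose only matching index is PRIMARY (here branch_supplier.branch_id and works_with.emp_id, which appear in your KEY_COLUMN_USAGE result but not in your non-PRIMARY STATISTICS result), which is why 10 foreign keys turn into 8 rows.
Restructure your query so that the conditions on the "right" table go into the ON clause of the join, not the WHERE clause: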
SELECT
. . .
FROM information_schema.KEY_COLUMN_USAGE kcu
LEFT JOIN information_schema.STATISTICS s
ON s.TABLE_NAME = kcu.TABLE_NAME
AND s.COLUMN_NAME = kcu.COLUMN_NAME
AND s.INDEX_NAME <> 'PRIMARY'  -- condition on the "right" table goes in ON
WHERE
kcu.CONSTRAINT_NAME <> 'PRIMARY'
AND kcu.REFERENCED_TABLE_SCHEMA IS NOT NULL  -- not <> 'NULL': the string 'NULL' and NULL are very different
AND kcu.TABLE_SCHEMA NOT IN ( 'mysql', 'performance_schema', 'sys' )
ORDER BY
. . .
;
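To see the difference the placement makes, here is a minimal reproduction using Python's built-in sqlite3 module instead of MySQL. The tables kcu and stats are hypothetical stand-ins for KEY_COLUMN_USAGE and STATISTICS, and the rows are assumed: emp_id models a foreign-key column whose only index is PRIMARY (as works_with.emp_id appears to be, since it is missing from the question's non-PRIMARY STATISTICS listing), while client_id has its own secondary index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for KEY_COLUMN_USAGE and STATISTICS
    CREATE TABLE kcu (table_name TEXT, column_name TEXT, constraint_name TEXT);
    CREATE TABLE stats (table_name TEXT, column_name TEXT, index_name TEXT);

    INSERT INTO kcu VALUES
        ('works_with', 'client_id', 'works_with_ibfk_2'),
        ('works_with', 'emp_id',    'works_with_ibfk_1');

    -- Assumed: emp_id is covered only by PRIMARY; client_id has its own index.
    INSERT INTO stats VALUES
        ('works_with', 'emp_id',    'PRIMARY'),
        ('works_with', 'client_id', 'client_id');
""")

JOIN = """
    SELECT k.column_name, s.index_name
    FROM kcu k
    LEFT JOIN stats s
      ON s.table_name = k.table_name
     AND s.column_name = k.column_name
"""

# Filter in WHERE: applied after the join, so the emp_id row (whose only
# match is PRIMARY) is removed, just as an INNER JOIN would remove it.
in_where = conn.execute(
    JOIN + " WHERE s.index_name <> 'PRIMARY' ORDER BY k.column_name"
).fetchall()

# Filter in ON: part of the match condition, so emp_id finds no match
# and is kept with NULL (None in Python) as its index name.
in_on = conn.execute(
    JOIN + " AND s.index_name <> 'PRIMARY' ORDER BY k.column_name"
).fetchall()

print(in_where)  # [('client_id', 'client_id')]
print(in_on)     # [('client_id', 'client_id'), ('emp_id', None)]
```

Only the position of the PRIMARY filter differs between the two queries; with it in ON, the emp_id foreign key is kept with a NULL index name instead of disappearing.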
OTHER TIPS
INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS contains only foreign keys, so by joining with that you rule out other types of constraints:
select s.table_schema
, s.table_name
, kcu.constraint_name
, s.index_schema
, s.index_name
, s.column_name
from information_schema.referential_constraints rc
join information_schema.key_column_usage kcu
on rc.CONSTRAINT_CATALOG = kcu.CONSTRAINT_CATALOG
and rc.CONSTRAINT_SCHEMA = kcu.CONSTRAINT_SCHEMA
and rc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
join information_schema.STATISTICS s
on s.TABLE_SCHEMA = kcu.TABLE_SCHEMA
and s.TABLE_NAME = kcu.TABLE_NAME
and s.COLUMN_NAME = kcu.COLUMN_NAME
order by s.table_schema, s.table_name, kcu.constraint_name, s.seq_in_index
;
Note that this query also returns columns of indexes that are not necessarily related to the foreign key. For example, in:
create table p1 (a int not null, b int not null, primary key(a,b));
create table c1 (a int not null, b int not null, c int not null, primary key (a,b,c));
alter table c1 add constraint fk1 foreign key (a,b) references p1 (a,b);
the primary key's columns are returned, because the foreign key can use the primary key index. You can experiment with this Fiddle if that is not what you intended.
EDIT:
The premise for the query is a bit unclear. I'll try to illustrate that with the following Fiddle.
If we create tables like:
create table p1 (a int not null, b int not null, primary key(a,b));
create table c1 (a int not null, b int not null, c int not null, primary key (a,b,c));
alter table c1 add constraint fk1 foreign key (a,b) references p1 (a,b);
An index is created for the primary key, but not for the foreign key, because that would be redundant.
select index_name, column_name
from information_schema.statistics
where table_name = 'c1';
PRIMARY a
PRIMARY b
PRIMARY c
If we, on the other hand, create a new table c2 without the primary key:
create table c2 (a int not null, b int not null, c int not null);
alter table c2 add constraint fk2 foreign key (a,b) references p1 (a,b);
MySQL automatically creates an index for this foreign key (not all DBMSs do this):
select index_name, column_name
from information_schema.statistics
where table_name = 'c2';
fk2 a
fk2 b
What would you like reported in the case of tables c1 and c2?
I guess my point is that a foreign key is a logical concept (described in the SQL standard), whereas an index is a physical concept. There is no one-to-one correspondence between foreign keys and indexes.
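To make that concrete, here is a small Python sketch of the rule the MySQL manual gives for the referencing table: a foreign key needs an index whose first columns are the foreign key columns, in the same order, and MySQL creates one automatically if none exists. The function usable_indexes and its dict input (index name to ordered column list, which could be built from STATISTICS rows grouped by INDEX_NAME and ordered by SEQ_IN_INDEX) are hypothetical, and the check ignores details such as column prefix lengths.

```python
def usable_indexes(indexes, fk_columns):
    """Return the names of indexes whose leading columns equal fk_columns, in order.

    indexes: dict of index name -> ordered column list (hypothetical input).
    fk_columns: ordered list of the foreign key's columns.
    Simplification: ignores column prefix lengths and index types.
    """
    n = len(fk_columns)
    return sorted(name for name, cols in indexes.items()
                  if list(cols[:n]) == list(fk_columns))


# c1: only PRIMARY KEY (a, b, c); its leading (a, b) serves fk1.
print(usable_indexes({"PRIMARY": ["a", "b", "c"]}, ["a", "b"]))  # ['PRIMARY']
# c2: no primary key; MySQL created an index named fk2.
print(usable_indexes({"fk2": ["a", "b"]}, ["a", "b"]))           # ['fk2']
# Hypothetical table with two candidates for the same foreign key.
print(usable_indexes({"PRIMARY": ["a", "b", "c"], "idx_ab": ["a", "b"]},
                     ["a", "b"]))                                # ['PRIMARY', 'idx_ab']
# Hypothetical index with the wrong column order: not usable.
print(usable_indexes({"idx_ba": ["b", "a"]}, ["a", "b"]))       # []
```

The four calls return one, one, two and zero candidate indexes for the same foreign key columns, so a foreign key does not map to a single fixed index. In the last case MySQL would add a new index when the foreign key is created.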