Question

I have a large database with many tables. Some of these tables may contain an email field. They are all the same type of varchar(255). What I want to be able to do is join all of the tables with an email column together and get a list of all the unique emails on the database.

The problem I have is there will be multiple tables with that column so I need it to be high performance. The second problem is that afaik, joining tables together means that they must have a common value from the first table to the joined table.

I want to be able to do something like:

SELECT
    email
FROM
    table_1
LEFT JOIN
    table_2
USING (email)
LEFT JOIN
    table_3
USING (email)

However, I want to be able to get all of the emails from all the tables regardless of if they exist in the other table.

Would it be quicker to do a union select on all the tables in a temp table and select distinct from that?

I need to be able to do this over lots of tables (10+) with thousands to millions of rows over them potentially. If needed, I will set the email columns up as an INDEX.

Thanks

Was it helpful?

Solution

If you want unique emails from all the tables, including emails that only occur in one table, you should use UNION instead of JOIN

SELECT DISTINCT email FROM table_a
UNION SELECT DISTINCT email FROM table_b
UNION SELECT DISTINCT email FROM table_c
...
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top