SELECT in federated table is to slow

https://dba.stackexchange.com/questions/81411

11-12-2020
|

Question

When i test select * from myFederatedTable (for test there is 50k rows) it take 0.3732 seconds now I have two federated tables with data < 20000 in both tables, and have query like :

SELECT i.id_news, i.id_news_type, i.id_agency, i.add_date, i.update_date, j.title, k.image_name_latin, m.folder_path
FROM tbl i
INNER JOIN tbl_languages j on i.id_news = j.id_news and j.lang = 'en'
INNER JOIN federated_Table1 k on i.id_image_groups = k.id_image_groups
INNER JOIN federated_table2 m on k.id_folder_groups = m.id_folder_groups
ORDER BY i.update_date DESC
LIMIT 14

It takes 16 seconds

Without joins to federated tables

SELECT i.id_news, i.id_news_type, i.id_agency, i.add_date, i.update_date, j.title
    FROM tbl i
    INNER JOIN tbl_languages j on i.id_news = j.id_news and j.lang = 'en'
    ORDER BY i.update_date DESC
    LIMIT 14

0.0496 seconds. I have few questions

1.Is there any way to optimize it ?
2.when i select from federated table (really i select from main database?) or when i join to federated table I really join to main database?
3.Task is copy data from one database tables to another database tables. Is there best way to replicate this 2 tables for quick selection ?

Solution

If update_date doesn't have an index on it, you need one, by the way. If that's missing, it could make a huge performance difference.

With federated tables, you have to remember that the local server thinks of the remote table as if it were essentially a MyISAM table on disk, and it decides how to join a federated table based on what it thinks it knows about the cost of accessing the data... then the federated engine, itself, takes the request for rows that the optimizer generated, and turns it into a SQL query, to fetch the data remotely.

If your data quality is such that you can loosen the inner joins to the federated tables into left joins, the optimizer may take a hint and perform the query in a more efficient fashion. That's actually a somewhat critical piece, because the server needs to be able to decide which rows to include based on local information. With an inner join, the presence or absence of a row on the remote server qualifies or disqualifies the joining row from being included.

Here's one alternate approach, that looks worse to the trained eye but is may perform better in spite of the fact that it's going to pull down an materialize a temporary copy of each of the federated tables. The temporary copy disappears when the query finishes:

SELECT i.id_news, i.id_news_type, i.id_agency, i.add_date, i.update_date, j.title, k.image_name_latin, m.folder_path
FROM tbl i
INNER JOIN tbl_languages j on i.id_news = j.id_news and j.lang = 'en'
INNER JOIN (SELECT id_image_groups, image_name_latin FROM federated_Table1) k on i.id_image_groups = k.id_image_groups
INNER JOIN (SELECT id_folder_groups, folder_path FROM federated_table2) m on k.id_folder_groups = m.id_folder_groups
ORDER BY i.update_date DESC
LIMIT 14;

This strategy is what I use when I run reporting jobs that need to join tables from disparate systems. If this is a live interactive web site, it may not be the best approach.

OTHER TIPS

Federated tables are going to be inherently slow, especially when doing joins. It's basically copying down an entire copy of the table for each join.

Federated tables should simply be avoided.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange