Question

I'm working on a query for a news site that finds FeaturedContent to display on the main homepage. Content marked this way is tagged as 'FeaturedContent' and ordered in a featured table by 'homepage'. The query returns the desired output, but it takes over 3 seconds to run, which I need to cut down. How can I optimize a query like the one that follows?

EDIT: Materialized the view every minute as suggested, down to .4 seconds:

SELECT f.position, s.item_id, s.item_type, s.title, s.caption, s.date
FROM live.search_all s 
INNER JOIN live.tags t 
ON s.item_id = t.item_id AND s.item_type = t.item_type AND t.tag = 'FeaturedContent' 
LEFT OUTER JOIN live.featured f 
ON s.item_id = f.item_id AND s.item_type = f.item_type AND f.feature_type = 'homepage'
ORDER BY position IS NULL, position ASC, date

This returns all the homepage features in order, followed by other featured content ordered by date.
The explain looks like this:

|-id---|-select_type-|-table-|-type---|-possible_keys---------|-key--------|-key_len-|-ref---------------------------------------|-rows--|-Extra-------------------------------------------------------------|
|-1----|-SIMPLE------|-t2----|-ref----|-PRIMARY,tag_index-----|-tag_index--|-303-----|-const-------------------------------------|-2-----|-Using where; Using index; Using temporary; Using filesort;--------|
|-1----|-SIMPLE------|-t-----|-ref----|-PRIMARY---------------|-PRIMARY----|-4-------|-newswires.t2.id---------------------------|-1974--|-Using index-------------------------------------------------------|
|-1----|-SIMPLE------|-s-----|-eq_ref-|-PRIMARY, search_index-|-PRIMARY----|-124-----|-newswires.t.item_id,newswires.t.item_type-|-1-----|-------------------------------------------------------------------|
|-1----|-SIMPLE------|-f-----|-index--|-NULL------------------|-PRIMARY----|-190-----|-NULL--------------------------------------|-13----|-Using index-------------------------------------------------------|

And the Profile is as follows:

|-Status---------------|-Time-----|
|-starting-------------|-0.000091-|
|-Opening tables-------|-0.000756-|
|-System lock----------|-0.000005-|
|-Table lock-----------|-0.000008-|
|-init-----------------|-0.000004-|
|-checking permissions-|-0.000001-|
|-checking permissions-|-0.000001-|
|-checking permissions-|-0.000043-|
|-optimizing-----------|-0.000019-|
|-statistics-----------|-0.000127-|
|-preparing------------|-0.000023-|
|-Creating tmp table---|-0.001802-|
|-executing------------|-0.000001-|
|-Copying to tmp table-|-0.311445-|
|-Sorting result-------|-0.014819-|
|-Sending data---------|-0.000227-|
|-end------------------|-0.000002-|
|-removing tmp table---|-0.002010-|
|-end------------------|-0.000005-|
|-query end------------|-0.000001-|
|-freeing items--------|-0.000296-|
|-logging slow query---|-0.000001-|
|-cleaning up----------|-0.000007-|

I'm new to reading the EXPLAIN output, so I'm unsure if I have a better ordering available, or anything rather simple that could be done to speed these along.

The search_all table is the materialized view table which is periodically updated, while the tags and featured tables are views. These views are not optional, and cannot be worked around.

The tags view combines tags and a relational table to get back a listing of tags according to item_type and item_id, but the other views are all simple views of one table.

EDIT: With the materialized view, the biggest bottleneck seems to be the 'Copying to tmp table' step. Without ordering the output, the query takes .0025 seconds (much better!), but the final output does need to be ordered. Is there any way to improve the performance of that step, or to work around it?

Sorry if the formatting is difficult to read, I'm new and unsure how it is regularly done.
Thanks for your help! If anything else is needed, please let me know!

EDIT: Table sizes, for reference:
Tag Relations: 197,411
Tags: 16,897
Stories: 51,801
Images: 28,383
Videos: 2,408
Featured: 13


Solution

I don't think optimizing the query alone will help much. My first thought is that joining against a subquery that is itself made of UNIONs is a double bottleneck for performance.

If you have the option to change your database structure, I would suggest merging the 3 tables stories, images and videos into one, provided they are as similar as they look, and adding a type column (ENUM('story', 'image', 'video')) to differentiate the records. This would remove both the subquery and the union.
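As a rough sketch of what such a merged table could look like: the table name `items` and all column types below are assumptions, since the real schema isn't shown. The column names are taken from the SELECT list in the question, and the primary key mirrors the (item_id, item_type) lookup visible in the EXPLAIN output.

```sql
-- Hypothetical merged table replacing stories, images and videos.
-- Column types are guesses; adjust them to the real schema.
CREATE TABLE items (
    item_id   INT UNSIGNED NOT NULL,
    item_type ENUM('story', 'image', 'video') NOT NULL,
    title     VARCHAR(255) NOT NULL,
    caption   TEXT,
    date      DATETIME NOT NULL,
    -- ids may repeat across the three source tables, so the type is part of the key
    PRIMARY KEY (item_id, item_type),
    -- supports "latest items of a given type" style lookups
    KEY idx_items_type_date (item_type, date)
);
```

With a table like this, the tags and featured joins can target `items` directly instead of a UNION of three views.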

Also, it looks like your views on stories and videos are not using an indexed field to filter content. Are you filtering on an indexed column?
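One way to check this, sketched with made-up names: `stories` stands in for the table behind the view, and `status` for whatever column the view filters on. Neither name comes from the question, so substitute the real ones.

```sql
-- List the indexes that already exist on the underlying table (table name assumed).
SHOW INDEX FROM stories;

-- See whether the view's filter can use an index ('status' is a hypothetical column).
EXPLAIN SELECT * FROM stories WHERE status = 'published';

-- If EXPLAIN reports type = ALL (a full table scan), an index on the filter column may help.
CREATE INDEX idx_stories_status ON stories (status);
```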

It's a pretty tricky problem without knowing your full table structure and the distribution of your data!

Another option, which would not involve modifying your existing database (especially if it is already in production), would be to "cache" this information in another table, which would be periodically refreshed by a cron job.

The caching can be done at different levels, either on the full query, or on subparts of it (independent views, or the 3 unions merged into a single cache table, etc.)

The viability of this option depends on whether it is acceptable to display slightly outdated data, or not. It might be acceptable for just some parts of your data, which may imply that you will cache just a subset of the tables/views involved in the query.
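A minimal sketch of caching the full query, assuming a hypothetical table `live.homepage_cache` with guessed column types. The cron job would run the refresh statements; building a fresh copy and swapping it in with RENAME TABLE keeps readers from ever seeing a half-filled cache.

```sql
-- One-time setup: hypothetical cache table (name and column types are assumptions).
CREATE TABLE IF NOT EXISTS live.homepage_cache (
    position  INT NULL,              -- NULL for featured content without a homepage slot
    item_id   INT UNSIGNED NOT NULL,
    item_type VARCHAR(40) NOT NULL,
    title     VARCHAR(255),
    caption   TEXT,
    date      DATETIME
);

-- Refresh, run periodically from cron: fill a fresh copy...
DROP TABLE IF EXISTS live.homepage_cache_new;
CREATE TABLE live.homepage_cache_new LIKE live.homepage_cache;

INSERT INTO live.homepage_cache_new (position, item_id, item_type, title, caption, date)
SELECT f.position, s.item_id, s.item_type, s.title, s.caption, s.date
FROM live.search_all s
INNER JOIN live.tags t
    ON s.item_id = t.item_id AND s.item_type = t.item_type AND t.tag = 'FeaturedContent'
LEFT OUTER JOIN live.featured f
    ON s.item_id = f.item_id AND s.item_type = f.item_type AND f.feature_type = 'homepage';

-- ...then swap it in; a multi-table RENAME TABLE is applied atomically.
RENAME TABLE live.homepage_cache     TO live.homepage_cache_old,
             live.homepage_cache_new TO live.homepage_cache;
DROP TABLE live.homepage_cache_old;

-- The homepage then reads only from the small cache table.
SELECT position, item_id, item_type, title, caption, date
FROM live.homepage_cache
ORDER BY position IS NULL, position ASC, date;
```

The sort still happens on each read, but against a table holding only the featured rows rather than the joined views.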

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow