Question

1) First query used ... which took around 23 secs

select a.id from mza_movie_upload a,mza_movie_statics b 
where a.status=1 and b.download=1 and a.id=b.rid 
group by b.rid order by sum(b.download) desc

Currently i modified the query ..which takes around 9 secs

select a.id from mza_movie_upload a 
INNER JOIN mza_movie_statics b 
ON a.id=b.rid WHERE a.status=1 and b.download=1 
group by b.rid order by sum(b.download) desc

explain select a.id from mza_movie_upload a  INNER JOIN mza_movie_statics b  ON     a.id=b.rid WHERE a.status=1 and b.download=1  group by b.rid order by sum(b.download) desc;
+----+-------------+-------+--------+---------------+---------+---------+----------------------+---------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                  | rows    | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+----------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | b     | ALL    | NULL          | NULL    | NULL    | NULL                 | 1603089 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY       | PRIMARY | 4       | mmdfurni_dev11.b.rid |       1 | Using where                                  |
+----+-------------+-------+--------+---------------+---------+---------+----------------------+---------+----------------------------------------------+
2 rows in set (0.03 sec)

I am not sure what is the performance to be done ? I want this query to be fast .. I tried to index rid and id which still made the query worse.

Here is the table details

mza_movie_upload

+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| userid        | varchar(200) | NO   |     | NULL    |                |
| email         | varchar(200) | NO   |     | NULL    |                |
| up_date       | datetime     | NO   |     | NULL    |                |
| file_size     | varchar(200) | NO   |     | NULL    |                |
| temp_filename | varchar(200) | NO   |     | NULL    |                |
| fileneame     | varchar(200) | NO   | MUL | NULL    |                |
| filepath      | varchar(255) | NO   |     | NULL    |                |
| status        | varchar(20)  | NO   |     | NULL    |                |
| ip            | varchar(200) | NO   |     | NULL    |                |
| category      | varchar(200) | NO   |     | NULL    |                |
| mcode         | bigint(20)   | NO   |     | NULL    |                |
| movie_name    | varchar(200) | NO   |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
13 rows in set (0.00 sec)

mza_movie_statics

+-----------+---------+------+-----+---------+----------------+
| Field     | Type    | Null | Key | Default | Extra          |
+-----------+---------+------+-----+---------+----------------+
| id        | int(11) | NO   | PRI | NULL    | auto_increment |
| rid       | int(11) | NO   |     | NULL    |                |
| uid       | int(11) | NO   |     | NULL    |                |
| save      | int(11) | NO   |     | NULL    |                |
| download  | int(11) | NO   |     | NULL    |                |
| enterdate | date    | NO   |     | NULL    |                |
+-----------+---------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
Was it helpful?

Solution

Try to rewrite the query into:

SELECT b.rid 
FROM mza_movie_upload a 
INNER JOIN mza_movie_statics b 
ON a.id=b.rid 
WHERE a.status= '1'  and b.download= '1'  
-- group by b.rid order by sum(b.download) desc;
GROUP BY b.rid ORDER BY count(*) DESC;

In this query SELECT a.id is replaced by SELECT b.rid, and is 100% equivalent to the oryginal query because of JOIN ... ON a.id=b.rid predicate, but leads MySql to slighly better plan

And, as @Dennis Leon sugested, a.status= '1' and b.download= '1' are compared to strings, not numbers.

Try also to replace order by sum(b.download) desc with order by count(*) desc - since the query retrieves only rows with b.download = '1', then sum( b.download ) is equivalent to count(*) - this change allows to save a few hundreds miliseconds on conversion from strings to numbers within SUM( .. ).

In the end create two indexes:

create index bbbb on mza_movie_statics( download, rid );
create index aaaaa on mza_movie_upload( status );

then try the query speed after above changes.

OTHER TIPS

I would recommend applying an index on a.status and/or b.download if you wish to see further performance gains. Keep in mind creating additional indexes do come with additional overhead in regards to insert/update/deleting of records - in this case it seems arguably necessary.

Furthermore, before adding new indexes to these tables (presumably in your production environment) keep in mind mysql will create a temp copy of the table, which, for a table with lots of records (>1 million) may take a while. (so i would recommend testing locally on a table with similar size)

lastly, i noticed that in your query you have in your where clause: a.status=1 however the status column is a varchar. In order to avoid converting between two different data types (which slows down query execution time), and potentially breaking your future index, i would recommend changing it to: a.status='1' (notice the quotes)

Your query can be better optimized if you have what is considered a COVERING index. That is... the index has the columns associated with what you are looking for including criteria. This way the engine does not have to go to the raw data to actually check the respective status and download parts.

So, on mza_movie_upload have an index on (id, status) on mza_movie_statics have an index on (rid, download)

Next, the group by will work best on the index that is driving the query, and since a.id = b.rid, but a.id can be the driving index, let IT be the group by value.

select
      mu.id
   from
      mza_movie_upload mu
         JOIN mza_movie_statics ms
            on mu.id = ms.rid
           AND ms.download > 0
   group by
      b.rid
   order by
      sum( b.download ) DESC

Now, a comment on the download. It appears to be a numeric, so you probably don't want to compare to '1' explicitly as it appears that column is a counter for the number of times something was downloaded. And what you are looking for is the thing that was downloaded the most times. If this IS always a value of 1, then yes, leave that to = 1 instead of > 0.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top