MySQL: indexing and speeding up a query

https://stackoverflow.com/questions/21838454

12-10-2022

Question

Hello again, good people of Stack Overflow. Based on a few answers to similar questions, I believe adding indexes to my tables will help the query below. The challenge is that I'm not very familiar or comfortable with indexes, and it seems that having too many of them can slow down other queries. I'm just looking for someone to steer me in the right direction. Thank you in advance for helping.

Query:

SELECT
    territories.territoryID,
    territories.territory_name,
    territories_meta.tm_color,
    territories.territory_description,
    territories.territory_state,
    GROUP_CONCAT(DISTINCT territories_zips.tz_zip SEPARATOR ', ') AS ZipCodes,
    COUNT(DISTINCT users.userID) AS AgentsAssigned,
    GROUP_CONCAT(DISTINCT CONCAT(users.user_Fname, ' ', users.user_Lname) SEPARATOR ', ') AS AgentName,
    a.sumTerr AS TotalOpp
FROM (
    SELECT
        territories_zips.tz_terrID AS terrID,
        SUM(boundaries_meta.bm_opportunity) AS sumTerr
    FROM territories_zips
    INNER JOIN boundaries ON boundaries.boundary_name = territories_zips.tz_zip
    INNER JOIN boundaries_meta ON boundaries.boundary_id = boundaries_meta.bm_boundariesID
    WHERE tz_status = 1
    GROUP BY tz_terrID
) AS a
INNER JOIN territories ON territories.territoryID = a.terrID
INNER JOIN territories_zips ON territories.territoryID = territories_zips.tz_terrID
INNER JOIN territories_assign ON territories.territoryID = territories_assign.ta_territoryID
INNER JOIN users ON users.userID = territories_assign.ta_repID
INNER JOIN territories_meta ON territories_meta.tm_territoryID = territories.territoryID
WHERE
    territories_zips.tz_status = 1 AND
    territories_assign.ta_repStatus = 1 AND
    users.user_status = 1
GROUP BY territories.territoryID

Explain:

id  select_type  table               type    possible_keys  key      key_len  ref                                     rows   extra
1   PRIMARY      <derived2>          ALL     NULL           NULL     NULL     NULL                                    97     Using temporary; Using filesort
1   PRIMARY      territories_meta    ALL     NULL           NULL     NULL     NULL                                    121    Using where; Using join buffer
1   PRIMARY      territories_zips    ALL     NULL           NULL     NULL     NULL                                    1739   Using where; Using join buffer
1   PRIMARY      territories_assign  ALL     NULL           NULL     NULL     NULL                                    138    Using where; Using join buffer
1   PRIMARY      users               eq_ref  PRIMARY        PRIMARY  8        msb_db.territories_assign.ta_repID      1      Using where
1   PRIMARY      territories         eq_ref  PRIMARY        PRIMARY  8        msb_db.territories_meta.tm_territoryID  1      Using where
2   DERIVED      territories_zips    ALL     NULL           NULL     NULL     NULL                                    1739   Using where; Using temporary; Using filesort
2   DERIVED      boundaries_meta     ALL     NULL           NULL     NULL     NULL                                    42995  Using join buffer
2   DERIVED      boundaries          eq_ref  PRIMARY        PRIMARY  4        msb_db.boundaries_meta.bm_boundariesID  1      Using where

I think the culprit is the subquery: run on its own, it takes only 2 seconds less than the whole query above.

SELECT
    territories_zips.tz_terrID AS terrID,
    SUM(boundaries_meta.bm_opportunity) AS sumTerr
FROM territories_zips
INNER JOIN boundaries ON boundaries.boundary_name = territories_zips.tz_zip
INNER JOIN boundaries_meta ON boundaries.boundary_id = boundaries_meta.bm_boundariesID
WHERE tz_status = 1
GROUP BY tz_terrID

And its EXPLAIN:

id  select_type  table               type    possible_keys  key      key_len  ref                                     rows   extra
1   SIMPLE       territories_zips    ALL     NULL           NULL     NULL     NULL                                    1739   Using where; Using temporary; Using filesort
1   SIMPLE       boundaries_meta     ALL     NULL           NULL     NULL     NULL                                    42995  Using join buffer
1   SIMPLE       boundaries          eq_ref  PRIMARY        PRIMARY  4        mb_db.boundaries_meta.bm_boundariesID   1      Using where

I've included the tables used by this subquery below. Please let me know if I need to post the other table structures.

Tables:

CREATE TABLE `boundaries` (
  `boundary_id` int(11) NOT NULL AUTO_INCREMENT,
  `boundary_name` varchar(20) DEFAULT NULL,
  `geometry_type` varchar(12) DEFAULT NULL,
  `boundary_geometry` mediumtext,
  `boundary_type` varchar(5) DEFAULT NULL,
  `boundary_state` varchar(4) DEFAULT NULL,
  PRIMARY KEY (`boundary_id`)
) ENGINE=MyISAM AUTO_INCREMENT=64504 DEFAULT CHARSET=utf8;

CREATE TABLE `boundaries_meta` (
  `boundaries_metaID` bigint(20) NOT NULL AUTO_INCREMENT,
  `bm_boundariesID` bigint(20) NOT NULL,
  `bm_opportunity` int(5) NOT NULL,
  PRIMARY KEY (`boundaries_metaID`)
) ENGINE=MyISAM AUTO_INCREMENT=51201 DEFAULT CHARSET=utf8;

CREATE TABLE `territories_zips` (
  `terr_zipsID` bigint(10) NOT NULL AUTO_INCREMENT,
  `tz_terrID` bigint(10) NOT NULL,
  `tz_zip` varchar(5) CHARACTER SET latin1 NOT NULL,
  `tz_status` smallint(1) NOT NULL,
  PRIMARY KEY (`terr_zipsID`)
) ENGINE=MyISAM AUTO_INCREMENT=2576 DEFAULT CHARSET=utf8;

Thank you again for any help.

EDIT: I updated some of the tables with indexes and got an incredible improvement (thank you again, King Isaac). I'm including the new EXPLAIN for the subquery, as I'm still not comfortable with how and why this helped, or whether I actually created the indexes in the right places. Give a man a fish and he eats for a day; teach him how to fish and....

id  select_type  table             type    possible_keys                    key           key_len  ref                           rows   extra
1   SIMPLE       territories_zips  ALL     NULL                             NULL          NULL     NULL                          1739   Using where; Using temporary; Using filesort
1   SIMPLE       boundaries        ref     PRIMARY,bndIDindex,bndNameindex  bndNameindex  63       func                          1      Using where
1   SIMPLE       boundaries_meta   eq_ref  bmBndIDindex                     bmBndIDindex  8        mb_db.boundaries.boundary_id  1      Using where

Solution

Your first step should be to handle the join between tz_zip and boundary_name. My first question would be: are these values unique? If they are, applying a UNIQUE index to these columns should significantly speed up your subquery. If they are not unique, a standard index will still give you high enough cardinality to see a speed increase.
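As a rough sketch of what that could look like: the table and column names below come from the CREATE TABLE statements in the question, while the index names are made up for illustration. The index on boundaries_meta is an addition based on the full scan of that table in the question's EXPLAIN output, not something stated in this answer.

```sql
-- Hypothetical index names; tables and columns are from the question's schema.
-- Plain (non-unique) indexes on the two columns compared in the first join.
CREATE INDEX idx_boundary_name ON boundaries (boundary_name);
CREATE INDEX idx_tz_zip ON territories_zips (tz_zip);

-- Assumption: the second join column benefits from an index as well, since
-- the question's EXPLAIN shows boundaries_meta being scanned in full.
CREATE INDEX idx_bm_boundariesID ON boundaries_meta (bm_boundariesID);

-- Only if every boundary_name occurs exactly once, a UNIQUE index could
-- replace the plain one (the statement fails if duplicates exist):
-- CREATE UNIQUE INDEX uq_boundary_name ON boundaries (boundary_name);
```

Re-running EXPLAIN on the subquery afterwards shows whether the optimizer actually picks these keys. Note that tz_zip is declared as latin1 while boundaries defaults to utf8, so it may be worth checking whether that mismatch limits how the indexes on the join columns are used.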

The 'status' fields on all of the tables should also be indexed. Even though these will end up being low-cardinality indexes, they should benefit the query without adding much index overhead.
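A minimal sketch of those status indexes, assuming the column names used in the WHERE clause of the question's query (the index names are hypothetical):

```sql
-- Hypothetical index names; columns are taken from the question's WHERE clause.
CREATE INDEX idx_tz_status ON territories_zips (tz_status);
CREATE INDEX idx_ta_repStatus ON territories_assign (ta_repStatus);
CREATE INDEX idx_user_status ON users (user_status);
```

Because these columns hold only a few distinct values, the optimizer may still prefer a full scan when most rows match the filter, so it is worth confirming with EXPLAIN that each index is actually chosen.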

You may also want to see if you can refactor this query to eliminate the subquery in the FROM clause. It makes the entire query depend on a temporary table that must be fully built before the rest of the query can proceed. I would go out on a limb and say that it's also the reason you're seeing so many 'ALL' types: the query analyzer is not able to operate on a subset of the data, so it does a full table scan. That is bad when it happens to one table; in your case it's happening to five.

I would look at handling boundaries_meta as just another join and computing SUM(boundaries_meta.bm_opportunity) in the SELECT. It may need to be a dependent subquery, but you should still see an increase in performance.
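One possible shape for that refactoring is sketched below, using the dependent-subquery variant. The table aliases (t, tz, tz2, ta, u, tm, b, bm) are introduced here for readability and are not in the original query. A plain extra join would multiply each bm_opportunity row by the number of assigned agents and inflate the sum, which is why this sketch keeps the sum in its own correlated subquery. Whether it is actually faster depends on the indexes in place, so treat it as something to measure rather than a guaranteed win.

```sql
-- Sketch only: the derived table in FROM is replaced by a correlated
-- (dependent) subquery in the SELECT list, evaluated per territory.
SELECT
    t.territoryID,
    t.territory_name,
    tm.tm_color,
    t.territory_description,
    t.territory_state,
    GROUP_CONCAT(DISTINCT tz.tz_zip SEPARATOR ', ') AS ZipCodes,
    COUNT(DISTINCT u.userID) AS AgentsAssigned,
    GROUP_CONCAT(DISTINCT CONCAT(u.user_Fname, ' ', u.user_Lname) SEPARATOR ', ') AS AgentName,
    (
        SELECT SUM(bm.bm_opportunity)
        FROM territories_zips AS tz2
        INNER JOIN boundaries AS b ON b.boundary_name = tz2.tz_zip
        INNER JOIN boundaries_meta AS bm ON bm.bm_boundariesID = b.boundary_id
        WHERE tz2.tz_terrID = t.territoryID
          AND tz2.tz_status = 1
    ) AS TotalOpp
FROM territories AS t
INNER JOIN territories_zips AS tz ON tz.tz_terrID = t.territoryID
INNER JOIN territories_assign AS ta ON ta.ta_territoryID = t.territoryID
INNER JOIN users AS u ON u.userID = ta.ta_repID
INNER JOIN territories_meta AS tm ON tm.tm_territoryID = t.territoryID
WHERE tz.tz_status = 1
  AND ta.ta_repStatus = 1
  AND u.user_status = 1
GROUP BY t.territoryID;
```

One behavioural difference to be aware of: the original inner join on the derived table drops territories whose zips match no boundaries rows, whereas this version keeps them with a NULL TotalOpp. Adding `HAVING TotalOpp IS NOT NULL` should restore the original behaviour if that matters.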

As to your fear about index speed: over-indexing can be a problem when you add many indexes to a table, but generally it is not a concern unless you are indexing several 'char'-based columns. Since we are only talking about two short varchar columns (tz_zip is varchar(5) and boundary_name is varchar(20)), it shouldn't be an issue.

Whether or not to index a column is always a cost/benefit question. Cost is measured in size and benefit can be measured in cardinality.
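To put rough numbers on both sides of that trade-off, something like the following could be run against the tables from the question. The column aliases are made up; the statements themselves are standard MySQL.

```sql
-- Benefit: ratio of distinct values to rows. Values near 1 mean almost every
-- row is distinct; values near 0 mean only a few distinct values.
SELECT
    COUNT(DISTINCT tz_zip) / COUNT(*) AS zip_selectivity,
    COUNT(DISTINCT tz_status) / COUNT(*) AS status_selectivity
FROM territories_zips;

-- For indexes that already exist, the Cardinality column of this output
-- gives MySQL's own estimate of the number of distinct values.
SHOW INDEX FROM territories_zips;

-- Cost: the Index_length column reports the space used by the table's indexes.
SHOW TABLE STATUS LIKE 'territories_zips';
```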

Your best bet here will be to play with the query structure and indexing. If necessary (and if it is an option), clone your database to a separate server and just try different solutions until you find one that works.
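Where a full clone on a separate server is not practical, a lighter-weight variant is to experiment on a scratch copy of a single table. This is a substitute for the approach described above, not the same thing; the table name territories_zips_test, the index name, and the simplified test query are all hypothetical.

```sql
-- Scratch copy of one table: LIKE copies the column and index definitions,
-- and the INSERT copies the rows.
CREATE TABLE territories_zips_test LIKE territories_zips;
INSERT INTO territories_zips_test SELECT * FROM territories_zips;

-- Try a candidate index on the copy only.
ALTER TABLE territories_zips_test
    ADD INDEX idx_test_status_terr (tz_status, tz_terrID);

-- Compare this plan with the plan for the same query on the original table.
EXPLAIN
SELECT tz_terrID, COUNT(*)
FROM territories_zips_test
WHERE tz_status = 1
GROUP BY tz_terrID;

-- Clean up once the experiment is finished.
DROP TABLE territories_zips_test;
```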

Licensed under: CC-BY-SA with attribution
