How can I speed up this MySQL query that finds the closest locations to a given latitude/longitude?

StackOverflow https://stackoverflow.com/questions/22644416

  •  21-06-2023

Question

I have a zip code table in my database which is used in conjunction with a business table to find the businesses matching certain criteria that are closest to a specified zip code. The first thing I do is grab just the latitude and longitude, since they're used in a couple of places on the page. I use:

$zipResult = mysql_fetch_array(mysql_query("SELECT latitude,longitude FROM zipCodes WHERE zipCode='".mysql_real_escape_string($_SESSION['zip'])."' Limit 1"));
$latitude = $zipResult['latitude'];
$longitude = $zipResult['longitude'];
$radius = 100;

$lon1 = $longitude - $radius / abs(cos(deg2rad($latitude))*69);
$lon2 = $longitude + $radius / abs(cos(deg2rad($latitude))*69);
$lat1 = $latitude - ($radius/69);
$lat2 = $latitude + ($radius/69);

From there, I generate the query:

$query2 = "Select * From (SELECT business.*,zipCodes.longitude,zipCodes.latitude,
            (3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - $latitude)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS($latitude *pi()/180) * POWER(SIN((zipCodes.longitude - $longitude) *pi()/180 / 2), 2) ) )) as distance FROM business INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode)
            Where business.active = 1
            And (3958*3.1415926*sqrt((zipCodes.latitude-$latitude)*(zipCodes.latitude-$latitude) + cos(zipCodes.latitude/57.29578)*cos($latitude/57.29578)*(zipCodes.longitude-$longitude)*(zipCodes.longitude-$longitude))/180) <= '$radius'
            And zipCodes.longitude between $lon1 and $lon2 and zipCodes.latitude between $lat1 and $lat2
            GROUP BY business.id ORDER BY distance) As temp Group By category_id ORDER BY distance LIMIT 18";

Which turns out to be something like:

Select * 
From (SELECT business.*,zipCodes.longitude,zipCodes.latitude, (3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - 39.056784)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS(39.056784 *pi()/180) * POWER(SIN((zipCodes.longitude - -84.343573) *pi()/180 / 2), 2) ) )) as distance 
               FROM business 
               INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode) 
               Where business.active = 1 
               And (3958*3.1415926*sqrt((zipCodes.latitude-39.056784)*(zipCodes.latitude-39.056784) + cos(zipCodes.latitude/57.29578)*cos(39.056784/57.29578)*(zipCodes.longitude--84.343573)*(zipCodes.longitude--84.343573))/180) <= '100' 
               And zipCodes.longitude between -86.2099407074 and -82.4772052926 
               and zipCodes.latitude between 37.6075086377 and 40.5060593623 
               GROUP BY business.id 
               ORDER BY distance) As temp 
Group By category_id 
ORDER BY distance 
LIMIT 18

The code runs and executes just fine, but it takes just over a second to complete (usually around 1.1 seconds). However, I've been told that in some browsers the page loads slowly. I have tested this in multiple browsers and multiple versions of those browsers without ever seeing an issue. Still, I figure that if I can get the execution time down it will help either way. The problem is I do not know what else I can do to cut down on the execution time. The zip code table already came with preset indexes, which I assume are good (and contain the columns I'm using in my queries). I've added indexes to the business table as well, though I'm not too knowledgeable about them. But I've made sure to include at least the fields used in the Where clause, and maybe a couple more.

If I need to add my indexes to this question, just let me know. If you see something in the query itself that I can improve, please let me know as well.

Thanks, James

EDIT

Table structure for the business table:

CREATE TABLE IF NOT EXISTS `business` (
  `id` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
  `active` tinyint(3) unsigned NOT NULL,
  `featured` enum('yes','no') NOT NULL DEFAULT 'yes',
  `topFeatured` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `category_id` smallint(5) NOT NULL DEFAULT '0',
  `listZip` varchar(12) NOT NULL,
  `name` tinytext NOT NULL,
  `address` tinytext NOT NULL,
  `city` varchar(128) NOT NULL,
  `state` varchar(32) NOT NULL DEFAULT '',
  `zip` varchar(12) NOT NULL,
  `phone` tinytext NOT NULL,
  `alt_phone` tinytext NOT NULL,
  `website` tinytext NOT NULL,
  `logo` tinytext NOT NULL,
  `index_logo` tinytext NOT NULL,
  `large_image` tinytext NOT NULL,
  `description` text NOT NULL,
  `views` int(5) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `featured` (`featured`,`topFeatured`,`category_id`,`listZip`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=3085 ;

SQL Fiddle

http://sqlfiddle.com/#!2/2e26ff/1

EDIT 2014-03-26 09:09

I've updated my query, but the shorter query actually takes about .2 seconds longer to execute every time.

Select * From (
    SELECT Distinct business.id, business.name, business.large_image, business.logo, business.address, business.city, business.state, business.zip, business.phone, business.alt_phone, business.website, business.description, zipCodes.longitude, zipCodes.latitude, (3956 * 2 * ASIN ( SQRT (POWER(SIN((zipCodes.latitude - 39.056784)*pi()/180 / 2),2) + COS(zipCodes.latitude* pi()/180) * COS(39.056784 *pi()/180) * POWER(SIN((zipCodes.longitude - -84.343573) *pi()/180 / 2), 2) ) )) as distance 
    FROM business 
    INNER JOIN zipCodes ON (business.listZip = zipCodes.zipCode) 
    Where business.active = 1 
    And zipCodes.longitude between -86.2099407074 and -82.4772052926 
    And zipCodes.latitude between 37.6075086377 and 40.5060593623 
    GROUP BY business.category_id 
    HAVING distance <= '50'
    ORDER BY distance
) As temp LIMIT 18

There is already an index on the zip code, latitude, and longitude fields in the zip codes table, both as one combined index and as a separate index on each field. That's just how the table came when purchased.

I had updated the listZip data type to match the zip code table's zip data type yesterday.

I did take out the GROUP BY business.id and replace it with DISTINCT, but left the GROUP BY business.category_id because I only want one business per category.

Also, I started getting the 0.2 second execution difference as soon as I changed the query to use the HAVING clause instead of the math formula in the WHERE clause. I did try using WHERE distance <= 50 in the outer query, but that didn't speed anything up either. Using 50 miles instead of 100 miles doesn't seem to affect this particular query either.

Thanks for all of the suggestions so far though.


Solution

Put indexes on zipCodes.longitude and zipCodes.latitude. That should help a lot.

See here for more information. http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/

Edit: You need an index in the zipCodes table on longitude alone, or one starting with longitude. It looks to me like you should try a composite index on

 (longitude, latitude, zipCode)

for best results.
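
As a rough sketch of that suggestion, the composite index could be created as shown here. The index name idx_lon_lat_zip is made up for illustration, and the EXPLAIN uses the bounding-box values from the question only to check whether MySQL picks the new index.

```sql
-- Hypothetical index name; the column order follows the suggestion above.
ALTER TABLE zipCodes
  ADD INDEX idx_lon_lat_zip (longitude, latitude, zipCode);

-- Check the plan for the bounding-box lookup. Because the index holds all
-- three columns referenced here, MySQL may be able to answer this part
-- from the index alone.
EXPLAIN
SELECT zipCode
FROM zipCodes
WHERE longitude BETWEEN -86.2099407074 AND -82.4772052926
  AND latitude  BETWEEN 37.6075086377 AND 40.5060593623;
```

If the key column of the EXPLAIN output does not show the new index, the optimizer has preferred one of the indexes that shipped with the table.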

Make the data types of zipCodes.zipCode and business.listZip the same, so the join will be more efficient. If those data types are different, MySQL will typecast one to the other as it does the join, and so the join will be inefficient. Make sure business.listZip has an index.
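
One way to check and apply this is sketched below. The index name idx_listZip is an assumption. Note that the existing featured key in the posted table lists listZip as its fourth column, so it cannot serve a lookup on listZip by itself.

```sql
-- Compare the declared types of the two join columns.
SHOW CREATE TABLE zipCodes;
SHOW CREATE TABLE business;

-- Hypothetical index name; gives the join a direct lookup on listZip.
ALTER TABLE business
  ADD INDEX idx_listZip (listZip);
```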

You are misusing GROUP BY. (Did you maybe mean SELECT DISTINCT?) It makes no sense unless you also use an aggregate function like MAX(). In a similar vein, see if you can get rid of the * in SELECT business.*, and instead give a list of the columns you need.

100 miles is a very wide search radius. Narrow it a bit to speed things up.

You're computing the great circle distance twice. You surely can recast the query to do it once.
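
A minimal sketch of such a recast, using the sample values from the question, might look like the query below. It assumes zipCodes holds one row per zipCode, so the join produces no duplicate businesses. The column list is only an example, and the query does not attempt the one-business-per-category requirement. Filtering on the distance alias with HAVING and no GROUP BY is a MySQL extension.

```sql
SELECT b.id, b.name, b.category_id, b.city, b.state, b.zip, b.phone,
       z.latitude, z.longitude,
       -- Haversine distance in miles, computed once per candidate row.
       3956 * 2 * ASIN(SQRT(
           POWER(SIN(RADIANS(z.latitude - 39.056784) / 2), 2)
           + COS(RADIANS(z.latitude)) * COS(RADIANS(39.056784))
           * POWER(SIN(RADIANS(z.longitude - (-84.343573)) / 2), 2)
       )) AS distance
FROM zipCodes AS z
INNER JOIN business AS b ON b.listZip = z.zipCode
WHERE b.active = 1
  -- The cheap bounding box trims the candidates before any trigonometry runs.
  AND z.longitude BETWEEN -86.2099407074 AND -82.4772052926
  AND z.latitude  BETWEEN 37.6075086377 AND 40.5060593623
HAVING distance <= 100
ORDER BY distance
LIMIT 18;
```

The bounding box does the coarse filtering that the second distance formula in the original WHERE clause was doing, so only the single haversine expression remains.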

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow