Question

I have a database table (created by someone else). This table consists over billions of records and records are being inserted every second or so.

I need to optimize this table and hence the queries to fetch stuffs faster. Following is the table ProductCatalog structure.

id                  int(10)
SerialNumber        varchar(20)     
BasePrice           decimal(4,1)    
BatchCode           tinyint(3)
Type                varchar(5)      
ItemCode            varchar(5)      
ArrivalDate         datetime        
InsertTimestamp     int(10)
BrandID             tinyint(3)
CompanyID           tinyint(4)      
Model               varchar(10)     
Description         text   

Here is the script to create table

CREATE TABLE `ProductCatalog` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `SerialNumber` varchar(20) DEFAULT NULL,
  `BasePrice` decimal(4,1) DEFAULT NULL,
  `BatchCode` tinyint(3) unsigned DEFAULT NULL,
  `Type` varchar(5) DEFAULT NULL,
  `ItemCode` varchar(5) DEFAULT NULL,
  `ArrivalDate` datetime DEFAULT NULL,
  `InsertTimestamp` int(10) unsigned NOT NULL,
  `BrandID` tinyint(3) unsigned DEFAULT NULL,
  `Model` varchar(10) NOT NULL DEFAULT 'RX209',
  `Description` text,
  PRIMARY KEY (`id`),
  KEY `SerialNumber` (`SerialNumber`,`ArrivalDate`,`ItemCode`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8

Here are the sample insert queries.

insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('1','345618761','23','25','ACC','ABCD','2021-02-14 12:24:44','1613371259','37','RX209',NULL);
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('2','345618761','24','25','RAF','ABCD','2021-02-15 12:55:17','1613373031','45','GA317',NULL);
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('3','569014575','43','21','DAT','TPRS','2021-02-13 12:56:34','1613373082','34','PX452',NULL);

There could be many many entries for same SerialNumber and ItemCodein Different ArrivalDate

Initially there were three indexes

1. id => Primary
2. SerialNumber
3. ArrivalDate

Following are the queries i run against this table.

SELECT * 
FROM ProductCatalog  
WHERE SerialNumber='1234567890' 
  AND ItemCode!="ABCD" 
ORDER BY id DESC LIMIT 1; //This Query Seems slower
    
SELECT BasePrice 
FROM ProductCatalog 
WHERE SerialNumber='123456789' 
  AND ItemCode!="ABCD" 
  and ItemCode!="PQRS" 
  AND ItemCode!="MNOP" 
ORDER BY ID Desc LIMIT 1 //This Query Seems Slower
    
SELECT * 
FROM ProductCatalog 
WHERE SerialNumber='123456789' 
  AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59') 
  AND ItemCode='ABCD' 
ORDER BY ArrivalDate ASC //This query looks ok

SELECT BatchCode
FROM ProductCatalog 
WHERE SerialNumber='123456789' 
  AND ItemCode!="ABCD" 
  and ItemCode!="PQRS" 
  AND ItemCode!="MNOP" 
ORDER BY ID Desc LIMIT 1 //This Query Seems Slower

Then I changed the indexes such that we have only two indexes now. The primary and the composite one.

1. id => Primary
2. SerialNumber, ArrivalDate, ItemCode

Mysql Information

MySQL Version 5.7
Engine: InnoDB

Problem

The results are still not that satisfactory.

  1. Are the indexes I changed correct to get performance gain?
  2. Are the order of columns in index Correct?
  3. Column SerialNumber contains 16 digit numeric value, shall i changed it to int instead of varchar to gain performance?
Was it helpful?

Solution

"billions of records" and id int(10) unsigned NOT NULL AUTO_INCREMENT -- Beware! That will top out at about 4 billion.

To speed up the inserts, see if you can batch them something liek this:

insert into `productcatalog`
    (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`,
        `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`)
    values
    ('1','345618761','23','25','ACC','ABCD','2021-02-14 12:24:44','1613371259','37','RX209',NULL),
    ('2','345618761','24','25','RAF','ABCD','2021-02-15 12:55:17','1613373031','45','GA317',NULL),
    ('3','569014575','43','21','DAT','TPRS','2021-02-13 12:56:34','1613373082','34','PX452',NULL);

For the SELECTs, then indexes (with the columns in the order specified) should help:

INDEX(SerialNumber, ID, ItemCode, BasePrice)  -- for queries 1,2
INDEX(ItemCode, SerialNumber, ArrivalDate)    -- for query 3

Query 1 will (I think) use SerialNumber, ID from the first index and stop after one or two rows.

Query 2 is "covering", so even it needs to scan several rows, it would have to bounce between the index's BTree and the data's BTree.

Query 3 -- The = columns need to be first and the "range" column (Arrival Date) last.

The following changes won't improve performance; I just think they are clearer. Instead of

AND ItemCode!="ABCD" 
AND ItemCode!="PQRS" 
AND ItemCode!="MNOP" 

AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59') 

I prefer these:

AND ItemCode NOT IN ("ABCD", "PQRS", "MNOP")

AND ArrivalDate >= '2019-01-01'
AND ArrivalDate  < '2019-01-01' + INTERVAL 1 YEAR

The following will explain some of my suggestions: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

This seems odd for a "price": decimal(4,1)

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top