Question

I have a MySQL 5.6 table with 70 million rows in it, and it will grow to 100 million rows or more in a few weeks.

I have a dedicated machine with a humble 500GB disk and 4GB RAM and the innodb_buffer_pool_size is set to 2GB.

The workload is 99% selects and 1% inserts (the inserts happen once a month).

The most important column is descripcion_detallada_producto varchar(300) and it is where the selects are aimed at in 90% of the times.

My table is:

    CREATE TABLE `t1` (
      `N_orden` bigint(20) NOT NULL DEFAULT '0',
      `Fecha` varchar(15) COLLATE latin1_spanish_ci DEFAULT NULL,
      `Ncm` int(11) NOT NULL,
      `Origen` int(11) NOT NULL,
      `Adquisicion` int(11) NOT NULL,
      `Medida_Estadistica` int(11) NOT NULL,
      `Unidad_Comercializacion` varchar(30) COLLATE latin1_spanish_ci DEFAULT NULL,
      `Descripcion_Detallada_Producto` varchar(300) COLLATE latin1_spanish_ci DEFAULT NULL,
      `Cantidad_Estadistica` double DEFAULT NULL,
      `Peso_Liquido_Kg` double DEFAULT NULL,
      `Valor_Fob` double DEFAULT NULL,
      `Valor_Frete` double DEFAULT NULL,
      `Valor_Seguro` double DEFAULT NULL,
      `Valor_Unidad` double DEFAULT NULL,
      `Cantidad` double DEFAULT NULL,
      `Valor_Total` double DEFAULT NULL,
      PRIMARY KEY (`N_orden`),
      KEY `Ncm` (`Ncm`),
      KEY `Origen` (`Origen`),
      KEY `Adquisicion` (`Adquisicion`),
      KEY `Medida_Estadistica` (`Medida_Estadistica`),
      KEY `Descripcion_Detallada_Producto` (`Descripcion_Detallada_Producto`),
      CONSTRAINT `t1_ibfk_1` FOREIGN KEY (`Ncm`) REFERENCES `ncm` (`Ncm`),
      CONSTRAINT `t1_ibfk_2` FOREIGN KEY (`Origen`) REFERENCES `paises` (`Codigo_Pais`),
      CONSTRAINT `t1_ibfk_3` FOREIGN KEY (`Adquisicion`) REFERENCES `paises` (`Codigo_Pais`),
      CONSTRAINT `t1_ibfk_4` FOREIGN KEY (`Medida_Estadistica`) REFERENCES `medida_estadistica` (`Codigo_Medida_Estadistica`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_spanish_ci;

My question: today a SELECT query using LIKE '%whatever%' normally takes 5 to 7 minutes, sometimes more. As I understand it, the varchar index is only used when the pattern is 'whatever%', but I NEED to be able to search for strings using both left and right wildcards without waiting ~7 minutes for each search. How can I do it?

Solution 2

If you are searching for entire words that may be anywhere in a text column, you should consider using fulltext indexes (available for InnoDB since MySQL 5.6), which are queried with MATCH ... AGAINST rather than with wildcard LIKE patterns. If you're unsure how to search your fulltext indexes, you can always get help with that.
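
As a rough sketch of the fulltext approach, assuming the table from the question: the index name `ft_descripcion` is purely illustrative, and building the index on 70 million rows can take a long time. Keep in mind that this matches whole words (or word prefixes with a trailing `*`), not arbitrary substrings, so it is not a drop-in replacement for LIKE '%whatever%'.

```sql
-- Hypothetical index name; adjust to your own naming scheme.
ALTER TABLE t1
  ADD FULLTEXT INDEX ft_descripcion (Descripcion_Detallada_Producto);

-- Rows whose description contains the word "whatever".
SELECT N_orden
FROM t1
WHERE MATCH(Descripcion_Detallada_Producto)
      AGAINST('+whatever' IN BOOLEAN MODE);

-- Rows containing any word that starts with "whate".
SELECT N_orden
FROM t1
WHERE MATCH(Descripcion_Detallada_Producto)
      AGAINST('+whate*' IN BOOLEAN MODE);
```

Words shorter than innodb_ft_min_token_size (3 by default) and stopwords are not indexed, so very short search terms may return nothing.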

A search like the following will not use any of your indexes. Instead, it scans every row of the table data, so you pay for disk reads (and for any disk fragmentation, which isn't usually a problem because we don't usually scan through whole tables):

    SELECT * FROM t1
    WHERE Descripcion_Detallada_Producto LIKE '%whatever%'

The following query would only scan your index on Descripcion_Detallada_Producto, which acts as a "covering" index because an InnoDB secondary index also stores the primary key, N_orden (notice that the columns in the select list make the difference):

    SELECT N_orden FROM t1
    WHERE Descripcion_Detallada_Producto LIKE '%whatever%'

The advantage in scanning an index instead of the actual table data is that the amount of data that is read as it scans is minimized, and ideally with a large innodb_buffer_pool_size, that index would be in memory, which would avoid disk seeks.

Once you get the N_orden values, then you could retrieve the individual records from the table data.
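
A minimal sketch of that two-step retrieval in a single statement, assuming the table and index from the question (the alias names `hits` and `t` are illustrative). The EXPLAIN first checks that the narrow query really is answered from the index alone.

```sql
-- The Extra column should include "Using index" if the secondary
-- index covers the query.
EXPLAIN
SELECT N_orden FROM t1
WHERE Descripcion_Detallada_Producto LIKE '%whatever%';

-- Step 1 (inner query) reads only the index;
-- step 2 (join) fetches the matching rows by primary key.
SELECT t.*
FROM (
    SELECT N_orden
    FROM t1
    WHERE Descripcion_Detallada_Producto LIKE '%whatever%'
) AS hits
JOIN t1 AS t ON t.N_orden = hits.N_orden;
```

This still scans the whole index, so it should help most when the pattern matches only a small fraction of the rows.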

Additional Info

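Consider reducing the size of the columns (bigint to unsigned int for N_orden) and storing shorter values in Descripcion_Detallada_Producto. N_orden is the primary key, so it is copied into every entry of every secondary index, and halving it shrinks all of them. For the VARCHAR column, InnoDB stores only the actual bytes (plus length) in both the table data and the index, so lowering the declared maximum alone does not shrink the index; what speeds up the index scan is actually having shorter descriptions. The declared maximum still matters for in-memory temporary tables, which reserve the full width.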
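
Before shrinking anything, it is worth checking what the data actually looks like. The following is a sketch under the assumption that N_orden is not referenced by foreign keys in other tables. Both statements are expensive on a table this size: the first scans every row and the second rebuilds the table.

```sql
-- How large are the stored values really?
SELECT MIN(N_orden) AS min_orden,
       MAX(N_orden) AS max_orden,
       MAX(CHAR_LENGTH(Descripcion_Detallada_Producto)) AS max_len,
       AVG(CHAR_LENGTH(Descripcion_Detallada_Producto)) AS avg_len
FROM t1;

-- Only if min_orden >= 0 and max_orden <= 4294967295:
ALTER TABLE t1
  MODIFY `N_orden` INT UNSIGNED NOT NULL DEFAULT '0';
```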

In addition, if you have categories, restrict searches to selected categories and create a multi-column index on category+description. The following will only have to scan through a portion of a multi-column index on both category and description by restricting the search to a particular category:

    SELECT N_orden FROM t1
    WHERE Category = 1
      AND Descripcion_Detallada_Producto LIKE '%whatever%'
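
The table in the question has no Category column, so as an illustration only, the sketch below assumes that Ncm can play the role of the category. The index name `idx_ncm_descripcion` and the Ncm value are placeholders.

```sql
-- Hypothetical multi-column index: category first, description second.
ALTER TABLE t1
  ADD INDEX idx_ncm_descripcion (Ncm, Descripcion_Detallada_Producto);

-- Only the index entries for one Ncm value have to be scanned.
SELECT N_orden
FROM t1
WHERE Ncm = 12345678   -- placeholder value
  AND Descripcion_Detallada_Producto LIKE '%whatever%';
```

How much this helps depends on how selective the category is: a category holding 1% of the rows cuts the scan to roughly 1% of the index.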

Finally, consider removing wildcard prefixes. Make the user at least type the beginning of the model number.
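
To see the difference a missing leading wildcard makes, compare the plans for the two patterns. This is a sketch against the table from the question; the exact EXPLAIN output depends on your data and statistics.

```sql
-- Prefix pattern: expect type = range on the
-- Descripcion_Detallada_Producto key (only part of the index is read).
EXPLAIN
SELECT N_orden FROM t1
WHERE Descripcion_Detallada_Producto LIKE 'whatever%';

-- Leading wildcard: expect type = index (the entire index is scanned).
EXPLAIN
SELECT N_orden FROM t1
WHERE Descripcion_Detallada_Producto LIKE '%whatever%';
```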

OTHER TIPS

The right way to fix the problem is to look at all the queries being run against the table, and their relative frequency. You've only given us part of one. You didn't even say which field it relates to. Since you do say "The most important column is descripcion_detallada_producto varchar(300) and it is where the selects are aimed at in 90% of the times" I'll assume that you only need to optimize

    WHERE descripcion_detallada_producto LIKE '%whatever%'

As Vatev has already said, you probably should be using fulltext searches, which are semantically (and syntactically) different from LIKE predicates. Further, you should split the descripcion_detallada_producto attribute into its own relation to reduce the buffer flushing effects of reading huge rows into memory from disk.
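
One possible shape for that split, as a sketch only: the table name `t1_descripcion` and the index name `ft_descripcion` are made up for illustration, and copying 70 million rows in a single INSERT may need to be done in batches on a machine with 4GB of RAM.

```sql
-- Narrow side table holding only the key and the searched text.
CREATE TABLE `t1_descripcion` (
  `N_orden` bigint(20) NOT NULL,
  `Descripcion_Detallada_Producto` varchar(300) COLLATE latin1_spanish_ci DEFAULT NULL,
  PRIMARY KEY (`N_orden`),
  FULLTEXT KEY `ft_descripcion` (`Descripcion_Detallada_Producto`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_spanish_ci;

-- Copy the existing descriptions across.
INSERT INTO t1_descripcion (N_orden, Descripcion_Detallada_Producto)
SELECT N_orden, Descripcion_Detallada_Producto
FROM t1;

-- Search the narrow table, then join back for the full rows.
SELECT t.*
FROM t1_descripcion AS d
JOIN t1 AS t ON t.N_orden = d.N_orden
WHERE MATCH(d.Descripcion_Detallada_Producto)
      AGAINST('+whatever' IN BOOLEAN MODE);
```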

Licensed under: CC-BY-SA with attribution