Question

I have a table with 17.6 million rows in a MyISAM database.

I want to search for an article number in it, but the result must not depend on special characters such as dots, commas and others.

I'm using a query like this:

 SELECT * FROM `table`
 WHERE 
 replace(replace(replace( replace( `haystack` , ' ', '' ),
 '/', '' ), '-', '' ), '.', '' )
 LIKE 'needle'

This method is very slow. The table has an index on haystack, but EXPLAIN shows that the query cannot use it. That means the query must scan all 17.6 million rows, which takes 3.8 seconds.

The query runs multiple times per page (10-15x), so the page loads extremely slowly.

What should I do? Is it a bad idea to use REPLACE inside the query?


Solution

As you do the replace on the actual data in the table, MySQL can't use the index, as it doesn't have any indexed data of the result of the replace which it needs to compare to the needle.

That said, if your replace settings are static, it might be a good idea to denormalize the data and to add a new column like haystack_search which contains the data with all the replaces applied. This column could be filled during an INSERT or UPDATE. An index on this column can then effectively be used.
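A minimal sketch of that denormalized column, assuming the table and column names from the question. The `VARCHAR(64)` length, the index name and the trigger names are made up for illustration, and the needle has to be stripped of the same characters by the application before it is compared.

```sql
-- Hypothetical names: haystack_search, idx_haystack_search, table_bi, table_bu.
-- The column length is a guess; match it to the real haystack column.
ALTER TABLE `table` ADD COLUMN haystack_search VARCHAR(64) NOT NULL DEFAULT '';

-- One-time backfill with the same replacements the original query applied.
UPDATE `table`
SET haystack_search =
    REPLACE(REPLACE(REPLACE(REPLACE(haystack, ' ', ''), '/', ''), '-', ''), '.', '');

CREATE INDEX idx_haystack_search ON `table` (haystack_search);

-- Keep the column filled on INSERT and UPDATE.
CREATE TRIGGER table_bi BEFORE INSERT ON `table`
FOR EACH ROW SET NEW.haystack_search =
    REPLACE(REPLACE(REPLACE(REPLACE(NEW.haystack, ' ', ''), '/', ''), '-', ''), '.', '');

CREATE TRIGGER table_bu BEFORE UPDATE ON `table`
FOR EACH ROW SET NEW.haystack_search =
    REPLACE(REPLACE(REPLACE(REPLACE(NEW.haystack, ' ', ''), '/', ''), '-', ''), '.', '');

-- The lookup now compares a plain indexed column with an already-stripped needle.
SELECT * FROM `table` WHERE haystack_search = 'needle';
```

The triggers are only one way to keep the column in sync; filling it from application code during the INSERT or UPDATE works as well.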

Note that you probably want to use % in your LIKE query, as otherwise it is effectively the same as a normal equality comparison. However, if you use a search term like %needle% (that is, with a variable start), MySQL again can't use the index and falls back to a table scan, because it can only use the index when the search term has a fixed start, i.e. something like needle%.

So in the end, you might have to tune your database engine so that it can hold the table in memory. Another alternative with MyISAM tables (or, with MySQL 5.6 and up, also with InnoDB tables) is a fulltext index on your data, which again allows rather efficient searching.
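The difference can be checked with EXPLAIN. This is only a sketch against the indexed haystack column from the question; the exact plan depends on the data and the MySQL version.

```sql
-- Fixed start: the optimizer can walk the index for the prefix
-- (typically shown as type = range in the EXPLAIN output).
EXPLAIN SELECT * FROM `table` WHERE haystack LIKE 'needle%';

-- Variable start: no usable prefix, so expect a full scan
-- (typically type = ALL).
EXPLAIN SELECT * FROM `table` WHERE haystack LIKE '%needle%';
```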
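A rough sketch of the fulltext variant, with a made-up index name. Whether it fits article numbers is an open assumption: fulltext search splits values into words at punctuation and, with default MyISAM settings, ignores words shorter than `ft_min_word_len` (4 by default), so short fragments of an article number may never match.

```sql
-- Hypothetical index name ft_haystack; building it on a 17.6 million row table takes a while.
ALTER TABLE `table` ADD FULLTEXT INDEX ft_haystack (haystack);

-- Natural-language search against the fulltext index.
SELECT * FROM `table`
WHERE MATCH(haystack) AGAINST ('needle');
```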

Other tips

It's "bad" to apply functions to the column, as that prevents MySQL from using the index and forces it to scan every row.

Perhaps this is a better method:

SELECT list
     , of
     , relevant
     , columns
     , only
FROM   your_table
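WHERE  haystack REGEXP '^two[ /.-]needles$'

In this scenario we are searching for "two needles", where the character between the words can be any of those within the square brackets, i.e. "two needles", "two/needles", "two-needles" or "two.needles". MySQL's LIKE does not support bracketed character sets (that is SQL Server syntax), so the pattern is written with REGEXP here, with the hyphen placed last so it is not read as a range. Be aware that REGEXP cannot use the index either.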

You could try using LENGTH on the column to make sure the value contains none of the special characters, though I'm not sure it performs any better. Also, when using LIKE you should use the % wildcard:

SELECT * FROM `table`
WHERE 
haystack LIKE 'needle%' AND
LENGTH(haystack) - LENGTH(REPLACE(haystack,'/','')) = 0 AND
LENGTH(haystack) - LENGTH(REPLACE(haystack,'-','')) = 0 AND
LENGTH(haystack) - LENGTH(REPLACE(haystack,'.','')) = 0;

If the haystack is exactly the needle, then use a plain equality comparison, which can use the index:

SELECT * FROM `table`
WHERE 
haystack='needle';

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow