Fulltext “title” search in 1M rows
26-04-2021
Question
There is a 3.3 GB MyISAM table, articles, with the following fields: id, title, perma, body, date
primary key: id
fulltext index: title
It has 1,110,000 rows. Even after running this:
SET GLOBAL key_buffer_size = 2000*1024*1024;
LOAD INDEX INTO CACHE articles INDEX(title);
I can't get acceptable performance.
Execution times for several sample queries are shown below; each time (in angle brackets) precedes its query:
<9.5381848812103>
SELECT SQL_NO_CACHE perma,title,body, MATCH(title) AGAINST('flowers for children' IN BOOLEAN MODE) AS sort
FROM articles WHERE MATCH(title) AGAINST('flowers for children' IN BOOLEAN MODE) ORDER BY sort DESC LIMIT 30;
<12.734259843826>
SELECT SQL_NO_CACHE perma,title,body, MATCH(title) AGAINST('how to play basketball' IN BOOLEAN MODE) AS sort
FROM articles WHERE MATCH(title) AGAINST('how to play basketball' IN BOOLEAN MODE) ORDER BY sort DESC LIMIT 30;
<4.4655818939209>
SELECT SQL_NO_CACHE perma,title,body, MATCH(title) AGAINST('kill a bird and eat it' IN BOOLEAN MODE) AS sort
FROM articles WHERE MATCH(title) AGAINST('kill a bird and eat it' IN BOOLEAN MODE) ORDER BY sort DESC LIMIT 30;
<16.268588066101>
SELECT SQL_NO_CACHE perma,title,body, MATCH(title) AGAINST('avoid back pain' IN BOOLEAN MODE) AS sort
FROM articles WHERE MATCH(title) AGAINST('avoid back pain' IN BOOLEAN MODE) ORDER BY sort DESC LIMIT 30;
<12.553371906281>
SELECT SQL_NO_CACHE perma,title,body, MATCH(title) AGAINST('computer' IN BOOLEAN MODE) AS sort
FROM articles WHERE MATCH(title) AGAINST('computer' IN BOOLEAN MODE) ORDER BY sort DESC LIMIT 30;
Any suggestions to make the execution time better?
Solution
Here is your first query:
SELECT SQL_NO_CACHE perma,title,body,
MATCH(title) AGAINST('flowers for children' IN BOOLEAN MODE) AS sort
FROM articles
WHERE MATCH(title) AGAINST('flowers for children' IN BOOLEAN MODE)
ORDER BY sort DESC LIMIT 30;
You might need to refactor it.
First, fetch only the keys and the sort value:
SELECT id,MATCH(title) AGAINST ('flowers for children' IN BOOLEAN MODE) sort
FROM articles
WHERE MATCH(title) AGAINST ('flowers for children' IN BOOLEAN MODE);
This query will result in an 800M temp table.
Next, limit it to the 30 highest sort values:
SELECT * FROM
(
SELECT id,MATCH(title) AGAINST ('flowers for children' IN BOOLEAN MODE) sort
FROM articles
WHERE MATCH(title) AGAINST ('flowers for children' IN BOOLEAN MODE)
) AA
ORDER BY sort DESC LIMIT 30;
OK, now it is a 720-byte temp table.
Finally, LEFT JOIN those 30 rows to the articles table:
SELECT
B.perma,B.title,B.body,A.sort
FROM
(
SELECT * FROM
(
SELECT id,MATCH(title) AGAINST ('flowers for children' IN BOOLEAN MODE) sort
FROM articles
WHERE MATCH(title) AGAINST ('flowers for children' IN BOOLEAN MODE)
) AA
ORDER BY sort DESC LIMIT 30
) A LEFT JOIN articles B USING (id)
ORDER BY A.sort DESC; -- the join does not guarantee row order, so sort the 30 rows again
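Since the search string appears twice in that query, one way to avoid repeating it is a server-side prepared statement. This is only a sketch of the same query shape: the statement name top_articles and the user variable @q are made up for illustration, and it adds a final ORDER BY so the 30 rows come back ranked.

```sql
-- top_articles and @q are hypothetical names, not part of the original answer.
PREPARE top_articles FROM
  'SELECT B.perma, B.title, B.body, A.sort
   FROM (
     SELECT * FROM (
       SELECT id, MATCH(title) AGAINST (? IN BOOLEAN MODE) AS sort
       FROM articles
       WHERE MATCH(title) AGAINST (? IN BOOLEAN MODE)
     ) AA
     ORDER BY sort DESC LIMIT 30
   ) A
   LEFT JOIN articles B USING (id)
   ORDER BY A.sort DESC';

-- Bind the same search string to both placeholders.
SET @q = 'flowers for children';
EXECUTE top_articles USING @q, @q;

DEALLOCATE PREPARE top_articles;
```

Running EXPLAIN on the original and the refactored query should show whether the fulltext index is used in both cases and where the temporary table and filesort appear.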
Give it a try!
OTHER TIPS
Try just using the default "IN NATURAL LANGUAGE MODE" as opposed to "IN BOOLEAN MODE".
I'm not sure why you chose "IN BOOLEAN MODE", since your queries don't use any of the operators it offers. Also, because the data set is so large, you shouldn't have to worry about the 50% threshold (in natural language mode, MyISAM ignores words that appear in at least half of the rows).
I'm suggesting this because, according to the MySQL documentation, "IN NATURAL LANGUAGE MODE" already returns rows sorted by relevance (highest first) when MATCH() is used in the WHERE clause of a single-table query with no explicit "ORDER BY".
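As a rough sketch of that suggestion (MySQL spells the default mode IN NATURAL LANGUAGE MODE, and leaving the modifier out has the same effect), the first query might become something like this. It relies on the documented implicit relevance ordering, so it is worth comparing the results against the ORDER BY version before depending on it.

```sql
-- Natural language search: no ORDER BY, rows are expected to arrive by descending relevance.
-- 'for' is left out of the search string on the assumption that it is not indexed.
SELECT SQL_NO_CACHE perma, title, body
FROM articles
WHERE MATCH(title) AGAINST('flowers children' IN NATURAL LANGUAGE MODE)
LIMIT 30;
```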
Oh, and unless you have changed the minimum word length (ft_min_word_len, which defaults to 4 for MyISAM), the shortest words included in a FULLTEXT index are 4 letters long. So the "for" in your "AGAINST(...)" is ignored and can be dropped, unless that setting was changed.
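To check which minimum word length the server is actually using, something like the following should work. The REPAIR step only applies if you change ft_min_word_len in the option file and restart the server, because existing MyISAM FULLTEXT indexes have to be rebuilt before the new value takes effect.

```sql
-- Show the current minimum indexed word length for MyISAM FULLTEXT indexes.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Only after changing ft_min_word_len and restarting: rebuild the table's indexes.
REPAIR TABLE articles QUICK;
```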