Question

I'm using this query to perform a full text search on a MySQL database:

SELECT DISTINCT 
questions.id, 
questions.uniquecode, 
questions.spam,
questions.questiondate,
questions.userid,
questions.description,
users.login AS username,
questions.questiontext,
questions.totalvotes,
MATCH(questions.questiontext, questions.uniquecode) 
AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) AS relevance 

FROM questions 

LEFT JOIN users ON questions.userid = users.id 
LEFT JOIN answer_mapping ON questions.id = answer_mapping.questionid 
LEFT JOIN answers ON answer_mapping.answerid = answers.id
LEFT JOIN tagmapping ON questions.id = tagmapping.questionid
LEFT JOIN tags ON tagmapping.tagid = tags.id 

WHERE questions.spam < 10 

AND 

(
  MATCH(questions.questiontext, questions.uniquecode) 
  AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) 

OR MATCH(answers.answertext) AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) 

OR MATCH (tags.tag) AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)

) GROUP BY questions.id ORDER BY relevance DESC

The results are very relevant, but the search is really slow and is getting slower and slower as the tables grow.

Table stats:

questions - 400 records

indexes

  • PRIMARY BTREE - id
  • BTREE - uniquecode
  • BTREE - questiondate
  • BTREE - userid
  • FULLTEXT - questiontext
  • FULLTEXT - uniquecode

answers - 3,635 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - answerdate
  • BTREE - questionid
  • FULLTEXT - answertext

answer_mapping - 4,228 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - answerid
  • BTREE - questionid
  • BTREE - userid

tags - 1,847 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - tag
  • FULLTEXT - tag

tagmapping - 3,389 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - tagid
  • BTREE - questionid

For whatever reason, when I remove the tagmapping and tags JOINs, the search speeds up considerably.

Do you have any tips on how to speed this query up?

Thanks in advance!


Solution

You could materialize the join in a cached view or an extra table. Make sure the query cache is active and write the join as a SELECT so it can be cached, and make sure the server has enough memory. That shouldn't normally be the bottleneck, although in your case it probably is: 400 records is nothing, yet the query is already slow, and the rest looks fine. What sort of hardware and configuration are you running?

That said, I think this is the wrong approach. MySQL isn't designed for this. In fact, before MySQL 5.6 the FULLTEXT feature was limited to MyISAM tables.

You should consider using Lucene/Solr with the dismax request handler. It should give you good results in about 50-100 ms with an index of some hundred thousand documents. At some point you can shard it, so the number of records is practically unlimited. You also get better options and can achieve better results: for example, fuzzy matching, giving more weight to newer documents, making tags more relevant than the title, post-query analysis, faceting, etc.

OTHER TIPS

You might also try to run OPTIMIZE TABLE questions

It helped speed up a similar query in a project I'm working on.

See reference: https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html

Your formulation of the query works slowly for multiple reasons, but I am unsure of the details. Please provide EXPLAIN FORMAT=JSON SELECT ... for further discussion.

Meanwhile, let's rewrite the query in a way that should work faster. (And it might get rid of a bug you have not yet encountered.)

First, let's build and debug this. It does the 3 FT searches in 3 separate queries, then combines (UNION ALL) just the question ids from each.

    ( SELECT id AS question_id,              -- questions.id
         MATCH (... ) AS relevance
         FROM questions
         WHERE MATCH (questiontext, ...) AGAINST ... )
    UNION ALL
    ( SELECT am.questionid AS question_id,
         MATCH (... ) AS relevance
         FROM answers AS a
         JOIN answer_mapping AS am ON am.answerid = a.id
         WHERE MATCH (a.answertext) AGAINST ... )
    UNION ALL
    ( SELECT tm.questionid AS question_id,
         MATCH (... ) AS relevance
         FROM tags AS t
         JOIN tagmapping AS tm ON ...
         WHERE MATCH (t.tag) AGAINST ... )

Notice how each subquery is designed to start with the table with the FT index and end up with question_id.

Now, an intermediate query:

    SELECT question_id,
           MAX(relevance) AS relevance  -- (this fixes the unseen bug)
        FROM ( that query ) AS q1
        GROUP BY question_id
        ORDER BY relevance DESC  -- optional; needed for `LIMIT`
        LIMIT 20          -- to limit the rows, do it at this stage

If that works out fast enough, and provides the "correct" question_ids, then we can proceed...
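To check the shape of the intermediate query without a MySQL server, here is a minimal sketch in Python using SQLite. It is not the real thing: SQLite has no MATCH ... AGAINST, so the three hit tables (`q_hits`, `a_hits`, `t_hits`) and their relevance numbers are made up to stand in for the three FT subqueries. It only shows how UNION ALL plus MAX/GROUP BY collapses several hits per question into one ranked row.

```python
import sqlite3

# Hypothetical (question_id, relevance) rows. In MySQL these would come from
# the three MATCH ... AGAINST subqueries; the values here are invented.
hits = {
    "q_hits": [(1, 2.0), (2, 0.5)],            # matches in questions
    "a_hits": [(1, 1.0), (3, 1.5), (3, 0.7)],  # matches in answers
    "t_hits": [(2, 3.0)],                      # matches in tags
}

conn = sqlite3.connect(":memory:")
for name, rows in hits.items():
    conn.execute(f"CREATE TABLE {name} (question_id INTEGER, relevance REAL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

# Same shape as the intermediate query: union the hits, keep the best
# relevance per question, then sort and limit.
ranked = conn.execute(
    """
    SELECT question_id, MAX(relevance) AS best_relevance
    FROM (
        SELECT question_id, relevance FROM q_hits
        UNION ALL
        SELECT question_id, relevance FROM a_hits
        UNION ALL
        SELECT question_id, relevance FROM t_hits
    ) AS q1
    GROUP BY question_id
    ORDER BY best_relevance DESC
    LIMIT 20
    """
).fetchall()

print(ranked)
```

Each question appears once, even when it matched in more than one table, and it carries the highest relevance among its hits.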

Use that as a subquery to get to the rest of the data:

    SELECT ....,  -- the `questions` fields, using `q....`
           ( SELECT login FROM users WHERE users.id = q.userid ) AS username
        FROM ( the intermediate query ) AS q2
        JOIN questions AS q ON q.id = q2.question_id
        WHERE q.spam < 10
        ORDER BY q2.relevance DESC

Yes, this is JOINing back to questions, but that turns out to be faster.

Note that the GROUP BY is not needed here. And, if the inner query had LIMIT, it won't be needed here.
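The join-back step can be tried the same way. This is again only a sketch in Python with SQLite: the table `q2` stands in for the output of the intermediate query, and the users, question texts, and relevance values are invented for illustration. It shows that joining the ranked ids back to questions yields one row per question with no GROUP BY, and that the spam filter is applied at this last stage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE questions (id INTEGER PRIMARY KEY, userid INTEGER,
                            questiontext TEXT, spam INTEGER);
    -- Stand-in for the intermediate query's result (hypothetical values).
    CREATE TABLE q2 (question_id INTEGER, relevance REAL);

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO questions VALUES
        (1, 1, 'rock guitarist?', 0),
        (2, 2, 'spammy guitarist post', 12),
        (3, 2, 'chicken recipes', 3);
    INSERT INTO q2 VALUES (1, 2.0), (2, 3.0), (3, 1.5);
    """
)

# One row per ranked question id; the username comes from a scalar subquery,
# so the join cannot multiply rows.
rows = conn.execute(
    """
    SELECT q.id, q.questiontext,
           (SELECT login FROM users WHERE users.id = q.userid) AS username,
           q2.relevance
    FROM q2
    JOIN questions AS q ON q.id = q2.question_id
    WHERE q.spam < 10
    ORDER BY q2.relevance DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Question 2 has the highest relevance in this made-up data but is dropped by `spam < 10`. So if the LIMIT is applied in the inner query, the final result can contain fewer rows than the limit.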

I apologize if I did not quite get everything right; there were more transformations than I expected.

Licensed under: CC-BY-SA with attribution