Question

I would like to make system whitch allows to search user messages, by specific user. assume having folowing table

create table messages(
  user_id int,
  message nvarchar(500));

So what kind of index I should use here, if I want to search for all messages from user 1, containing word 'foo'.

  1. Simple, non unique index user_id
    It will filter only specific user messages nd then full scan for specific word.
  2. FULLTEXT index on message
    this will find all messages from all users and then filter by ID, seems to be very inefficient in case of big amount of users.
  3. comopound index on both user_id and message
    So full text index tree is created for each user separately, so they can be searched individually. During query system filters messages by ID and then performs text search on remaining rows in index.

A.F.A.I.K. last one is impossible. So then I assume I shall use 1-st option, It will perform better in case of few thousands of users?

And if each will have ~100 messages, full iteration won't cost much resources?

Perhaps I can include username into message and use BOOLEAN full text search mode, but I think it would be slower than by using indexed user_id.

Was it helpful?

Solution

@Alden Quimby's answer is correct as far as it goes, but there is more to the story, because MySQL will only try to choose the optimal index, and its ability to make that determination is limited because of the way fulltext indexes interact with the optimizer.

What actually happens is this:

If the specified user_id exists in either 0 or 1 matching rows in the table, the optimizer will realize this and will choose user_id as the index for that query. Fast execution.

Otherwise, the optimizer will choose the fulltext index, filtering every row matched by the fulltext index to eliminate rows not containing a user_id that matches the WHERE clause. Not quite as fast.

So it's not truly the "optimum" path. It's more like fulltext, with a nice optimization to avoid the fulltext search under the one condition that we know we have almost nothing of interest in the table.

The reason this breaks down is that a fulltext index doesn't give any meaningful statistics back to the optimizer. It just says "yeah, I think that query should probably only require me to check 1 row" ... which, of course, pleases the optimizer greatly, so the fulltext index wins the bid for lowest cost, unless the index with the integer value also comes in comparably low or lower.

Still, that doesn't mean I wouldn't try it this way first.

There's another option, which would work best with fulltext queries IN BOOLEAN MODE and that is to create another column which you would populate with something like CONCAT('user_id_',user_id) or something similar, and then declare a 2-column fulltext index.

filter_string VARCHAR(48) # populated with CONCAT('user_id_',user_id);
....
FULLTEXT KEY (message,filter_string)

Then specify everything in the query.

SELECT ...
 WHERE user_id = 500 AND
 MATCH (message,filter_string) AGAINST ('+kittens +puppies +user_id_500' IN BOOLEAN MODE);

Now, the fulltext index will be responsible for matching only those rows where kittens, puppies, and "user_id_500" appears in the combined fulltext index of the two columns, but you'd still want to have the integer filter there too to make sure the final results are constrained in spite of any random appearance of "user_id_500" in the message.

OTHER TIPS

You should add a fulltext index on message and a regular index on user_id, and use the query:

SELECT *
FROM messages
WHERE MATCH(message) AGAINST(@search_query)
AND user_id = @user_id;

You're right that you can't do option 3. But rather than trying to pick between 1 and 2, let MySQL do the work for you. MySQL will only use one of the two indexes, and will do a linear scan to complete the second filter, but it will estimate the effectiveness of each index and choose the optimal one.

Note: only do this if you can afford the overhead of two indexes (slower insert/update/delete). Also, if you know that each user will only have a few messages, then yes it might make sense to use a simple index and do a regex in the application layer or something like that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top