Question

RDBMS: MariaDB 10.3

Let's say you have a table which stores specific information (info_1, info_2, etc.) according to a given pair of usernames; like so:

+----------+----------+----------+----------+----------+------+--..
|  user_1  |  user_2  |  info_1  |  info_2  |  info_3  |  ..  |  ..
+----------+----------+----------+----------+----------+------+--..

where the values of user_1 and user_2 are each a specific, distinct username (VARCHAR(50)), and (user_1, user_2) was defined as a multi-column index when the table was created. My app's queries search by:

A) a given user pair:

WHERE user_1 = name_1 AND user_2 = name_2 OR user_1 = name_2 AND user_2 = name_1

B) a given user / return all the records where a specific user is one of the members:

WHERE user_1 = name_1 OR user_1 = '%' AND user_2 = name_1

Problem: I must make sure that searches always use an index. With an OR clause, however, a full table scan is executed each time.

SOLUTION I)

Thanks to this answer, possible queries resulting in indexed searches for these cases are:

A)

SELECT * FROM my_table WHERE user_1 = name_1 AND user_2 = name_2
UNION ALL
SELECT * FROM my_table WHERE user_1 = name_2 AND user_2 = name_1;

B)

SELECT * FROM my_table WHERE user_1 = name_1
UNION ALL
SELECT * FROM my_table WHERE user_1 = '%' AND user_2 = name_1;

However, this has the drawback of always running two SELECT statements, one of whose WHERE clauses will always be in vain (as there will always be one record per user pair). That's why I attempted to further optimize the query by using a fulltext index, which resulted in the following table:

SOLUTION II)

+---------+----------+----------+----------+------+--..
|  users  |  info_1  |  info_2  |  info_3  |  ..  |  ..
+---------+----------+----------+----------+------+--..

where users has been defined as VARCHAR(150) with a FULLTEXT index, and its value always has the format name_1-name_2. I used a hyphen because a comma is used in the MATCH...AGAINST syntax to execute fulltext index queries.

With this structure, I now get the following new query possibilities for A) and B):

A)

SELECT * FROM my_table WHERE MATCH(users) AGAINST ('"name_1-name_2","name_2-name_1"');

B)

SELECT * FROM my_table WHERE MATCH(users) AGAINST ('name_1');

In my opinion, this drastically simplifies the query structure: it uses neither two SELECT statements nor an additional WHERE clause that will be in vain. Still, as I don't yet know much about query performance optimization or how fulltext index searches compare with regular index searches, I wonder which of solutions I and II performs better, and why. Or is there an even better approach?


Solution

Critique

1A cannot use indexes because of OR

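1B -- user_1 = '%' does not do what is intended: with =, '%' is a literal string, not a wildcard (that needs LIKE). And a condition that accepts any user_1 cannot use INDEX(user_1, ...) to find rows by user_2, so solution 1B is out.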

1B can be salvaged by having a second index: INDEX(user_2). Then simply say

SELECT * FROM my_table WHERE user_1 = name_1
UNION ALL
SELECT * FROM my_table WHERE user_2 = name_1;
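As a rough sketch of that salvage, the second index could be added and the plan checked as below. The index name idx_user_2 and the literal 'alice' (standing in for name_1) are assumptions for illustration, not part of the original schema.

```sql
-- Hypothetical index name; any name works.
ALTER TABLE my_table ADD INDEX idx_user_2 (user_2);

-- Each branch can now use its own index:
--   branch 1 -> the existing INDEX(user_1, user_2), via its leading column
--   branch 2 -> the new INDEX(user_2)
EXPLAIN
SELECT * FROM my_table WHERE user_1 = 'alice'
UNION ALL
SELECT * FROM my_table WHERE user_2 = 'alice';
```

UNION ALL is safe here only because the question states that user_1 and user_2 are always distinct, so no row can match both branches; if that were not guaranteed, UNION would be needed to drop duplicates.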

2 -- FULLTEXT has limitations, such as a minimum "word" length. So be cautious.

2A (WHERE MATCH(users) AGAINST ('"name_1-name_2","name_2-name_1"')) -- it is probably sufficient to say simply WHERE MATCH(users) AGAINST ('+name_1 +name_2' IN BOOLEAN MODE). The + says that both are required, but in any order.

2B -- OK (with the above caveats)
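A minimal sketch of the FULLTEXT variant, assuming the users column from Solution II; the index name ft_users and the usernames 'alice' and 'bob' are made up for illustration. The last two statements show where the "word" length limits come from, which is worth checking before relying on this for short usernames.

```sql
-- Hypothetical index name.
ALTER TABLE my_table ADD FULLTEXT INDEX ft_users (users);

-- Case A: both names required, in either order.
SELECT * FROM my_table
WHERE MATCH(users) AGAINST ('+alice +bob' IN BOOLEAN MODE);

-- Case B: any row mentioning the name.
SELECT * FROM my_table
WHERE MATCH(users) AGAINST ('+alice' IN BOOLEAN MODE);

-- Words shorter than these thresholds are not indexed at all.
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';  -- InnoDB tables
SHOW VARIABLES LIKE 'ft_min_word_len';           -- MyISAM / Aria tables
```

How the separator in stored values is tokenized, and whether a username itself contains characters the parser treats as word breaks, should be tested against real data; a name split into several "words" could match rows it should not.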

FULLTEXT, when it is applicable, is likely to be very performant. Here are two tricks I have used:

WHERE MATCH(text) AGAINST("R +James" IN BOOLEAN MODE)

The short 'word' (R) will be ignored; it will search for James efficiently. Note the selective use of '+'.

WHERE MATCH(text) AGAINST("R +Anders" IN BOOLEAN MODE)
  AND user LIKE "%R. Anders%"

This assumes that the name might be in the middle of a large column, and I need to check the initial. The FT test will be first (and fast), followed by the LIKE (slow, but against few rows).

That still has problems because "E. Anders and R. Anderson" would be wrongly caught.

(Etc, etc. But I digress from your simple 2-word use case.)

Bottom line:

1B (extra index) is optimal. (But I suspect you watered down the Question.)

FULLTEXT would be my next choice.

Let me add an option 3:

Sort the users before inserting and when querying. That is

INSERT ... (user1, user2)
   VALUES
   (LEAST(?, ?), GREATEST(?, ?))

then you need (for case A) only INDEX(user1, user2) together with

WHERE user1 = LEAST(?, ?)
  AND user2 = GREATEST(?, ?)

Alas, that will probably mess up case B. (Do all rows have 2 user names, but some queries have only 1 user?)
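Spelled out with literal values, option 3 might look like the sketch below. It uses the question's column names (user_1, user_2) and made-up usernames; info_1 and its value are placeholders. Case B is included to show that it still needs both columns checked, so the extra INDEX(user_2) from 1B would remain useful.

```sql
-- Store each pair in a canonical order, whichever way it arrives.
INSERT INTO my_table (user_1, user_2, info_1)
VALUES (LEAST('bob', 'alice'), GREATEST('bob', 'alice'), 'example');

-- Case A: one lookup on INDEX(user_1, user_2), no OR and no UNION.
SELECT * FROM my_table
WHERE user_1 = LEAST('bob', 'alice')
  AND user_2 = GREATEST('bob', 'alice');

-- Case B: the user may sit in either column, so both are still searched.
SELECT * FROM my_table WHERE user_1 = 'alice'
UNION ALL
SELECT * FROM my_table WHERE user_2 = 'alice';
```

The ordering only has to be consistent between INSERT and SELECT; because both go through LEAST and GREATEST under the same collation, the same pair always maps to the same row.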

And Option 4

Have another table that maps user names to my_table by id. Typically 2 rows in this new table for each row in my_table. Optionally toss the user columns from my_table.

I won't go into the rest of the details; they do get messy.
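For orientation only, one possible shape of option 4 is sketched here. It assumes my_table has an integer primary key named id, and the table name user_map and its columns are hypothetical; the answer does not specify any of these.

```sql
-- Two rows per my_table row: one for each member of the pair.
CREATE TABLE user_map (
    user_name   VARCHAR(50)  NOT NULL,
    my_table_id INT UNSIGNED NOT NULL,   -- assumed to reference my_table.id
    PRIMARY KEY (user_name, my_table_id),
    INDEX (my_table_id)
) ENGINE=InnoDB;

-- Case B: every record that a given user belongs to.
SELECT t.*
FROM user_map AS m
JOIN my_table AS t ON t.id = m.my_table_id
WHERE m.user_name = 'alice';

-- Case A: the record shared by a given pair, in either order.
SELECT t.*
FROM user_map AS a
JOIN user_map AS b ON b.my_table_id = a.my_table_id
JOIN my_table AS t ON t.id = a.my_table_id
WHERE a.user_name = 'alice'
  AND b.user_name = 'bob';
```

Both lookups start from the primary key's leading column, so neither needs OR or UNION. The cost is keeping user_map in sync on every insert and delete, which is part of the messiness mentioned above.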

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange