database two column logical OR index, or create separate 'index' table

https://stackoverflow.com//questions/11712344

13-12-2019

Question

I have this following table:

Matches -> match_id, team_a_id , team_b_id, score

This table records matches between two teams (team A and team B). However, sometimes team A plays as the host and sometimes team B does. Therefore, to find the match history between team A and team B, I currently run:

select * from matches where (team_a_id = 1 and team_b_id = 2) or (team_a_id = 2 and team_b_id = 1);

Is there a better approach for this case? For the query above, am I right to add a composite index on team_a_id and team_b_id? Even so, I still have a logical OR between the AB and BA conditions.

Alternatively, I have another idea: add a second table, say history:

History -> team_hash, match_id

I would build team_hash manually so that hash(a,b) == hash(b,a). This results in slightly slower inserts but faster reads. Or are the reads really faster?

Solution

Assuming there is a composite index on {team_a_id, team_b_id}, the DBMS can execute your SQL statement using only two index seeks (one for team_a_id = 1 and team_b_id = 2, the other for team_a_id = 2 and team_b_id = 1), which is very fast. I don't expect you will find the performance lacking.
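As a minimal sketch of the setup described above, the following uses Python's built-in sqlite3 module with an in-memory database. The index name idx_matches_teams, the sample rows, and the helper history() are illustrative assumptions, and SQLite stands in for whichever DBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (
        match_id  INTEGER PRIMARY KEY,
        team_a_id INTEGER NOT NULL,
        team_b_id INTEGER NOT NULL,
        score     TEXT
    );
    -- Composite index covering both columns used in the lookup.
    CREATE INDEX idx_matches_teams ON matches (team_a_id, team_b_id);
""")

# Hypothetical sample data: teams 1 and 2 meet twice, once in each orientation.
conn.executemany(
    "INSERT INTO matches (team_a_id, team_b_id, score) VALUES (?, ?, ?)",
    [(1, 2, "3-1"), (2, 1, "0-0"), (1, 3, "2-2"), (3, 2, "1-4")],
)


def history(conn, x, y):
    """Return all matches between teams x and y, in either orientation."""
    return conn.execute(
        """
        SELECT match_id, team_a_id, team_b_id, score
        FROM matches
        WHERE (team_a_id = ? AND team_b_id = ?)
           OR (team_a_id = ? AND team_b_id = ?)
        ORDER BY match_id
        """,
        (x, y, y, x),
    ).fetchall()


rows = history(conn, 1, 2)
```

To check how a particular DBMS actually executes the query, prefix the SELECT with EXPLAIN QUERY PLAN in SQLite or EXPLAIN in MySQL; the plan depends on the engine and on table statistics.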

However, there is a way to eliminate one of these index seeks. Add a constraint...

CHECK(team_a_id < team_b_id)

...and encode a "direction" (i.e. which team is host) in a separate field if necessary. This way, you know team_a_id = 2 and team_b_id = 1 can never be true, so you only need to search on team_a_id = 1 and team_b_id = 2.
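The constraint-based variant might look roughly like the sketch below, again using Python's sqlite3 module. The column a_is_host is a hypothetical name for the "direction" field, and add_match() and history() are illustrative helpers that put the two ids in order before touching the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (
        match_id  INTEGER PRIMARY KEY,
        team_a_id INTEGER NOT NULL,
        team_b_id INTEGER NOT NULL,
        -- Hypothetical "direction" column: 1 if team A hosted, 0 if team B did.
        a_is_host INTEGER NOT NULL CHECK (a_is_host IN (0, 1)),
        score     TEXT,
        CHECK (team_a_id < team_b_id)
    );
    CREATE INDEX idx_matches_teams ON matches (team_a_id, team_b_id);
""")


def add_match(conn, host_id, guest_id, score):
    """Store the pair in ascending order and remember who hosted."""
    lo, hi = sorted((host_id, guest_id))
    conn.execute(
        "INSERT INTO matches (team_a_id, team_b_id, a_is_host, score) "
        "VALUES (?, ?, ?, ?)",
        (lo, hi, int(host_id == lo), score),
    )


def history(conn, x, y):
    """Single-condition lookup: only the ordered pair can exist."""
    lo, hi = sorted((x, y))
    return conn.execute(
        "SELECT match_id, a_is_host, score FROM matches "
        "WHERE team_a_id = ? AND team_b_id = ? ORDER BY match_id",
        (lo, hi),
    ).fetchall()


add_match(conn, 1, 2, "3-1")  # team 1 hosts
add_match(conn, 2, 1, "0-0")  # team 2 hosts, stored as (1, 2) with a_is_host = 0

# A row written in the "wrong" order is rejected by the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO matches (team_a_id, team_b_id, a_is_host, score) "
        "VALUES (2, 1, 1, '1-1')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Note that MySQL enforces CHECK constraints only from version 8.0.16 onward; earlier versions parse the clause and ignore it, so the ordering would have to be enforced by a trigger or by the application there.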

"Symmetrical" hashing is a neat idea, but:

The correctness of the hash cannot be enforced declaratively - you'll need to do it through a trigger or at the application level.
It's redundant data. You'll need to keep team_a_id and team_b_id anyway to resolve hash collisions. Larger data effectively means a smaller cache.
It may actually increase the number of indexes - efficient enforcement of referential integrity will probably require indexes on team_a_id and team_b_id even if you don't need them for the actual SQL query. In addition to putting more pressure on the cache, every additional index must be maintained, potentially hurting INSERT/UPDATE/DELETE performance. The situation is especially serious in InnoDB, where you cannot turn off clustering, so secondary indexes tend to be more expensive than in heap-based tables (see the "Disadvantages of clustering" in this article).
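For reference, a symmetric hash of the kind the question describes could be computed at the application level roughly as follows. The function name team_hash and the choice of CRC-32 are assumptions made for illustration; the question does not say which hash function it has in mind.

```python
import zlib


def team_hash(a, b):
    """Order-independent key: team_hash(a, b) == team_hash(b, a)."""
    lo, hi = sorted((a, b))
    # Hashing the ordered pair makes the result symmetric. Distinct pairs can
    # still collide, so the original team ids must be kept to tell them apart.
    return zlib.crc32(f"{lo}:{hi}".encode("ascii"))


forward = team_hash(1, 2)
backward = team_hash(2, 1)
```

The sketch already has to sort the two ids before hashing, which is the same ordering the CHECK(team_a_id < team_b_id) approach stores directly, without the extra column.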

OTHER TIPS

You can also write your WHERE clause like this:

((team_a_id = 1 and team_b_id = 2) or (team_a_id = 2 and team_b_id = 1))
AND team_a_id IN (1,2) AND team_b_id IN (1,2)

This way, it will be possible to use an index like (team_a_id, team_b_id).

Licensed under: CC-BY-SA with attribution
