Database: two-column logical OR index, or create a separate 'index' table?
13-12-2019
Question
I have the following table:
Matches -> match_id, team_a_id, team_b_id, score
This table records matches between two teams (team A and team B). Sometimes team A plays as the host and sometimes team B does, so to find the history of matches between team A and team B, what I currently do is:
select * from matches where (team_a_id = 1 and team_b_id = 2) or (team_a_id = 2 and team_b_id = 1);
Is there a better approach for this case? For the query above, am I right to add a composite index on team_a_id and team_b_id? Even so, I still have a logical OR between the (A, B) and (B, A) orderings.
Alternatively, I have another idea: add another table, say History:
History -> team_hash, match_id
I build team_hash manually so that hash(a,b) == hash(b,a). This results in slightly slower inserts but faster reads. Or are the reads really faster?
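A minimal sketch of the symmetric hash idea, assuming the hash is computed in application code. Sorting the pair before hashing is what guarantees hash(a,b) == hash(b,a). The helper name `team_hash`, the use of SHA-256, and truncation to 64 bits are illustrative assumptions, not part of the question.

```python
import hashlib


def team_hash(a: int, b: int) -> int:
    """Order-independent hash of a team pair (hypothetical helper).

    Sorting the two IDs first means (a, b) and (b, a) hash the same input,
    so the result is symmetric.
    """
    lo, hi = sorted((a, b))
    digest = hashlib.sha256(f"{lo}:{hi}".encode()).digest()
    # Keep the first 8 bytes so the value fits a signed 64-bit integer column.
    return int.from_bytes(digest[:8], "big", signed=True)


print(team_hash(1, 2) == team_hash(2, 1))  # True
```

Because the digest is truncated, two different pairs can in principle collide, so the original team IDs are still needed to confirm a match.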
Solution
Assuming there is a composite index on {team_a_id, team_b_id}, the DBMS can execute your SQL statement using only two index seeks (one for team_a_id = 1 AND team_b_id = 2, the other for team_a_id = 2 AND team_b_id = 1), which is very fast. I don't expect you'll find the performance lacking.
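The setup above can be tried with Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption; the answer later mentions InnoDB, i.e. MySQL), and the sample rows and index name are made up for illustration. The sketch builds Matches with a composite index, runs the original OR query, and prints SQLite's EXPLAIN QUERY PLAN output, whose exact wording varies by SQLite version.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Matches (
            match_id  INTEGER PRIMARY KEY,
            team_a_id INTEGER NOT NULL,
            team_b_id INTEGER NOT NULL,
            score     TEXT
        );
        -- Composite index on {team_a_id, team_b_id} (index name is illustrative)
        CREATE INDEX idx_matches_teams ON Matches (team_a_id, team_b_id);
    """)
    # Hypothetical sample data: match_id is assigned 1, 2, 3, 4 in order.
    conn.executemany(
        "INSERT INTO Matches (team_a_id, team_b_id, score) VALUES (?, ?, ?)",
        [(1, 2, "3-1"), (2, 1, "0-0"), (1, 3, "2-2"), (3, 2, "1-4")],
    )
    return conn


# The question's query, parameterized: (A, B) OR (B, A).
HISTORY_SQL = """
    SELECT match_id FROM Matches
    WHERE (team_a_id = ? AND team_b_id = ?) OR (team_a_id = ? AND team_b_id = ?)
    ORDER BY match_id
"""


def history(conn: sqlite3.Connection, a: int, b: int) -> list:
    return [row[0] for row in conn.execute(HISTORY_SQL, (a, b, b, a))]


conn = setup()
print(history(conn, 1, 2))  # [1, 2]
for row in conn.execute("EXPLAIN QUERY PLAN " + HISTORY_SQL, (1, 2, 2, 1)):
    print(row)
```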
However, there is a way to eliminate one of these index seeks. Add a constraint...
CHECK(team_a_id < team_b_id)
...and encode the "direction" (i.e. which team is the host) in a separate field if necessary. This way, you know team_a_id = 2 AND team_b_id = 1 can never be true, so you only need to search on team_a_id = 1 AND team_b_id = 2.
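A sketch of the CHECK-constraint approach, again using SQLite through Python's sqlite3 as a stand-in. The `host_team_id` column and the `record_match` and `history` helpers are hypothetical names chosen for illustration. The insert helper stores the smaller ID in team_a_id and records the host separately, so a lookup needs only one equality pair and no OR.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Matches (
    match_id     INTEGER PRIMARY KEY,
    team_a_id    INTEGER NOT NULL,
    team_b_id    INTEGER NOT NULL,
    host_team_id INTEGER NOT NULL,  -- hypothetical "direction" column
    score        TEXT,
    CHECK (team_a_id < team_b_id),
    CHECK (host_team_id IN (team_a_id, team_b_id))
);
CREATE INDEX idx_matches_teams ON Matches (team_a_id, team_b_id);
"""


def record_match(conn, host, guest, score):
    """Hypothetical helper: normalize the pair so that team_a_id < team_b_id."""
    a, b = min(host, guest), max(host, guest)
    conn.execute(
        "INSERT INTO Matches (team_a_id, team_b_id, host_team_id, score) "
        "VALUES (?, ?, ?, ?)",
        (a, b, host, score),
    )


def history(conn, x, y):
    """Single equality lookup; the CHECK guarantees no reversed rows exist."""
    a, b = min(x, y), max(x, y)
    return conn.execute(
        "SELECT match_id, host_team_id FROM Matches "
        "WHERE team_a_id = ? AND team_b_id = ? ORDER BY match_id",
        (a, b),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_match(conn, host=1, guest=2, score="3-1")
record_match(conn, host=2, guest=1, score="0-0")
print(history(conn, 2, 1))  # [(1, 1), (2, 2)]

try:
    # Bypassing the helper with an unordered pair violates the CHECK.
    conn.execute(
        "INSERT INTO Matches (team_a_id, team_b_id, host_team_id) VALUES (2, 1, 2)"
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The trade-off is that every writer must normalize the pair, but the database, not the application, guarantees that a reversed row can never be stored.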
"Symmetrical" hashing is a neat idea, but:
- The correctness of the hash cannot be enforced declaratively; you'll need to enforce it through a trigger or at the application level.
- It's redundant data. You'll need to keep team_a_id and team_b_id anyway to resolve hash collisions. Larger data effectively means a smaller cache.
- It may actually increase the number of indexes: efficient enforcement of referential integrity will probably require indexes on team_a_id and team_b_id even if you don't need them for the actual SQL query. In addition to putting more pressure on the cache, every additional index must be maintained, potentially hurting INSERT/UPDATE/DELETE performance. The situation is especially serious in InnoDB, where you cannot turn off clustering, so secondary indexes tend to be more expensive than in heap-based tables (see "Disadvantages of clustering" in this article).
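To illustrate the first point, here is a sketch of trigger-based enforcement in SQLite via Python's sqlite3. It assumes a simple symmetric integer hash, min(a, b) * 1000003 + max(a, b), chosen purely for illustration; the History table follows the question, while the trigger name and `sym_hash` helper are hypothetical. Note that the trigger has to read team_a_id and team_b_id from Matches to validate the hash, so those columns cannot be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Matches (
    match_id  INTEGER PRIMARY KEY,
    team_a_id INTEGER NOT NULL,
    team_b_id INTEGER NOT NULL,
    score     TEXT
);
CREATE TABLE History (
    team_hash INTEGER NOT NULL,
    match_id  INTEGER NOT NULL
);
CREATE INDEX idx_history_hash ON History (team_hash);

-- Hypothetical trigger: reject a History row whose team_hash does not equal
-- the symmetric hash of the referenced match's teams (or whose match is missing).
CREATE TRIGGER history_hash_check BEFORE INSERT ON History
BEGIN
    SELECT RAISE(ABORT, 'team_hash does not match the teams of match_id')
    WHERE NEW.team_hash IS NOT (
        SELECT min(team_a_id, team_b_id) * 1000003 + max(team_a_id, team_b_id)
        FROM Matches WHERE match_id = NEW.match_id
    );
END;
""")


def sym_hash(a: int, b: int) -> int:
    # Illustrative hash; must mirror the SQL expression in the trigger exactly.
    return min(a, b) * 1000003 + max(a, b)


conn.execute("INSERT INTO Matches (team_a_id, team_b_id, score) VALUES (2, 1, '0-0')")
conn.execute("INSERT INTO History VALUES (?, 1)", (sym_hash(1, 2),))  # accepted

try:
    conn.execute("INSERT INTO History VALUES (?, 1)", (12345,))  # wrong hash
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)
```

The hash logic now lives in two places (SQL and application code) that must stay in sync, which is part of the maintenance cost described above.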
OTHER TIPS
You can also write your WHERE clause like this:
((team_a_id = 1 and team_b_id = 2) or (team_a_id = 2 and team_b_id = 1))
AND team_a_id IN (1,2) AND team_b_id IN (1,2)
This way, it becomes possible to use an index such as (team_a_id, team_b_id).
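The extra IN predicates are implied by the OR condition, so they cannot change the result; they only give the optimizer a range it may use on (team_a_id, team_b_id). A quick check, using SQLite via Python's sqlite3 as a stand-in (an assumption; whether a given optimizer actually benefits is engine- and version-dependent), with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Matches (
    match_id INTEGER PRIMARY KEY, team_a_id INTEGER, team_b_id INTEGER, score TEXT
);
CREATE INDEX idx_matches_teams ON Matches (team_a_id, team_b_id);
""")
# Hypothetical rows: (2, 2) and (1, 1) pass the IN filters but not the OR,
# showing that the IN predicates alone are not a substitute for the OR.
conn.executemany(
    "INSERT INTO Matches (team_a_id, team_b_id) VALUES (?, ?)",
    [(1, 2), (2, 1), (1, 3), (3, 2), (2, 2), (1, 1)],
)

OR_ONLY = "(team_a_id = 1 AND team_b_id = 2) OR (team_a_id = 2 AND team_b_id = 1)"
WITH_IN = f"({OR_ONLY}) AND team_a_id IN (1, 2) AND team_b_id IN (1, 2)"


def ids(where: str) -> list:
    sql = f"SELECT match_id FROM Matches WHERE {where} ORDER BY match_id"
    return [row[0] for row in conn.execute(sql)]


print(ids(OR_ONLY), ids(WITH_IN))  # [1, 2] [1, 2]
```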