Question

I'm very new to databases and I'm a novice to data abstraction, coming from Java. To teach myself, I'm working on an online app that will, among other things, allow users to be part of multiple groups.

Sketching out the database, it seems I'll have to have something like a "Membership" table:

UserID|GroupID
------|-------
  1   |   1
  1   |   2
  2   |   1
  2   |   3
  2   |   5

I feel a little wary of this, since it's only two foreign keys, and only serves to link two objects. Is this standard practice for this kind of relationship? If not, what is the preferred method?

Again, I'm very new to databases. My book doesn't mention this kind of situation, so if there's some keyword that reflects this function that I overlooked...

Thank you.

Was it helpful?

Solution

That's a standard way for representing many-to-many relationship and is known as "junction table" (or "link table").

You already noted that both UserID and GroupID are foreign keys that reference other tables. But when it comes to keys (not foreign keys), you have several options:

  1. Make a composite (primary) key on {UserID, GroupID}. Beside ensuring same user cannot be connected to same group multiple times, it also facilitates the efficient search for groups of the given user. Since UserID is at the leading edge of the index (that the DBMS automatically creates under the key), all GroupID values associated to the same UserID are in a continuous range within the index B-tree, so getting groups of the given user can be done by the DBMS through a simple index range scan.
  2. Make a composite (primary) key on {GroupID, UserID}. Same fields, opposite order. This facilitates quickly getting users of the given group (i.e. querying in the opposite "direction" compared to (1)).
  3. Make a key on {UserID, GroupID} and (unique) index on {GroupID, UserID} (or vice-versa). This is useful if you need to query in both directions: getting groups of given user and getting users of given group, respectively.
  4. Do (1) or (2) or (3) above, but also make a surrogate key (e.g. {UserGroupID}). This may be useful if you have "child" tables that reference the junction table, and you want to streamline the size of the key that is being migrated to them through foreign keys. It may also be useful if your ORM tool doesn't work well with composite keys.

If you decide for options (1) or (2), cluster the table (if your DBMS supports it). Since you are only doing index range scans anyway, there is no need for the table heap to exist at all. You should even consider clustering for (3), since both indexes are covering so there is no danger of double-lookup.

OTHER TIPS

This kind of set up is perfectly common and perfectly fine.

I would add a primary key index on an auto-increment column as well, and also make sure to have indices on userid and groupid.

If you know how your application will use the data and you can leverage compound (AKA composite AKA multi-column AKA...) indices, do that instead of single column indices on userid and groupid.

You can learn more about multi-column indices here: http://www.mysqlperformanceblog.com/2009/09/19/multi-column-indexes-vs-index-merge/

and

http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top