Question

I have seen many posts on building database schemas for object tagging (such as dlamblin's post and Artilheiro's post as well).

What I cannot seem to find in my many days of research is the schema logic in implementing a tagging schema that allows for the tags to be assigned to a user (such as LinkedIn's Skills and Expertise system, where tags that have been added by the user can be indexed and searched). This could be as simple as changing the "object" in question to a user, but I have a feeling it is more complicated than that.

I want to be able to construct something almost exactly like this, except in categories. In example, if we took some of LinkedIn's skills and categorized them, we could have something like: IT/Computing, Retail, Project Management, etc.

I know there are a couple common methodologies and architectures to categorizing data, specifically Nested Set and Adjacency List. I have heard many things about both, such as "Nested Set's insertion and deletion are resource intensive", and "Adjacency List Models are awkward, finite, and don't cover unlimited depth."

So I have two questions wrapped into one post:

  • What would a rough example schema look like in regards to tagging skills to users, where they can be indexed and searched, or even be able to construct a pool of users for a specific tag?

  • What is the best to way to categorize something of this nature in light of the necessity to have categorization?

  • Are there any other models that would suit this better that I am unaware of? (Oops, I think that is three questions)

Was it helpful?

Solution

What is the best to way to categorize something of this nature in light of the necessity to have categorization?

Depends how much flexibility you need. For example, the adjacency list may be perfectly fine if you can assume the depth of your category hierarchy has a fixed limit of, say 1 or 2 levels.

Are there any other models that would suit this better that I am unaware of?

Path enumeration is a way to represent hierarchy in a concatenated list of the ancestor names. So each sub-category tag would name not only its own name, but its parent and any further grandparents up to the root.

You are already familiar with absolute pathnames in any shell environment: "/usr/local/bin" is a path enumeration of "usr", "local", and "bin" with the hierarchical relationship between them encoded in the order of the string.

This solution also has the possibility of data anomalies -- it's your responsibility to create an entry for "/usr/local" as well as "/usr/local/bin" and if you don't, some things start breaking.

What would a rough example schema look like in regards to tagging skills to users, where they can be indexed and searched, or even be able to construct a pool of users for a specific tag?

Implementing this in the database is almost as simple as naming the tags individually, but it requires that your tag "name" column be long enough to store the longest path in the hierarchy.

CREATE TABLE taguser (
 tag_path VARCHAR(255),
 user_id INT,
 PRIMARY KEY (tag_path,user_id),
 FOREIGN KEY (tag_path) REFERENCES tagpaths (tag_path),
 FOREIGN KEY (user_id) REFERENCES users (user_id)
);

Indexing is exactly the same as simple tagging, but you can only search for sub-category tags if you specify the whole string from the root of the hierarchy.

SELECT user_id FROM taguser WHERE tag_path = '/IT/Computing'; -- uses index

SELECT user_id FROM taguser WHERE tag_path LIKE '%/Computing'; -- can't use index

OTHER TIPS

I think the best logic is the same as state in the post you linked

+------- +
| user   |
+------- +
| userid |
| ...    |
+--------+

+-------- --+
| linktable |
+-----------+
| userid    | <- (fk and pk)
| tagid     | <- (fk and pk)
+-----------+

+-------+
| tag   |
+-------+
| tagid |
| ...   |
+-------+

pretty mutch the way go to imo. if you want to categorise the tag you can alway atach a category table to the tag table

You didn't say which database, so I'm going to play devil's advocate and suggest how it would work in MongoDB. Create your user like this:

db.users.insert({
  name: "bob",
  skills: [ "surfing", "knitting", "eating"]
})

Then create an index on "skills". Mongo will add each skill in the array to the index, allowing quick lookups. Finding users with an intersection of 2 skills has similar performance to SQL databases, but the syntax is much nicer:

db.users.find({skills: "$in": ["surfing", "knitting"]})

The upside is that a single disk seek will fetch all the information you need on a user. The downside is that it takes a lot more disk space, and somewhat more RAM. But if it can avoid disk seeks caused by joins, it could be a win.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top