Question

Say I offer user to check off languages she speaks and store it in a db. Important side note, I will not search db for any of those values, as I will have some separate search engine for search. Now, the obvious way of storing these values is to create a table like

UserLanguages
(
 UserID nvarchar(50),
 LookupLanguageID int
)

but the site will be high load and we are trying to eliminate any overhead where possible, so in order to avoid joins with main member table when showing results on UI, I was thinking of storing languages for a user in the main table, having them comma separated, like "12,34,65"

Again, I don't search for them so I don't worry about having to do fulltext index on that column.

I don't really see any problems with this solution, but am I overlooking anything?

Thanks, Andrey

Was it helpful?

Solution

You might not be missing anything now, but when you're requirements change you might regret that decision. You should store it normalized like your first instinct suggested. That's the correct approach.

What you're suggesting is a classic premature optimization. You don't know yet whether that join will be a bottleneck, and so you don't know whether you're actually buying any performance improvement. Wait until you can profile the thing, and then you'll know whether that piece needs to be optimized.

If it does, I would consider a materialized view, or some other approach that pre-computes the answer using the normalized data to a cache that is not considered the book of record.

More generally, there are a lot of possible optimizations that could be done, if necessary, without compromising your design in the way you suggest.

OTHER TIPS

Don't.

  • You don't search for them now
  • Data is useless to anything but this one situation
  • No data integrity (eg no FK)
  • You still have to change to "English,German" etc for display
  • "Give me all users who speak x" = FAIL
  • The list is actually a presentation issue

It's your system, though, and I look forward to answering the inevitable "help" questions later...

This type of storage has almost ALWAYS come back to haunt me. For one, you are not even in first normal form. For another, some manager or the other will definitely come back and say.. "hey, now that we store this, can you write me a report on... "

I would suggest going with a normalized design. Put it in a separate table.

Problems:

  1. You lose join capability (obviously).
  2. You have to reparse the list on each page load / post back. Which results in more code client side.
  3. You lose all pretenses of trying to keep database integrity. Just imagine if you decide to REMOVE a language later on... What's the sql going to be to fix all of your user profiles?
  4. Assuming your various profile options are stored in a lookup table in the DB, you still have to run "30 queries" per profile page. If they aren't then you have to code deploy for each little change. bad, very bad.
  5. Basing a design decision on something that "won't happen" is an absolute recipe for failure. Sure, the business people said they won't ever do that... Until they think of a reason they absolutely must do it. Today. Which will be promptly after you finish coding this.
  6. As I stated in a comment, 30 queries for a low use page is nothing. Don't sweat it, and definitely don't optimize unless you know for darn sure it's necessary. Guess how many queries SO does for it's profile page?

I generally stay away at the solution you described, you asking for troubles when you store relational data in such fashion.

As alternative solution: You could store as one bitmasked integer, for example: 0 - No selection 1 - English 2 - Spanish 4 - German 8 - French 16 - Russian --and so on powers of 2

So if someone selected English and Russian the value would be 17, and you could easily query the values with Bitwise operators.

Premature optimization is the root of all evil.

EDIT: Apparently the context of my observation has been misconstrued by some - and hence the downvotes. So I will clarify.

Denormalizing your model to make things easier and/or 'more performant' - such as creating concatenated columns to represent business information (as in the OP case) - is what I refer to as a "premature optimization".

While there may be some extreme edge cases where there is no other way to get the necessary performance necessary for a particular problem domain - one should rarely assume this is the case. In general, such premature optimizations cause long-term grief because they are hard to undo - changing your data model once it is in production takes a lot more effort than when it initially deployed.

When designing a database, developers (and DBAs) should apply standard practices like normalization to ensure that their data model expresses the business information being collected and managed. I don't believe that proper use of data normalization is an "optimization" - it is a necessary practice. In my opinion, data modelers should always be on the lookout for models that could be restructured to (at least) third normal form (3NF).

If you're not querying against them, you don't lose anything by storing them in a form like your initial plan. If you are, then storing them in the comma-delimited format will come back to haunt you, and I doubt that any speed savings would be significant, especially when you factor in the work required to translate them back.

You seem to be extremely worried about adding in a few extra lookup table joins. In my experience, the time it takes to actually transmit the HTML response and have the browser render it far exceed a few extra table joins. Especially if you are using indexes for your primary and foreign keys (as you should be). It's like you are planning a multi-day cross-country trip and you are worried about 1 extra 10 minute bathroom stop.

The lack of long-term flexibility and data integrity are not worth it for such a small optimization (which may not be necessary or even noticeable).

Nooooooooooooooooo!!!!!!!!

As stated very well in the above few posts.

If you want a contrary view to this debate, look at wordpress. Tables are chocked full of delimited data, and it's a great, simple platform.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top