Question

Let's say I have a User which has a status and the user's status can be 'active', 'suspended' or 'inactive'.

Now, when designing the database, I was wondering: would it be better to have a column holding the string value (with an enum type or a rule applied), so that it's easy both to query and to read the current user status? Or are joins better, in which case I should join to a UserStatuses table that contains the possible user statuses?

Assuming, of course, that statuses cannot be created by the application user.

Edit: Some clarification

  1. I would NOT use string joins; it would be an int join to the UserStatuses PK
  2. My primary concern is performance
  3. The possible statuses ARE STATIC and will NEVER change

Solution

On most systems it makes little or no difference to performance. Personally I'd use a short string for clarity and join that to a table with more detail as you suggest.

-- Lookup table keyed by an integer
create table intLookup
(
    pk integer primary key,
    value varchar(20) not null
);

insert into intLookup (pk, value) values
(1, 'value 1'),
(2, 'value 2'),
(3, 'value 3'),
(4, 'value 4');

-- Lookup table keyed by a short string
create table stringLookup
(
    pk varchar(4) primary key,
    value varchar(20) not null
);

insert into stringLookup (pk, value) values
('1', 'value 1'),
('2', 'value 2'),
('3', 'value 3'),
('4', 'value 4');

-- Main table referencing both lookups, with an index on each foreign key
create table masterData
(
    stuff varchar(50),
    fkInt integer references intLookup(pk),
    fkString varchar(4) references stringLookup(pk)
);
create index i on masterData(fkInt);
create index s on masterData(fkString);
go

-- Populate from the system catalog; "go 1000" repeats only this batch 1000 times
insert into masterData (stuff, fkInt, fkString)
select COLUMN_NAME, (ORDINAL_POSITION % 4) + 1, (ORDINAL_POSITION % 4) + 1
from INFORMATION_SCHEMA.COLUMNS;
go 1000

This results in 300K rows.

select *
from masterData m
inner join intLookup i on m.fkInt = i.pk;

select *
from masterData m
inner join stringLookup s on m.fkString = s.pk;

On my system (SQL Server), the query plans, I/O, and CPU are identical, and so are the execution times. The lookup table is read and processed once in either query.

In this test there is NO difference between using an int and a string key.

OTHER TIPS

I think, as a whole, everyone has hit on important components of the answer to your question. However, they all have good points which should be taken together, rather than separately.

  1. As logixologist mentioned, a healthy amount of normalization is generally considered to increase performance. However, in contrast to logixologist, I think your situation is a perfect case for normalization. Using a numeric key, as Santhosh suggested, that points to a code table containing the status descriptions will result in less data being stored per record. This difference wouldn't show in a small Access database, but it would likely show in a table with millions of records, each with a status.

  2. As David Aldridge suggested, you might find that normalizing this particular data point results in a more controlled end-user experience. Normalizing the status field also lets you edit a status label later in one location and have that change propagated throughout the database. If your boss is like mine, then you might have to change the status of Inactive to Closed (and then back again next week!), which would be more work if the status field were not normalized. Normalizing also makes it easier to enforce referential integrity: if a status key is not in the status code table, it can't be added to your main table.

  3. If you're concerned about query performance in the future, there are a few things to consider. To pull back the status when it's normalized, you'll add a join to your query. That join will probably not hurt you at any recordset size, and I believe it will help in larger recordsets by limiting the amount of raw text that must be handled. If your primary concern is performance when querying the data, here's a great resource on how to optimize queries: http://www.sql-server-performance.com/2007/t-sql-where/ and I think you'll find that many of the rules discussed there also apply to any inclusion criteria you enforce in the join itself.
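Points 1 and 2 above can be sketched in a few lines. This is only an illustration, not the setup from the question: it uses Python's built-in sqlite3 module instead of SQL Server, and the table and column names (user_statuses, users, status_id) and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE user_statuses (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,
        username  TEXT NOT NULL,
        status_id INTEGER NOT NULL REFERENCES user_statuses(id)
    );
    INSERT INTO user_statuses (id, name)
        VALUES (1, 'active'), (2, 'suspended'), (3, 'inactive');
    INSERT INTO users (username, status_id)
        VALUES ('alice', 1), ('bob', 3);
""")

# Referential integrity: a status key missing from the lookup table is rejected.
try:
    conn.execute("INSERT INTO users (username, status_id) VALUES ('carol', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Renaming a status touches one row, however many users reference it.
conn.execute("UPDATE user_statuses SET name = 'closed' WHERE name = 'inactive'")

rows = conn.execute("""
    SELECT u.username, s.name
    FROM users u
    JOIN user_statuses s ON s.id = u.status_id
    ORDER BY u.username
""").fetchall()
print(rejected, rows)
```

The rename is a single-row update on the lookup table, while the users table stores only the small integer key.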

Hope this helps!

Christopher

The whole idea behind normalization is to keep the data from repeating (well at least one of the concepts).

In this case a user can have only one status at a time (I assume), so there is no reason to put it in its own table. You would simply complicate things. The only reason to have a separate table is if these statuses were not static, meaning that next month you might add "Sort of Active" and "Maybe Inactive". Without a separate table, that would mean changing code to accommodate the new values. You could also create a maintenance page where users add statuses, and that would require a separate table.
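A minimal sketch of the single-column approach described above, assuming the status is stored as text and restricted with a CHECK constraint. It uses Python's sqlite3 module rather than SQL Server purely so that it runs anywhere; the table, column names, and sample users are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The status lives directly on the users table; the CHECK constraint
# plays the role of the "enum type or rule" mentioned in the question.
conn.execute("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        status   TEXT NOT NULL
                 CHECK (status IN ('active', 'suspended', 'inactive'))
    )
""")
conn.execute("INSERT INTO users (username, status) VALUES ('alice', 'active')")

# A value outside the fixed list is rejected by the constraint.
try:
    conn.execute("INSERT INTO users (username, status) VALUES ('bob', 'sort of active')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Filtering by status needs no join.
active = conn.execute(
    "SELECT username FROM users WHERE status = 'active'"
).fetchall()
print(rejected, active)
```

The trade-off is that adding a new status later means altering the constraint, which is the schema-level equivalent of the code change mentioned above.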

An issue to consider is whether these status values have attributes of their own.

For example, perhaps you would want to have a default sort order that is different from the alphabetical order of the status text. You might also want to treat two of the statuses in a particular way that you do not treat the other, and that could be an attribute.

If you have a need for that, or suspect a future need for that, then move the status text to a different table and use an integer key value for them.
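As a rough sketch of what such attributes could look like, the lookup table below carries a sort order and a flag shared by two of the three statuses. The column names sort_order and blocks_login are invented for this example, and Python's sqlite3 module stands in for whatever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_statuses (
        id           INTEGER PRIMARY KEY,
        name         TEXT    NOT NULL UNIQUE,
        sort_order   INTEGER NOT NULL,  -- hypothetical display order
        blocks_login INTEGER NOT NULL   -- hypothetical flag: 1 = user may not log in
    );
    INSERT INTO user_statuses (id, name, sort_order, blocks_login) VALUES
        (1, 'active',    1, 0),
        (2, 'suspended', 2, 1),
        (3, 'inactive',  3, 1);
""")

# Default display order differs from alphabetical (active, inactive, suspended).
ordered = [row[0] for row in conn.execute(
    "SELECT name FROM user_statuses ORDER BY sort_order")]

# Two statuses are treated alike through a shared attribute, not hard-coded names.
blocked = [row[0] for row in conn.execute(
    "SELECT name FROM user_statuses WHERE blocks_login = 1 ORDER BY sort_order")]
print(ordered, blocked)
```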

If the set of statuses is fixed, I would suggest using integer values such as 0, 1, and 2. When presenting the results in reports, we can translate these values back to strings.
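One way this could look, as a sketch only: the status is stored as a small integer and translated with a CASE expression at report time. The mapping 0 = active, 1 = suspended, 2 = inactive is an assumption for the example, as are the table and user names; Python's sqlite3 module is used just to make it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT    NOT NULL,
        status   INTEGER NOT NULL CHECK (status IN (0, 1, 2))
    );
    INSERT INTO users (username, status)
        VALUES ('alice', 0), ('bob', 1), ('carol', 2);
""")

# Translate the integer codes back to text only when producing the report.
# Assumed mapping: 0 = active, 1 = suspended, 2 = inactive.
report = conn.execute("""
    SELECT username,
           CASE status
               WHEN 0 THEN 'active'
               WHEN 1 THEN 'suspended'
               WHEN 2 THEN 'inactive'
           END AS status_text
    FROM users
    ORDER BY username
""").fetchall()
print(report)
```

The mapping then lives in the query (or in application code) instead of in the database, so every report has to repeat it consistently.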

Licensed under: CC-BY-SA with attribution