Using VARCHAR as PRIMARY KEY for an 'ORPHAN' table

https://stackoverflow.com/questions/10195767

01-06-2021

Question

I'm to create an orphan table (no relationships with any other table whatsoever) that contains 3 columns.

Col1 - String field - VARCHAR(32) - Contains unique data not more than 32 characters
Col2 - String field - TEXT - Contains larger non-unique data of characters
Col3 - Numeric (Bool) - INT(1) - 0/1 for Flagging

I'm thinking of using Col1 as my PRIMARY KEY. I have done some research and see people argue that using a meaningless INT column as a PRIMARY KEY to avoid Foreign Key/Storage issues is the way to go.

However, IMO, since this is an orphan table, it should not matter. Besides, I would need to place an INDEX on Col1 anyway.

As a side note, I'm not expecting more than ~1000 rows in this table.

Thoughts please.

Solution

If col1 is your real primary key, there is no reason not to use it. Especially if the table is that tiny.

You would need to maintain a unique index on that column anyway, so by adding an artificial primary key you just add more overhead for insert and delete operations (as two indexes must be maintained).

Unless you are referencing that PK from a very large number of other rows (and other tables), you should just go with what is the natural primary key for your business rules.
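As a rough illustration of the natural-key approach, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database (an assumption; the question does not say which engine is used). The table name `orphan`, the sample values, and the helper `insert_duplicate` are hypothetical. It only shows that declaring Col1 as the primary key already gives you the uniqueness check and a keyed lookup.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orphan (
        -- Natural key. SQLite ignores the declared length of 32.
        col1 VARCHAR(32) NOT NULL PRIMARY KEY,
        -- Larger, non-unique data.
        col2 TEXT,
        -- 0/1 flag.
        col3 INTEGER NOT NULL DEFAULT 0 CHECK (col3 IN (0, 1))
    )
    """
)
conn.execute(
    "INSERT INTO orphan (col1, col2, col3) VALUES (?, ?, ?)",
    ("alpha", "some text", 1),
)


def insert_duplicate(connection):
    """Try to reuse an existing key; return True if the database rejects it."""
    try:
        connection.execute(
            "INSERT INTO orphan (col1, col2, col3) VALUES (?, ?, ?)",
            ("alpha", "other text", 0),
        )
    except sqlite3.IntegrityError:
        return True
    return False


rejected = insert_duplicate(conn)
row = conn.execute(
    "SELECT col2, col3 FROM orphan WHERE col1 = ?", ("alpha",)
).fetchone()
print(rejected, row)
```

No second index or extra column is needed here: the primary key constraint does both jobs.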

OTHER TIPS

I'd still just use an INT PK and put an index on COL1. I suppose you could use COL1 as the index if you can ensure that nothing will ever be joined to that table, but if nothing else the index will give you an idea of the order in which items are added/deleted from the table. I also like to add an IsActive boolean so that you never delete anything and a DateCreated datetime to almost every table.
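A minimal sketch of that layout, again using Python's sqlite3 module as a stand-in engine (an assumption). The names `orphan`, `idx_orphan_col1`, `is_active`, `date_created`, and the helper `soft_delete` are made up for illustration. Rows are flagged inactive rather than removed, and the integer key reflects insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orphan (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        col1         VARCHAR(32) NOT NULL,
        col2         TEXT,
        col3         INTEGER NOT NULL DEFAULT 0,
        is_active    INTEGER NOT NULL DEFAULT 1,         -- soft-delete flag
        date_created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Col1 still has to be unique, so it needs its own index.
    CREATE UNIQUE INDEX idx_orphan_col1 ON orphan (col1);
    """
)


def soft_delete(connection, key):
    """Mark a row inactive instead of deleting it."""
    connection.execute("UPDATE orphan SET is_active = 0 WHERE col1 = ?", (key,))


conn.execute("INSERT INTO orphan (col1, col2) VALUES (?, ?)", ("alpha", "first"))
conn.execute("INSERT INTO orphan (col1, col2) VALUES (?, ?)", ("beta", "second"))
soft_delete(conn, "alpha")

total = conn.execute("SELECT COUNT(*) FROM orphan").fetchone()[0]
active = conn.execute(
    "SELECT COUNT(*) FROM orphan WHERE is_active = 1"
).fetchone()[0]
ids = [r[0] for r in conn.execute("SELECT id FROM orphan ORDER BY id")]
print(total, active, ids)
```

Note that this variant maintains two unique structures (the `id` key and the index on `col1`), which is the overhead the accepted answer refers to.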

I see where you are coming from, but it just makes sense to index the first column anyway. It may be because I am used to Excel, but an integer in the initial column is useful as a primary key because it also gives the rows an order and is easier to read while debugging or capturing data. If you use a randomly generated value instead, you would still be searching through a few hundred rows looking for a hard-to-distinguish key. In the end I highly recommend the extra column of ints. It is well worth it.

Whenever I create database tables I keep my INT column. I believe it's faster to compare numbers than strings.

So it all depends on how often you will query the database for info and compare strings in there.

I'm still unclear what the question is. Judging from the answers, I've deduced it down to two plausible questions:

Is it okay to use a VARCHAR instead of an INTEGER as a primary key?
- Yes, it is okay to use a VARCHAR instead.
  In many cases it is preferred, especially if your table is expected to grow beyond 2,147,483,647 records, the maximum value of a signed 32-bit INT (yes, this happens). Performance-wise, even if INTs had a minimal speed advantage, on a ~1000 record table you would not see it. Designated PKs are indexed by default. The one problem is that you'll lose any auto-generating sequence that the database can do for you.

Is it okay to use your unique COL1 field as a primary key, instead of some other unique ID field?
- Yes, it is okay.
  The whole notion of having a primary key is to establish a unique field. What you're losing, though, may be some intrinsic comprehension. When other users want to join on that table, it's far easier to understand that id is a unique field, whereas col1 (some varchar) may or may not be unique.
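Two of the claims above can be checked with a short sketch. It assumes SQLite (through Python's sqlite3 module) as the engine; other databases report their indexes differently. The table name `orphan` is hypothetical. The first line reproduces the signed 32-bit limit, and the rest shows that a VARCHAR primary key gets an index without any explicit CREATE INDEX.

```python
import sqlite3

# Largest value a signed 32-bit integer column can hold.
INT32_MAX = 2**31 - 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orphan ("
    "col1 VARCHAR(32) NOT NULL PRIMARY KEY, col2 TEXT, col3 INTEGER)"
)

# Index names SQLite created on its own for this table.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('orphan')")]

# Ask the planner how it would resolve a lookup by col1.
# The human-readable detail is the last column of each plan row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT col2 FROM orphan WHERE col1 = ?", ("alpha",)
).fetchall()
plan_detail = " ".join(str(row[-1]) for row in plan)

print(INT32_MAX)
print(index_names)
print(plan_detail)
```

The exact wording of the plan text differs between SQLite versions, so treat it as diagnostic output rather than something to parse.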

In your given scenario, it should be okay. If the scope does grow, you can always introduce an auto_increment PK column later. Just make sure that your field is both indexed and unique.
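If the table does outgrow the natural key, the later switch might look like the following sketch. It assumes SQLite via Python's sqlite3 module, where the table has to be rebuilt; engines such as MySQL can typically do this with an ALTER TABLE instead. The names `orphan`, `orphan_new`, and `add_surrogate_key` are hypothetical. The important part is that `col1` keeps a UNIQUE constraint after the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orphan (
        col1 VARCHAR(32) NOT NULL PRIMARY KEY,
        col2 TEXT,
        col3 INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO orphan VALUES ('alpha', 'first', 1), ('beta', 'second', 0);
    """
)


def add_surrogate_key(connection):
    """Rebuild the table with an auto-incrementing id; col1 stays unique."""
    connection.executescript(
        """
        BEGIN;
        CREATE TABLE orphan_new (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            col1 VARCHAR(32) NOT NULL UNIQUE,  -- still indexed and unique
            col2 TEXT,
            col3 INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO orphan_new (col1, col2, col3)
            SELECT col1, col2, col3 FROM orphan ORDER BY col1;
        DROP TABLE orphan;
        ALTER TABLE orphan_new RENAME TO orphan;
        COMMIT;
        """
    )


add_surrogate_key(conn)
rows = conn.execute("SELECT id, col1, col2, col3 FROM orphan ORDER BY id").fetchall()

# The old key must still reject duplicates after the migration.
try:
    conn.execute("INSERT INTO orphan (col1) VALUES ('alpha')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```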

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow