Question

If I have 10,000 users and the primary key is a unique ID going from 1 to 10,000, is there a way to give them all a unique ID such that the original primary key cannot be inferred from it?

For example, linking to your facebook profile or similar would be http://site.com/profile?id=293852

Is it likely that the id there is the same as the primary key of their user in the database? I am struggling to think of a way to have two unrelated unique ID columns, because randomly generated ones would have to be unique. I imagine if it were possible to have a GUID using numbers only the length would be far too long.

Any ideas?

Solution

You generally have two options:

  1. As you said, use randomly generated data. (You only need to ensure they are unique, i.e. either long enough, or generate-verify-retry.)
  2. Take the primary key and transform it “pseudorandomly” into a value that appears to have nothing to do with the primary key. The transformation can be very simple if you only want mild protection, e.g. new Random(primaryKey).NextInt(), or it can be more complicated but attack-proof, e.g. any kind of format-preserving encryption.
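As a rough illustration of the second option, here is a minimal sketch of a keyed, reversible transformation built as a small Feistel network. It is not a vetted format-preserving encryption scheme; the key, the 32-bit ID size, the round count, and the function names are all assumptions made for the example.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, keep it out of source control
HALF_BITS = 16                               # two 16-bit halves give 32-bit IDs
MASK = (1 << HALF_BITS) - 1
ROUNDS = 4                                   # arbitrary choice for this sketch


def _round_value(half: int, round_no: int) -> int:
    """Keyed round function: HMAC-SHA256 of (round, half), truncated to 16 bits."""
    msg = bytes([round_no]) + half.to_bytes(2, "big")
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big") & MASK


def obfuscate(pk: int) -> int:
    """Map a primary key in [0, 2**32) to another value in the same range."""
    left, right = (pk >> HALF_BITS) & MASK, pk & MASK
    for r in range(ROUNDS):
        left, right = right, left ^ _round_value(right, r)
    return (left << HALF_BITS) | right


def deobfuscate(public_id: int) -> int:
    """Undo obfuscate() by running the rounds in reverse order."""
    left, right = (public_id >> HALF_BITS) & MASK, public_id & MASK
    for r in reversed(range(ROUNDS)):
        left, right = right ^ _round_value(left, r), left
    return (left << HALF_BITS) | right
```

Because a Feistel network is a permutation, distinct primary keys always map to distinct public IDs, so no uniqueness check is needed. For real protection, a reviewed format-preserving encryption implementation would be the safer choice.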

But then… why do you think you should protect the values of your primary keys? If the only reason is to prevent users guessing other valid user IDs, you can just append a random string to the primary key (and store it in the database and verify its correctness on access).
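A minimal sketch of that idea, assuming a dictionary stands in for the database column and a hyphen separates the key from the token (both are choices made for the example, not part of the answer):

```python
import hmac
import secrets
from typing import Dict, Optional

# Stand-in for a database column: primary key -> stored random token.
user_tokens: Dict[int, str] = {}


def create_public_id(pk: int) -> str:
    """Generate a random token, store it, and append it to the primary key."""
    token = secrets.token_hex(8)  # 16 hex characters
    user_tokens[pk] = token
    return f"{pk}-{token}"


def resolve_public_id(public_id: str) -> Optional[int]:
    """Return the primary key only if the token matches the stored one."""
    pk_text, sep, token = public_id.partition("-")
    if not sep:
        return None
    try:
        pk = int(pk_text)
    except ValueError:
        return None
    stored = user_tokens.get(pk)
    # Compare as bytes in constant time to avoid leaking how many characters matched.
    if stored is None or not hmac.compare_digest(stored.encode(), token.encode()):
        return None
    return pk
```

The primary key stays visible here, but knowing it is not enough to build a valid link for another user.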

OTHER TIPS

For security reasons it is recommended to make IDs non-sequential, so that the users in the system cannot be enumerated. But about 4 billion values (2^32) is too small a space to make valid IDs hard to discover, which is why a GUID is preferable. Depending on the database (from your description it looks like MSSQL), you can store it in a native GUID column, a binary column (for MySQL), or two separate int64 columns.

To reduce the URL size, base64 encoding can be applied so that the GUID looks shorter.
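A short sketch of the encoding step, using Python's standard uuid and base64 modules. The function names are made up for the example, and the URL-safe alphabet with padding stripped is one possible choice.

```python
import base64
import uuid


def guid_to_slug(guid: uuid.UUID) -> str:
    """Encode the 16 raw GUID bytes as URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(guid.bytes).rstrip(b"=").decode("ascii")


def slug_to_guid(slug: str) -> uuid.UUID:
    """Restore the padding and decode back to a GUID."""
    padded = slug + "=" * (-len(slug) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))
```

This shortens the 36-character text form of a GUID to 22 characters.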

How you generate the random and unique IDs is a useful question, but you seem to be making an assumption about when to generate them!

My point is that you do not need to generate these IDs at the time of creating your rows, because they are essentially independent of the data being inserted.

What I do is pre-generate random IDs for future use. That way I can take my own sweet time and absolutely guarantee they are unique, and there's no processing to be done at the time of the insert.

For example I have an orders table with order_id in it. This id is generated on the fly when the user enters the order, incrementally 1,2,3 etc forever. The user does not need to see this internal id.

Then I have another table - random_ids with (order_id, random_id). I have a routine that runs every night which pre-loads this table with enough rows to more than cover the orders that might be inserted in the next 24 hours. (If I ever get 10000 orders in one day I'll have a problem - but that would be a good problem to have!)

This approach guarantees uniqueness and takes any processing load away from the insert transaction and into the batch routine, where it does not affect the user.
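A rough sketch of such a nightly routine, assuming SQLite as a stand-in database, 10-digit random IDs, and a UNIQUE constraint that does the collision checking. The table and column names follow the description above; everything else is made up for the example.

```python
import secrets
import sqlite3


def preload_random_ids(conn: sqlite3.Connection, count: int) -> None:
    """Batch job: reserve random IDs for the next `count` order_ids."""
    (last_order_id,) = conn.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM random_ids"
    ).fetchone()
    added = 0
    while added < count:
        candidate = 10**9 + secrets.randbelow(9 * 10**9)  # random 10-digit number
        try:
            conn.execute(
                "INSERT INTO random_ids (order_id, random_id) VALUES (?, ?)",
                (last_order_id + added + 1, candidate),
            )
        except sqlite3.IntegrityError:
            continue  # random_id already taken: draw again
        added += 1
    conn.commit()


def random_id_for(conn: sqlite3.Connection, order_id: int):
    """Look up the pre-generated random ID for an order, or None if none is reserved."""
    row = conn.execute(
        "SELECT random_id FROM random_ids WHERE order_id = ?", (order_id,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE random_ids ("
    "order_id INTEGER PRIMARY KEY, random_id INTEGER NOT NULL UNIQUE)"
)
preload_random_ids(conn, 1000)  # e.g. run once per night
```

At insert time the application only needs the cheap lookup in random_id_for, since the retry loop has already run in the batch job.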

What's wrong with allowing the user to see the primary key?

You could generate the numbers randomly, make sure it's a really big number so that clashes are unlikely, then just run a select to check it doesn't exist.
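The generate-then-check loop could look roughly like this. SQLite, the users table, the public_id column, and the 12-digit range are all assumptions made for the example.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, public_id INTEGER NOT NULL UNIQUE)"
)


def new_public_id(conn: sqlite3.Connection) -> int:
    """Draw random 12-digit numbers until one is not yet in the table."""
    while True:
        candidate = 10**11 + secrets.randbelow(9 * 10**11)
        taken = conn.execute(
            "SELECT 1 FROM users WHERE public_id = ?", (candidate,)
        ).fetchone()
        if taken is None:
            return candidate


def add_user(conn: sqlite3.Connection) -> int:
    """Insert a user row with a fresh public ID and return that ID."""
    public_id = new_public_id(conn)
    conn.execute("INSERT INTO users (public_id) VALUES (?)", (public_id,))
    conn.commit()
    return public_id
```

With concurrent writers, the select alone can race, so the UNIQUE constraint on the column is kept as a backstop.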

Or, you could pick a huge number, and then base some equation around that. Something like:

unique = 1000000000 + (-1)^PK * PK^3

That means that the unique numbers will get further away from your starting number as the PK increases, and be above or below it depending on whether the PK is odd or even. The more complexity you add to the equation, the less likely it'll be discovered, but never ever 100% rely on this method, as there is always the possibility someone will work it out.
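To make the warning concrete, here is a small sketch that reads the description as an offset from a fixed base, with the cube of the key added for even keys and subtracted for odd ones. That reading, and the names used, are assumptions. It also shows how easily the formula is reversed once someone guesses it.

```python
BASE = 1_000_000_000


def disguise(pk: int) -> int:
    """Offset from BASE by pk cubed: below BASE for odd keys, above for even keys."""
    sign = -1 if pk % 2 else 1
    return BASE + sign * pk ** 3


def recover(value: int) -> int:
    """Invert disguise(): the distance from BASE is simply pk cubed."""
    return round(abs(value - BASE) ** (1 / 3))
```

A handful of consecutive profile links would be enough to spot the pattern, which is why this approach gives obscurity only.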

What I do is use part of a GUID and the actual ID.

In the table I have a column of type uniqueidentifier with a default value of newid().

I then take part of it and add the actual serial ID on the end with a known delimiter between them. I use the letter H as this doesn't appear in GUIDs.

So for row #8659 I would have:
IDcolumn=8659
GUIDcolumn='{200BAB55-C7D5-4456-AB57-CFF8B7E82A90}'
PROFILECODE='200BAB55H8659'

I can locate the correct row by:

partGUID=split(PROFILECODE,'H')(0) - gives 200BAB55
realID=split(PROFILECODE,'H')(1) - gives 8659
select * from mytable where IDcolumn=8659 and left(GUIDcolumn,8)='200BAB55';

In theory the query optimizer should find all rows with IDcolumn 8659 first and only then check the GUIDcolumn.

If people try to guess the ID of a profile, they can't just change one part of it and succeed.
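The building and splitting of the profile code can be sketched as follows. The function names and the validation rules are assumptions for the example; the resulting pair would then be passed as parameters to the select shown above.

```python
import uuid
from typing import Optional, Tuple

DELIMITER = "H"  # safe as a separator because GUIDs only use 0-9 and A-F
HEX_DIGITS = set("0123456789ABCDEF")


def make_profile_code(row_id: int, guid: uuid.UUID) -> str:
    """First 8 hex characters of the GUID, the delimiter, then the serial ID."""
    return f"{str(guid).upper()[:8]}{DELIMITER}{row_id}"


def parse_profile_code(code: str) -> Optional[Tuple[str, int]]:
    """Split a profile code into (partGUID, realID), or None if malformed."""
    part_guid, sep, id_text = code.upper().partition(DELIMITER)
    if not sep or len(part_guid) != 8 or not set(part_guid) <= HEX_DIGITS:
        return None
    if not (id_text.isascii() and id_text.isdigit()):
        return None
    return part_guid, int(id_text)
```

Rejecting malformed codes before querying keeps obviously invalid guesses away from the database.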

Licensed under: CC-BY-SA with attribution