Question

I have an SQL database of unique codes (consisting of numbers and letters).

I want to generate a large number of new codes for this database. The new codes also need to take into account the existing codes, which were not generated with any kind of seed or similar.

So, what is the best way to do this? Currently, for each code that is generated, the application loops through all the codes in the database and makes sure the Levenshtein distance is not too small. But this takes forever now that the database has grown large.

Any help is appreciated. This is used in an ASP.NET C# application.


Solution

If the total number of issued codes, the length of your codes and the acceptable Levenshtein distance are all small enough, you could build an in-memory tree of the existing codes and their 'Levenshtein neighbors' (every string within the forbidden distance of an existing code). Checking a candidate then becomes a single lookup in that tree, which takes O(log N) instead of a scan over all N codes. If there are too many codes to hold in memory, you could create an additional SQL table containing just the codes and their neighbors, index it, and rely on SQL for the O(log N) search. Whenever you insert a new code, insert its neighbors together with it.
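A minimal sketch of the neighbor idea, written in Python for brevity (the same structure should carry over to C# with a HashSet<string>). It assumes a forbidden distance of 1, codes drawn from a fixed alphabet, and a hash set rather than a tree. The names ALPHABET, neighbors and CodeIndex are made up for illustration.

```python
import secrets

# Assumed character set; replace with whatever the real codes use.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def neighbors(code, alphabet=ALPHABET):
    """Return all strings within Levenshtein distance 1 of code, including code."""
    out = {code}
    for i in range(len(code) + 1):
        for ch in alphabet:
            out.add(code[:i] + ch + code[i:])          # insertion
    for i in range(len(code)):
        out.add(code[:i] + code[i + 1:])               # deletion
        for ch in alphabet:
            out.add(code[:i] + ch + code[i + 1:])      # substitution
    return out


class CodeIndex:
    """Holds every existing code plus its distance-1 neighbors."""

    def __init__(self, existing=()):
        self.blocked = set()
        for code in existing:
            self.add(code)

    def add(self, code):
        self.blocked |= neighbors(code)

    def is_allowed(self, candidate):
        # One set lookup replaces the scan over all existing codes.
        return candidate not in self.blocked

    def generate(self, length=8, max_tries=1000):
        for _ in range(max_tries):
            candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if self.is_allowed(candidate):
                self.add(candidate)
                return candidate
        raise RuntimeError("no free code found; the code space may be too full")
```

A candidate that passes is_allowed is at Levenshtein distance 2 or more from every code added so far. The price is memory: with 8-character codes and this 32-character alphabet, each code contributes roughly 550 entries, and the count grows quickly if the forbidden distance is larger than 1.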

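If you have the flexibility (i.e. you can increase the code length by 1, add a new character to the acceptable character set, or find a character that is never used in some position), the best solution would be to separate the 'old' and 'new' code spaces and generate the new codes algorithmically so that they meet your requirements by construction. A new code can then never collide with an old one, so the old codes no longer need to be scanned. UUIDs/GUIDs take a similar approach: a version field keeps the original MAC-address-based UUIDs in a separate space from the later variants that do not use the computer's MAC address.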
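One way this separation could look, again as a Python sketch under stated assumptions: the old codes never contain the character "0", new codes start with it, and the payload is derived from a counter plus a check character. PREFIX, PAYLOAD_LEN, encode and is_new_style are hypothetical names, and the check-character scheme is just one possible construction, not something from the question.

```python
# Assumed alphabet of the payload (32 characters).
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
PREFIX = "0"      # assumption: this character never appears in any old code
PAYLOAD_LEN = 7   # assumption: payload width, chosen for illustration


def encode(counter):
    """Turn a counter value into a new-style code: prefix + payload + check."""
    k = len(ALPHABET)
    if not 0 <= counter < k ** PAYLOAD_LEN:
        raise ValueError("counter out of range")
    digits = []
    for _ in range(PAYLOAD_LEN):
        counter, d = divmod(counter, k)
        digits.append(d)
    digits.reverse()
    # Check digit makes the sum of all digits a multiple of k, so changing
    # any single character of a valid code yields an invalid code.
    check = -sum(digits) % k
    return PREFIX + "".join(ALPHABET[d] for d in digits + [check])


def is_new_style(code):
    """Validate a new-style code without touching the database."""
    if len(code) != PAYLOAD_LEN + 2 or code[0] != PREFIX:
        return False
    try:
        total = sum(ALPHABET.index(ch) for ch in code[1:])
    except ValueError:
        return False
    return total % len(ALPHABET) == 0
```

Because all new codes have the same length, two distinct new codes always differ in at least two positions, which means their Levenshtein distance is at least 2. Against the old codes the prefix only guarantees that no new code equals an old one; a larger margin would need a bigger structural difference, such as making new codes two or more characters longer. A plain counter is also predictable, which may or may not matter for your use case.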

Licensed under: CC-BY-SA with attribution