Question

I want to uniquely shorten strings (file IDs) to use in URLs, like the ones on bit.ly etc. I can use IDs from a database, but I want the URLs to look random.

What would be the best solution?

The site will be a mobile site, so I want the URLs to be as short as possible.

Solution

You can't "uniquely shorten" arbitrary strings. Pigeonhole principle and all.

What you want to do (and, AFAIK, what URL-shortening services do) is keep a database of everything submitted, along with the short string assigned to each entry. Then you can look the original up by its short string.

You can generate the short strings by simply incrementing a number and Base64-encoding it each time. Use a URL-safe Base64 alphabet, since the standard one contains '+' and '/'.
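
A minimal sketch of the counter-plus-Base64 idea, assuming an in-memory dictionary stands in for the database. The names `encode_id` and `Shortener` are made up for illustration, not part of any library:

```python
import base64


def encode_id(n: int) -> str:
    """Encode a non-negative integer as URL-safe Base64 without padding."""
    # Use the fewest bytes that hold n (at least one, so that 0 is encodable).
    raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


class Shortener:
    """Hypothetical in-memory stand-in for the database table."""

    def __init__(self):
        self.next_id = 0
        self.by_code = {}   # short code -> original string
        self.by_value = {}  # original string -> short code

    def shorten(self, value: str) -> str:
        # Re-submitting the same string returns the code it already has.
        if value in self.by_value:
            return self.by_value[value]
        code = encode_id(self.next_id)
        self.next_id += 1
        self.by_code[code] = value
        self.by_value[value] = code
        return code

    def resolve(self, code: str) -> str:
        return self.by_code[code]
```

Because the codes come from a counter they are sequential rather than random-looking; uniqueness comes from the counter, not from the encoding.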

OTHER TIPS

There are two approaches to implementing a mapping service like the one you describe.

  1. Clients submit globally unique ids, or
  2. Server generates globally unique ids

Clients submit globally unique ids

As far as I know, option 1 should only be attempted with Guids, unless you devise a similar means to cram sufficiently distinct information into a short byte stream. Either way, if you have a stream of bytes that represents a globally unique identifier, you may do something like this:

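// source is either a Guid, or some other globally unique byte stream
byte[] bytes = Guid.NewGuid().ToByteArray();
// strip the trailing "==" padding
string base64String = Convert.ToBase64String(bytes).TrimEnd('=');
// '+' and '/' are not URL-safe, so swap them for '-' and '_'
string urlToken = base64String.Replace('+', '-').Replace('/', '_');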

to obtain a compact string of mostly alphanumeric characters that appears random, but avoids the collisions inherent in other random schemes. A Guid contains 16 bytes, or 128 bits, which translates to 22 characters in Base64 once the padding is stripped (128 / 6, rounded up).
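
For comparison, here is a rough Python equivalent of the same conversion, using the standard `uuid` and `base64` modules. The function names are only illustrative:

```python
import base64
import uuid


def guid_to_token(guid: uuid.UUID) -> str:
    # 16 bytes -> 24 Base64 characters with padding, 22 once "==" is stripped.
    return base64.urlsafe_b64encode(guid.bytes).rstrip(b"=").decode("ascii")


def token_to_guid(token: str) -> uuid.UUID:
    # Restore the padding that was stripped before decoding.
    padded = token + "=" * (-len(token) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))
```

The round trip shows that nothing is lost by dropping the padding, since the token length alone tells the decoder how much to put back.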

The advantage of this approach is that clients may generate their own tiny Uris without a central authority. The downside is hefty length if you roll with Guid, or implementing your own globally unique byte stream which - let's face it - is error-prone.

If you do go this route, consider searching for globally unique byte streams or the like. Oh, and STAY AWAY FROM RANDOM BYTES, otherwise you will have to build collision resolution ON TOP OF your tiny Uri generator.

Server generates globally unique ids

Again, the primary advantage of the above is that clients may generate their Uris a priori. That is particularly handy if you are about to submit a long-running request you wish to check up on. This may not be particularly relevant to your situation, and may provide only limited value.

So, that aside, a server-centric approach, in which a single authority generates and doles out ids, may be more appealing. If this is the route you choose, then the only question is how long you would like your Uri to be.

Presuming a desired length of 5 characters, and let's say you go with a Base64 encoding, each id may represent up to 5 characters at 6 bits per character, which equals 30 bits or 2^30 [1 073 741 824] distinct values. That's a fairly large domain. *
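
As a quick check on that arithmetic: a Base64 alphabet has 64 symbols, so each character carries log2(64) = 6 bits. The helper name below is made up for illustration:

```python
import math


def id_capacity(length: int, alphabet_size: int = 64) -> int:
    """Number of distinct ids of exactly `length` characters."""
    return alphabet_size ** length


bits_per_char = math.log2(64)   # 6.0 bits per Base64 character
print(id_capacity(5))           # 1073741824, i.e. 2**30
```

Adding a sixth character multiplies the domain by another 64, so a small increase in length buys a lot of room.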

Then it becomes a question of returning a value for a given submission. There are probably a great many ways to do this, but I would go with something like this:

  • Enumerate all possible values within a "free list" in your database
  • Remove value from free list when consumed
  • Add value to free list when released

Enhancements or optimizations may include

  • Do not enumerate every value in the full range (2^30 values for 5 Base64 characters); instead enumerate a manageable subset, say 100 000 values at a time, and when all of them are consumed, generate the next 100 000 values in sequence and continue
  • Add an expiry date to values, and recycle expired values at the end of each day
  • Distribute your service: when parallelizing, dole out small, mutually exclusive subsets of your free list to the distributed instances
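
The free list with batch refills might be sketched as follows. This is an in-memory illustration only: `IdAllocator` and its parameters are hypothetical names, and a real service would keep this state in the database and guard it with transactions.

```python
from collections import deque


class IdAllocator:
    """Free list that is refilled in batches instead of being fully enumerated."""

    def __init__(self, batch_size: int = 100_000, limit: int = 64 ** 5):
        self.batch_size = batch_size
        self.limit = limit          # total ids available (5 Base64 characters)
        self.next_unissued = 0      # first value never placed on the free list
        self.free = deque()

    def _refill(self) -> None:
        # Generate the next batch in sequence, stopping at the end of the domain.
        end = min(self.next_unissued + self.batch_size, self.limit)
        self.free.extend(range(self.next_unissued, end))
        self.next_unissued = end

    def acquire(self) -> int:
        if not self.free:
            self._refill()
        if not self.free:
            raise RuntimeError("id space exhausted")
        return self.free.popleft()

    def release(self, value: int) -> None:
        # The caller must only release values it previously acquired.
        self.free.append(value)
```

Handing each distributed instance its own disjoint range of values follows the same pattern, with one allocator per range.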

Conclusion

Bottom line is, you want to guarantee uniqueness - so collisions are a big no-no.


* Base64 carries 6 bits per character, so 5 characters give 2^30 = 1 073 741 824 values. That is the size of the raw domain: every numeric id that encodes to at most 5 characters. If you want every id to be exactly 5 characters long, subtract the ids that fit in 4 characters or fewer (2^24), which gives 2^30 - 2^24 = 1 056 964 608. That is still quite large :)

Store a random alphanumeric string and use that for your short URL. Make it the length that you think is best for your site and its users, something like www.yoursite.com/d8f3
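
A minimal sketch of that idea, assuming a Python set stands in for the table of codes already stored. Random strings can repeat, so the sketch checks for a collision before accepting a code. `new_code` and its parameters are illustrative names:

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols


def new_code(taken: set, length: int = 4, max_tries: int = 100) -> str:
    """Draw random codes until one is found that is not already in use."""
    for _ in range(max_tries):
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in taken:
            taken.add(code)
            return code
    raise RuntimeError("could not find a free code; consider a longer length")
```

With 4 characters from 36 symbols there are 36^4 = 1 679 616 possible codes, and retries become more frequent as the table fills up.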

You could use a hash (for example CRC32) to produce quite short URLs. You will never be able to get 'unique' URLs, as you are reducing the data, so there have to be collisions.
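
A sketch of the hash approach using Python's standard `zlib.crc32`, with a dictionary standing in for the stored mappings. The function names are illustrative, and the collision check simply raises instead of resolving the clash:

```python
import zlib


def crc32_code(value: str) -> str:
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3,
    # which always fits in 8 hex digits.
    return format(zlib.crc32(value.encode("utf-8")), "08x")


def shorten(value: str, table: dict) -> str:
    code = crc32_code(value)
    # Keep the first value stored under this code and detect clashes.
    existing = table.setdefault(code, value)
    if existing != value:
        raise ValueError(f"CRC32 collision between {existing!r} and {value!r}")
    return code
```

The same input always produces the same code, which is convenient, but two different inputs can share a code, so the lookup table is still needed.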

Hey nll, as several other people have told you, if you start compressing the URL into something small, it will be impossible to keep it unique. That said, you need to make your own code for every URL submitted to you. One (easy) way to do it is to store the submitted URLs in a database, generate a GUID field for each one, and take a substring of it, checking every time you register something that the substring is different from all previous ones.

For instance: www.google.com with the guid F9168C5E-CEB2-4faa-B6BF-329BF39FA1E4 -> http://www.mysite.com/?q=CEB2

The more characters you use, the more links you can keep track of. In this example you get 65 536 different links (with only 4 hex characters).
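
Roughly, in Python, assuming a set holds the codes already registered. The function name is made up, and the slice picks the second group of the GUID as in the example above:

```python
import uuid


def guid_fragment_code(taken: set, length: int = 4) -> str:
    """Take a hex fragment of a fresh GUID, retrying until it is unused."""
    # Keep length <= 4: the next hex digit of a version-4 GUID is a fixed "4".
    if len(taken) >= 16 ** length:
        raise RuntimeError("all codes of this length are in use")
    while True:
        # hex[8:12] corresponds to the second dash-separated group.
        code = uuid.uuid4().hex[8:8 + length].upper()
        if code not in taken:
            taken.add(code)
            return code
```

The retry loop is what guarantees that each registered fragment differs from the earlier ones; the GUID alone does not.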

Hope this helps.

Licensed under: CC-BY-SA with attribution