Question

How can the Google short URL cater for so many URLs in web with just four character, even though alphabets are case sensitive?

http://goo.gl/SUWp

Say fn(some url)-> four letter for url, how can they suddenly use the same function which gives five letter for url after sometime? How will they know whether it is in four letter or five letter url from url?

Was it helpful?

Solution

26 letters * 2 (upper/lowercase) = 52 ^ 4 (to the power of 4) = 7311616 urls

if they add digits, it would be 62^4 = 14776336 urls.

so they have some time to go before adding a 5th letter/digit

OTHER TIPS

It works the same way as all the other shorteners- the characters are a unique ID of the URL that has been shortened. With 52 letters (upper and lower case) plus numbers and special characters, there are plenty of combinations to work with.

This is how:

Google URL Shortener is currently available for Google products and not for broader consumer use.

I don't believe even Google has 7 million pages worth shortening.

Edit:

Apparently you can shorten URLs using the Google Toolbar:

Google URL shortener is not a stand-alone service; you can't use it to shorten links directly. Currently, Google URL Shortener is only available from the Google Toolbar and FeedBurner. If the service proves useful, we may eventually make it available for a wider audience in the future.

Still, that is not "broad" consumer use. If they run out, they'll add more letters.

Response to updated question:

Say fn(some url)-> four letter for url, how can they suddenly use the same function which gives five letter for url after sometime?

Google is not simply hashing the URL then just using it (remember, hashes are only 1 way so you couldn't get the original URL out of them anyway - it must be stored in a database). They may start with a hash, then perform a lookup in a database to see if that key already exists. If it does not, it will be used as the key. If it already exists, they'll use some other method to perform the hash, or manipulate the hash in a way that makes it unique.

How will they know whether it is in four letter or five letter url from url?

If the end of the URL has 4 letters, then that's how they know...

There are 26 letters in English Alphabet. Lower + Upper it is 52.

52 * 52 * 52 * 52 = 7.311.616 They are limited with this number. If they run out of 4 letter urls, they can upgrade to 5 without any problem, Cannot they?

I don't think adding digits is a good idea for that since 0 (zero) and O, 1 (one) and l (L), I (upper case i) and l (lower case L) are very similar.

(26 + 26 + 10) * (26 + 26 + 10) * (26 + 26 + 10) * (26 + 26 + 10) = 14776336

That's 26 lower case, 26 upper case, and 10 digits for 62 possible characters. And actually I think it's likely a base-64 encoded representation of some other value, so the number is probably more like 16777216.

I don't know how Google does it. But I imagine one way you could implement a short URL is by incrementing the value using the characters 0-9a-zA-Z -- essentially using a base 62 number system. So, the method that generates the value might look up the most recently used value, then increment it by one. For example: abcz + 1 = abcA. Or: ZZZZ + 1 = 00000.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top