If you had to represent a large number concisely would you use base 36 or ZZ?

https://softwareengineering.stackexchange.com/questions/400854

04-03-2021

Question

According to Wikipedia:

In mathematics and computing, hexadecimal (also base 16, or hex) is a positional system that represents numbers using a base of 16. Unlike the common way of representing numbers with ten symbols, it uses sixteen distinct symbols, most often the symbols "0"–"9" to represent values zero to nine, and "A"–"F" (or alternatively "a"–"f") to represent values ten to fifteen.

Using two of those symbols you can represent 256 values (0 to 255). The largest would be written as FF.

If I wanted to represent a much larger number could I use base 36 or ZZ? If this is possible why hasn't this been used before?

More info:
I'm indexing items with unique IDs. I'd like to be able to represent around 1000-10000 items on average and, if possible, I'd like to use only two symbols.

Also, where do web colors fit into this?

White is represented as #FFFFFF. That is full red (FF), full green (FF), and full blue (FF). Is that easier to read than using a different base encoding?

Solution

The goal of hexadecimal encoding is not to encode larger numbers in fewer digits, but to have an easy mapping between digits and the bits in a byte: 2 hex digits correspond to exactly one byte, and 1 hex digit corresponds to 4 bits, that is, half a byte (I use byte in the sense of an octet).
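As a minimal sketch of that digit-to-bit mapping, the snippet below prints the 4-bit pattern behind a few hex digits and splits a web color into its three bytes. The function name `parse_web_color` is made up for this illustration; it assumes the six-digit `#RRGGBB` form.

```python
# Each hex digit corresponds to exactly 4 bits, so two hex digits describe one byte.
for digit in ("0", "9", "A", "F"):
    print(digit, format(int(digit, 16), "04b"))


def parse_web_color(color):
    """Split a '#RRGGBB' string into its (red, green, blue) byte values.

    Hypothetical helper for illustration; only the six-digit form is handled.
    """
    raw = bytes.fromhex(color.lstrip("#"))  # two hex digits per byte
    if len(raw) != 3:
        raise ValueError("expected exactly six hex digits")
    return tuple(raw)


print(parse_web_color("#FFFFFF"))  # (255, 255, 255)
```

This is why colors are usually written in hex: each pair of digits is one channel's byte, which would not line up so neatly in base 36.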

You can of course use base 36 digits (0..9, A..Z) to encode larger numbers in fewer digits. With two such digits you can encode 1296 values (36 to the power of 2). With 3 digits you can encode 46656 values.
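A rough sketch of how such IDs could be produced in Python: the built-in `int(text, 36)` already decodes base 36, but there is no built-in encoder, so `to_base36` below is a hand-written helper for this example (the name and the `width` parameter are assumptions, not a standard API).

```python
import string

# Digit alphabet for base 36: "0".."9" followed by "A".."Z".
ALPHABET = string.digits + string.ascii_uppercase


def to_base36(n, width=0):
    """Encode a non-negative integer in base 36, left-padded with '0' to `width`."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = ""
    while True:
        n, remainder = divmod(n, 36)
        digits = ALPHABET[remainder] + digits
        if n == 0:
            break
    return digits.rjust(width, "0")


print(to_base36(1295, 2))  # ZZ, the largest two-digit value
print(to_base36(7, 2))     # 07
print(int("ZZ", 36))       # 1295, decoded with the built-in int()
```

With two digits the IDs run from 00 to ZZ, which covers 1296 items and no more.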

You could even decide to use case-sensitive digits and encode in base 62 (0..9, A..Z, a..z). With two such digits you can encode 3844 values. With three digits it’s 238 328 values.

Using every printable ASCII character you can go up to base 94 (or 95 if you count the space). Even then, two digits give at most 9025 values, so for 10000 entries you’d still need 3 digits.

But if you store the number as a plain binary integer instead of as text, you’d need just 2 bytes: 16 bits cover 65536 values.
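The arithmetic behind these figures can be checked with a short sketch. `digits_needed` is a hypothetical helper written for this illustration, and base 94 stands in for "all printable ASCII characters except the space".

```python
def digits_needed(item_count, base):
    """Smallest number of digits such that base ** digits >= item_count."""
    digits = 1
    while base ** digits < item_count:
        digits += 1
    return digits


# Capacity of two digits, and digits required for 10000 items, per base.
for base in (10, 16, 36, 62, 94):
    print(base, base ** 2, digits_needed(10_000, base))

# Stored as a binary integer, any ID below 65536 fits in 2 bytes.
packed = (9999).to_bytes(2, "big")
print(len(packed), int.from_bytes(packed, "big"))  # 2 9999
```

None of the text bases reaches 10000 values with two characters, while two raw bytes hold 65536.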

OTHER TIPS

Your main misunderstanding is about the purpose of BaseX formatting.

As others pointed out, BaseX (with X being a high number) does not increase information density. You cannot get higher density on an inherently binary machine by changing how the bits are displayed. If you have a lot of bits, you can apply compression, which will save some space depending on the sort of data at hand, at the cost of processing speed.

The most commonly used BaseX format is Base64. Its purpose is to transfer binary data as printable characters, originally in email messages. It is not the densest way to encode data, but it fits in the body of an email message without confusing the mail processor with accidental control codes, because everything is plain text.
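A small sketch with Python's standard `base64` module shows the trade-off: the encoded form is longer than the raw bytes, but it contains only printable ASCII. The sample bytes are arbitrary values chosen for the example.

```python
import base64

# Arbitrary binary data, including bytes that are control codes as text.
raw = bytes([0, 255, 10, 27])

encoded = base64.b64encode(raw)   # bytes containing only printable ASCII
text = encoded.decode("ascii")

print(len(raw), len(text))        # 4 raw bytes become 8 characters
print(text.isprintable())         # True
print(base64.b64decode(text) == raw)  # True, the round trip is lossless
```

Every 3 bytes of input become 4 characters of output, so Base64 costs about a third more space in exchange for being safe to send as text.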

Base 36 isn't used this way on today's computers because merely working in a different (apparent) number base doesn't increase the information density or processing performance.

First, computers don't use decimal or hex — those are formats fit for human consumption, i.e. printing, input/output. Internally, and physically, computers use binary digits to store, represent, and manipulate values, whether they communicate them in decimal or hex, fixed or floating point. To store a bigger number, we use more bits!

So printing numbers in base 36 does not change their internal representation: they are still stored as binary digits, implemented in the logic technology of the day.
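To illustrate, here is a minimal sketch in which one integer is written in several bases. The helper name `parse_all` is invented for this example; the point is only that every spelling parses to the same stored value.

```python
def parse_all():
    """Parse the same quantity written in decimal, hex, and base 36."""
    return int("1295", 10), int("50F", 16), int("ZZ", 36)


decimal_value, hex_value, base36_value = parse_all()

# All three spellings denote one and the same integer.
print(decimal_value == hex_value == base36_value)  # True

n = base36_value
print(n, hex(n), bin(n))  # different output formats of the same value
print(n.bit_length())     # 11 bits are needed, whatever base it is printed in
```

Writing the value as ZZ saves characters on screen, but the machine still holds the same 11 significant bits.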

Some NAND flash drives use MLC (multi-level cell) storage, which encodes 2 bits in a single cell. This does improve the physical density, allowing more storage in the same volume. However, there is a trade-off: these cells can be slower and have a shorter lifespan.

In some sense, the answer has to do with information theory, which is the study of encodings in physical systems.

There were vacuum tube systems in the very old days that used base 10 directly — each tube could store 10 different values, corresponding to decimal digits.

However, several forms of simplification have helped drive miniaturization. Two of these are: (1) using binary instead of multi-valued logic devices, and (2) using a single type of logic gate (like a NAND gate) instead of trying to combine AND, OR, and NOT gates in a single technology process.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange