Pergunta

This is a hard enough idea to wrap my head around and I would greatly appreciate any edits/help to get it more readable for those in-the-know.

Is it theoretically possible to have a hard drive that has saved on it one copy of every possible binary permutation of one kilobyte and then have the rest of the system simply create pointers to these locations?

Would a system made such a way be any faster than simply having information stored directly?

To explain another way, say instead of having sentences:

"Hello, I'm Bob." and "That sandwich looks delicious."

...stored on the hard drive, we would have all permutations of the alphabet and other characters up to some number (say, 1000 characters or so), and then have store our sentences as something like:

[Pointer#21381723]

Foi útil?

Solução

There are 28192 possible different 1K blocks. Storing them all would take 28202 bits of storage. Since the universe contains only about 1080 (or ~2266) particles, it's a safe bet that it isn't possible to store them all, and you don't have to wonder about whether it would save time or not.

But there is, in fact a more interesting way of answering this. You are suggesting creating an index into a huge pool of constants. But how would you know which index to dereference? Imagine for the sake of an argument that you want to store only 1-character blocks: a, b, c... Presumably your indices would be 0, 1, 2 etc., since that's the most efficient layout of storing those blocks.

Do you notice something about the arrangement? Your index is, in fact, a coded representation of the stored data! In other words, you don't have to dereference at all, you just have to transform the index into the data you want.

When you store all possible values of something in a table, this always happens: your index becomes merely an encoded version of the data itself, so storing the data becomes unnecessary in the first place. This why in the real world, indices are only useful for sparse data (e.g. all web pages you've visited, not all web pages that could exist, or even all that do exist).

Outras dicas

As others have already pointed out, you have 2^8192 possibilities for a 1k block. This means you would need 8192 bits to encode the address of a block if all blocks addresses are encoded with the same amount of bits, so your addresses would be 1k long. You wouldn't have gained anything except adding a layer of indirection so you wouldn't gain any performance.

If you wanted to have shorter addresses, you would have to encode some blocks with a short address and some with longer ones and make it so that long ones don't appear that often, and you are now simply compressing data (probably with something like a Huffman code). That would require knowledge of the data you're storing before storing it or regular changes in the encoding. It would also probably be less efficient than other compression algorithms which use blocks of varying length.

There are two problems with that.

First, "all possible binary permutations of one kilobyte" is a huge amount of data. 1024 bytes * 8 bits per byte = 8192 bits in a kilobyte. All possible permutations would be 2^8192. That's around 1.09e+2466 kilobytes! (For purposes of comparison, a 1 TB drive is 1e09 kilobytes.)

Second, even if you did have such an enormous table, and you indexed into it with pointers, what would you do if you wanted to reference some data smaller than exactly 1 KB?

As other posters have pointed out, at some point, the size of the pointer needed to index into your list of all possible values nullifies your gain.

However, some languages use a limited version of what you suggest in order to optimize memory usage. Python uses string 'interning' to lessen the number of duplicate strings in memory. You can find more information by searching for 'python string intern'.

Licenciado em: CC-BY-SA com atribuição
scroll top