Question

I want to create a planet-sized environment for artificial intelligence. It will simulate underground life on a very large world. According to Wikipedia, planet Earth has a surface area of 510,072,000 km², and I want to create a square of similar proportions, maybe bigger. I will store one meter in each bit, where 0 means dirt and 1 means a wall of dirt.

Let's first calculate how to store a single line of this square. One line would be 510,072,000,000 m, and each byte can store 8 meters, so one line would be 59.38 GB and the entire world would be 3.44 PB. And I would like to add at least water and lava to each square meter, which would multiply those figures by 2.
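
As a quick sanity check of the per-line figure, here is a minimal, self-contained C# sketch (the class and variable names are only illustrative):

using System;

class LineSizeCheck
{
    static void Main()
    {
        // One line is 510,072,000,000 m, stored at 1 bit per meter (8 meters per byte).
        long metersPerLine = 510072000000L;
        long bytesPerLine = metersPerLine / 8;                 // 63,759,000,000 B
        double gibPerLine = bytesPerLine / Math.Pow(1024, 3);  // ~59.38, i.e. the 59.38 figure is GiB

        Console.WriteLine("{0} B per line = {1:F2} GiB", bytesPerLine, gibPerLine);
    }
}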

I need to compress this information with a lossless data compression algorithm. I first tried a very direct approach with 7-Zip on a smaller world, where one line would be 6375 B. In theory, the world should be 6375² B = 38.76 MB, but when I actually generate it I get a 155 MB file, and I do not know where this difference comes from. But when I compress it with 7-Zip, I get a 40.1 MB file. It is a huge difference, and with that ratio I would convert my 3.44 PB world file into a 912.21 GB file.
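
For reference, the expected on-disk size of the test world works out like this (again just a quick check, reusing the SIZE constant from the code below):

using System;

class TestWorldCheck
{
    static void Main()
    {
        const long SIZE = 6375;                                  // bytes per line in the test world
        long expectedBytes = SIZE * SIZE;                        // 40,640,625 B
        double expectedMiB = expectedBytes / (1024.0 * 1024.0);  // ~38.76

        Console.WriteLine("{0} B = {1:F2} MiB expected", expectedBytes, expectedMiB);
    }
}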

My first thought is: why am I getting such a large file when the maths tell me it should be smaller? Maybe the problem is in the code, or maybe I made an error in the maths. The code (C#) is as follows:

// Full world: 510,072,000,000 m per line = 63,759,000,000 B per line.
// Test world used here: SIZE = 6375 B per line.
const long SIZE = 6375;

// Create the new, empty data file.
string fileName = tbFile.Text;

FileStream fs = new FileStream(fileName, FileMode.Create);

// Create the writer for data.
BinaryWriter w = new BinaryWriter(fs);

// Use random numbers to fill the data
Random random = new Random();
// Write data to the file.
for (int i = 0; i < SIZE; i++)
{
    for (int j = 0; j < SIZE; j++)
    {
        w.Write(random.Next(0,256));
    }
}

w.Close();

fs.Close();

And the maths are so basic that if I made a mistake there, I cannot see it.

Can you give me any advice? Just focus on the data compression; the artificial intelligence is not a problem, because I have experience with evolutionary algorithms, and the world does not need to run in real time: it can take all the time it needs.

Thank you all for your time.

Was it helpful?

Solution

@Scharron has correctly answered the specifics of your question, but I think there is a more fundamental issue:

It is theoretically impossible to compress random data significantly. Indeed, most compression algorithms will increase the storage size when given random input data. Perhaps the specifics of your AI algorithm will introduce some patterns that can be compressed, but if you are starting with truly random input data, you will have to store those multiple PB.
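
To see this concretely, here is a small, self-contained C# sketch (it uses GZipStream from System.IO.Compression rather than 7-Zip, purely as an illustration) that compresses one million random bytes and one million zero bytes:

using System;
using System.IO;
using System.IO.Compression;

class CompressibilityDemo
{
    // Compress a buffer with GZip and return the compressed length in bytes.
    static long CompressedLength(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }

    static void Main()
    {
        const int size = 1000000;
        var random = new Random();

        byte[] randomData = new byte[size];
        random.NextBytes(randomData);      // one random byte per cell: no patterns to exploit
        byte[] zeroData = new byte[size];  // all zeros: extremely repetitive

        Console.WriteLine("Random: {0} -> {1} bytes", size, CompressedLength(randomData));
        Console.WriteLine("Zeros:  {0} -> {1} bytes", size, CompressedLength(zeroData));
    }
}

The random buffer typically comes out slightly larger than the input, while the all-zero buffer shrinks dramatically.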

The reason you were seeing significant compression is that, as @Scharron pointed out, you were writing 3 zero bytes for every byte of data, leading to much more easily compressible data.
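
If the intent is one byte per cell, the minimal change to the loop in the question is to cast the value so that the single-byte overload of Write is chosen:

// random.Next returns an int, and BinaryWriter.Write(int) emits 4 bytes,
// three of which are always zero here. Casting to byte selects the
// Write(byte) overload, which emits exactly 1 byte per cell.
for (int i = 0; i < SIZE; i++)
{
    for (int j = 0; j < SIZE; j++)
    {
        w.Write((byte)random.Next(0, 256));
    }
}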

Other tips

I don't know about C#, but it seems you are currently writing 4 bytes each time (6375 * 6375 * 4 bytes ≈ 155 MB). So I guess the Write method currently writes a 32-bit integer.
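
A quick way to confirm this from C# is to write one value with each overload into a MemoryStream and check the resulting length (a minimal sketch):

using System;
using System.IO;

class WriteSizeDemo
{
    static void Main()
    {
        var ms = new MemoryStream();
        var w = new BinaryWriter(ms);

        w.Write(200);        // Write(int): 4 bytes, little-endian
        w.Write((byte)200);  // Write(byte): 1 byte
        w.Flush();

        Console.WriteLine(ms.Length);  // prints 5
    }
}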
