Question

If I have N bytes of data, how can I calculate the distinct combinations that are possible? My first thought was that since every byte is really just 8 bits, I should calculate the unique sets of bits possible.

So I have put this together in Java:

private void bytePossibilities(int numberOfBytes) {
    final int BITS_IN_A_BYTE = 8;        
    final int numberOfBits = numberOfBytes * BITS_IN_A_BYTE;        
    long totalPossibilities = 0;
    for (int i = 0; i < (1 << numberOfBits); i++) {
        int[] next = new int[numberOfBits];
        for (int j = 1, a = 1; j < (1 << numberOfBits); j <<= 1, a++) {
            next[numberOfBits - a] = (i & j) == 0 ? 0 : 1;
        }
        System.out.println(Arrays.toString(next));
        totalPossibilities++;
    }
    System.out.println(totalPossibilities + " combinations found for "+ numberOfBytes  +" bytes");
}

Which seems to work...

bytePossibilities(1)
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1]
...252 other combinations...
[1, 1, 1, 1, 1, 1, 1, 0]
[1, 1, 1, 1, 1, 1, 1, 1]
256 combinations found for 1 bytes


bytePossibilities(2)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
...65532 other combinations...
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
65536 combinations found for 2 bytes

So it seems that the distinct possibilities given N bytes of data is 2^(bytes*8). I just want to make sure I'm doing this correctly. It seems extreme that the odds that any 2 random bytes of data (next to each other) would be exactly the same is 1 in 65,536 and for any 3 bytes, 1 in 16,777,216.

If anyone could shed more mathematical light on this, I'd appreciate it.

Was it helpful?

Solution

You were right except for your last statement

The probability of two random bytes being the same is 1 in 256 (the first byte can be anything, but the second byte must be the same as the first). Another way of thinking about it - there are 256 pairs of identical bytes and 256*(1/65536) = 1/256.

You can extend the same to three bytes... When you look at any sequence of three random bytes and want to know the probability that they are all the same, you are really asking "what is the probability of the second and the third each being the same as the first"? And the answer is (1/256)*(1/256).

In reality bytes do not occur "randomly" - in uncompressed data in particular there are significant patterns so these probabilities do not translate into equivalent observed frequencies of pairs and triples for most file types.

OTHER TIPS

You are correct. Every bit of data can have one of two values (0 or 1). So if you have one bit, there are two possibilities. If you have two bits, you have 2 for the first bit and two for the second, so 2*2=2^2. Continuing this idea, you have 2^8 possibilities for a byte and 2^(k*8) for k bytes.

This is just using the product rule, which states that if you have a ways to do one task and b ways of doing another thing, and the two tasks are independent, then if you do both tasks, you have a*b ways of completing both tasks. Hopefully that makes some sense.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top