Unexpected unpack results with bit strings

https://stackoverflow.com/questions/16182371

ruby
unpack

11-04-2022

Question

Why is it that when I open up irb and run

    puts 'A'.unpack("B8")

I get 01000001, but when I run

    puts 'A'.unpack("B4B4")

I only get 0100 and not [0100,0001]?

Is the resolution of unpack only a full byte? Nothing less?

Solution

Let's do some tests to understand the behavior:

> 'A'.unpack('B8')
 => ["01000001"]

It returns the 8 most significant bits (MSBs) of the char 'A', i.e. all of its bits, in MSB-first order.
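As a cross-check (a sketch assuming an ASCII string, where each char is exactly one byte), the same bit string can be built from the byte value with Integer#to_s(2). The lowercase b directive is shown only for contrast: it returns the bits least significant first.

```ruby
byte = 'A'.ord                          # 65, i.e. 0x41
manual = byte.to_s(2).rjust(8, '0')     # pad to 8 binary digits, MSB first
bits = 'A'.unpack('B8').first           # 'B' = bit string, MSB first
lsb_first = 'A'.unpack('b8').first      # 'b' = bit string, LSB first

puts manual            # prints 01000001
puts bits == manual    # prints true
puts lsb_first         # prints 10000010
```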

> 'A'.unpack('B4')
 => ["0100"]

It returns the 4 MSBs of the char 'A'.
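In other words, B4 is just the first half of the B8 result; arithmetically it is the byte shifted right by 4. A small sketch, again assuming a one-byte char:

```ruby
full = 'A'.unpack('B8').first                       # "01000001"
high = 'A'.unpack('B4').first                       # "0100"
high_from_shift = ('A'.ord >> 4).to_s(2).rjust(4, '0')

puts high == full[0, 4]       # prints true
puts high_from_shift          # prints 0100
```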

> 'A'.unpack('B16')
 => ["01000001"]

It asks for the 16 MSBs of the char 'A', but as 'A' has only 8 bits, we get just those 8.
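Looping over a few counts shows that a count larger than the input is not an error; the result is simply capped at the bits that are available:

```ruby
%w[B4 B8 B16 B100].each do |fmt|
  bits = 'A'.unpack(fmt).first
  puts "#{fmt}: #{bits} (#{bits.length} bits)"
end
# Expected:
# B4: 0100 (4 bits)
# B8: 01000001 (8 bits)
# B16: 01000001 (8 bits)
# B100: 01000001 (8 bits)
```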

> 'AB'.unpack('B16')
 => ["0100000101000010"]

It returns the 16 MSBs of the sequence of chars 'AB' (the last 8 bits, 01000010, correspond to 'B').
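Put differently, for a multi-char ASCII string, B16 is the per-byte bit strings concatenated in order. A minimal check using String#bytes and format('%08b'):

```ruby
s = 'AB'
per_byte = s.bytes.map { |b| format('%08b', b) }   # one 8-digit string per byte
p per_byte                                         # prints ["01000001", "01000010"]
puts per_byte.join == s.unpack('B16').first        # prints true
```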

> 'AB'.unpack('B10')
 => ["0100000101"]

It returns the 10 MSBs of the sequence of chars 'AB', i.e. the 8 MSBs of 'A' and the 2 MSBs of 'B'
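Since B10 crosses the byte boundary, its result is a prefix of the B16 result, and the last two digits come from the top of 'B':

```ruby
all_bits = 'AB'.unpack('B16').first   # "0100000101000010"
ten = 'AB'.unpack('B10').first        # "0100000101"

puts ten == all_bits[0, 10]   # prints true
puts ten[8, 2]                # prints 01, the two high bits of 'B' (0x42)
```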

> 'ABC'.unpack('B*')
 => ["010000010100001001000011"]

It returns all the bits of the sequence of chars 'ABC', MSB first (the last 8 bits, 01000011, correspond to 'C').
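With B*, the length of the result is always 8 times the byte size of the string, and Array#pack with the same directive turns the bit string back into the original bytes:

```ruby
s = 'ABC'
bits = s.unpack('B*').first

puts bits.length == 8 * s.bytesize   # prints true (24 bits for 3 bytes)
puts [bits].pack('B*')               # prints ABC
```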

> 'AB'.unpack('B8B8')
 => ["01000001", "01000010"]

It returns the following array:

the first element is the 8 MSBs of the char 'A'
the second element is the 8 MSBs of the char 'B'


> 'AB'.unpack('B8B7')
 => ["01000001", "0100001"]

It returns the following array:

the first element is the 8 MSBs of the char 'A'
the second element is the 7 MSBs of the char 'B'


> 'AB'.unpack('B4B8')
 => ["0100", "01000010"]

It returns the following array:

the first element is the 4 MSBs of the char 'A'
the second element is the 8 MSBs of the char 'B'
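Note what happens to the low half of 'A' here: B4 takes its high 4 bits, but the next directive starts at the following byte, so the bits 0001 are never returned. A quick way to see this:

```ruby
first, second = 'AB'.unpack('B4B8')
low_nibble = 'A'.unpack('B8').first[4, 4]   # the 4 bits that B4 left behind

puts first        # prints 0100
puts second       # prints 01000010 (all of 'B')
puts low_nibble   # prints 0001, which 'B4B8' never returns
```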


> 'AB'.unpack('B16B8')
 => ["0100000101000010", ""]

It returns the following array:

the first element is the 16 MSBs of the sequence of chars 'AB'
the second element is empty as the chars have already been consumed


> 'AB'.unpack('B*B8')
 => ["0100000101000010", ""]

It gives the same result, because B* consumes the whole string.

> 'AB'.unpack('B9B8')
 => ["010000010", ""]

It returns the following array:

the first element is the 9 MSBs of the sequence of chars 'AB'
the second element is empty as the chars have already been consumed

In conclusion:

The directive BN applied to a String consumes at most the first ((N-1) / 8) + 1 chars (strictly speaking, bytes) of the String, i.e. N/8 rounded up, even when N is not a multiple of 8; any unused bits of the last consumed byte are discarded. If there are still chars left and you have a second directive BM, it consumes at most the next ((M-1) / 8) + 1 chars, and so on for all the following directives. The directive B* consumes all the remaining chars and returns all of their bits, MSB first.
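The rule can be written as a small Ruby model. This is a hypothetical helper (simulate_b_directives is not part of Ruby), assuming only B directives and using nil to stand for *; it illustrates the byte-consumption rule and is not a reimplementation of String#unpack:

```ruby
# Hypothetical helper: models how a sequence of B directives consumes bytes.
# counts is an array of Integers (>= 1), with nil standing for '*'.
def simulate_b_directives(str, counts)
  bytes = str.bytes
  pos = 0
  counts.map do |n|
    remaining = bytes.drop(pos)
    # BN needs ((N - 1) / 8) + 1 whole bytes; B* takes everything left.
    take = n.nil? ? remaining.length : [((n - 1) / 8) + 1, remaining.length].min
    pos += take
    bits = remaining.first(take).map { |b| format('%08b', b) }.join
    # Keep only the first N bits; leftover bits of the last byte are dropped.
    n.nil? ? bits : bits[0, n]
  end
end

examples = [
  ['A', 'B4B4', [4, 4]],
  ['AB', 'B4B8', [4, 8]],
  ['AB', 'B9B8', [9, 8]],
  ['AB', 'B*B8', [nil, 8]]
]
examples.each do |str, fmt, counts|
  same = str.unpack(fmt) == simulate_b_directives(str, counts)
  puts "#{str.inspect}.unpack(#{fmt.inspect}) matches model: #{same}"
end
```

Each line of output should end in true; the model only covers B directives, so other directives would need their own rules.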

For instance:

'ABCDEFG'.unpack('B17B*B8')

It should return:

the 17 MSBs of the sequence ABC
all the MSBs of the sequence DEFG
an empty bitstring

Let's check:

> 'ABCDEFG'.unpack('B17B*B8')
 => ["01000001010000100", "01000100010001010100011001000111", ""]

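And indeed 'A'.unpack('B4B4') returns the array ["0100", ""], as the first directive consumes the whole char 'A' and leaves nothing for the second one. Since puts prints each array element on its own line, the empty string shows up only as a blank line after 0100.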
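If what you actually want is ["0100", "0001"], one option (a sketch, not the only way) is to unpack the whole byte once and split the bit string yourself, or to split the byte arithmetically:

```ruby
bits = 'A'.unpack('B*').first   # "01000001"
nibbles = bits.scan(/.{4}/)     # split into groups of 4 digits
p nibbles                       # prints ["0100", "0001"]

byte = 'A'.ord
by_shift = [byte >> 4, byte & 0x0F].map { |n| format('%04b', n) }
p by_shift                      # prints ["0100", "0001"]
```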

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow