correct coding of ID3 v2.3 frame size field for GEOB tag

Question 1

I believe I have found the answer. ID3 v2.3, despite being more commonly supported than v2.4, has a not-too-well-written (and informal) spec. Its tag header size uses four 7-bit bytes (each masked with 0x7F), but the frame sizes are in fact plain 32-bit integers; this is just never clearly spelled out.

The reason I usually ran into the problem with GEOB is that it doesn't show up until the frame size is larger than 0x7F, and a GEOB frame usually is.
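To illustrate why the bug hides for small frames, here is a minimal Python sketch that decodes the same four bytes both ways. The function names `decode_synchsafe` and `decode_plain` are my own, not taken from any library.

```python
def decode_synchsafe(data: bytes) -> int:
    """Decode 7-bit-per-byte big-endian data (the high bit of each byte is ignored)."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def decode_plain(data: bytes) -> int:
    """Decode a plain big-endian unsigned integer."""
    return int.from_bytes(data, "big")


small = bytes([0x00, 0x00, 0x00, 0x7F])  # size 127: both readings agree
large = bytes([0x00, 0x00, 0x01, 0x00])  # size 256 when read as a 32-bit integer

print(decode_plain(small), decode_synchsafe(small))  # 127 127
print(decode_plain(large), decode_synchsafe(large))  # 256 128
```

Up to 0x7F the two interpretations give the same number, so a parser that wrongly treats v2.3 frame sizes as 7-bit bytes only breaks on larger frames.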

Question 2

Yes. However, I consider the docs to be explicit enough, given the conventions of % (binary) and $ (hexadecimal) which are explained right away:

Header size:
- 4 * %0xxxxxxx as per v2.2.0 (§3.1.) header
- 4 * %0xxxxxxx as per v2.3.0 header
- 4 * %0xxxxxxx as per v2.4.0 (§3.1.) header
Frame size:
- $xx xx xx as per v2.2.0 (i.e. §4.1.) frame
- $xx xx xx xx as per v2.3.0 frame
- 4 * %0xxxxxxx as per v2.4.0 (§4.) frame

Summary:

In all 3 versions of ID3v2 the header size is stored the same way: in 4 bytes, of which only the lower 7 bits each are valid (28 bits in total).
Only in ID3v2.2 frames does the size consist of 3 (full) bytes.
Only in ID3v2.3 frames does the size consist of 4 (full) bytes.
Only in ID3v2.4 frames is the size finally stored just like the header's size: 4 bytes, but only 28 bits are valid.
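The summary can be sketched as a small Python helper. This is only an illustration under the rules listed above; `tag_size` and `frame_size` are hypothetical names, and the caller is assumed to have already sliced the size bytes out of the header or frame.

```python
def _synchsafe(data: bytes) -> int:
    """7 valid bits per byte, big-endian."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def tag_size(size_bytes: bytes) -> int:
    """Header size: 4 * %0xxxxxxx in v2.2, v2.3 and v2.4 alike."""
    if len(size_bytes) != 4:
        raise ValueError("tag header size field is 4 bytes")
    return _synchsafe(size_bytes)


def frame_size(minor_version: int, size_bytes: bytes) -> int:
    """Frame size: 3 full bytes (v2.2), 4 full bytes (v2.3), 4 * 7 bits (v2.4)."""
    expected_len = {2: 3, 3: 4, 4: 4}.get(minor_version)
    if expected_len is None:
        raise ValueError(f"unsupported ID3v2 minor version: {minor_version}")
    if len(size_bytes) != expected_len:
        raise ValueError(f"v2.{minor_version} frame size field is {expected_len} bytes")
    if minor_version == 4:
        return _synchsafe(size_bytes)
    return int.from_bytes(size_bytes, "big")


print(frame_size(3, bytes([0x00, 0x00, 0x01, 0x00])))  # 256
print(frame_size(4, bytes([0x00, 0x00, 0x02, 0x00])))  # 256
```

Note how the same size of 256 bytes is written as `00 00 01 00` in a v2.3 frame but as `00 00 02 00` in a v2.4 frame.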

The "ID3v2.4.0 changes" document (§3) also outlines the frame size change from v2.3.0. The whole issue stems from MPEG audio (and AAC) streams, whose frames synchronize on 11 (or 12) consecutive set bits: a decoder might otherwise misinterpret the ID3 metadata as audio data.
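A rough Python sketch of that point: a size written as a plain 32-bit integer can contain a byte pair that looks like an audio sync (0xFF followed by a byte whose top three bits are set), while the 7-bit encoding never contains 0xFF at all. Both `encode_synchsafe` and `looks_like_sync` are names made up for this example, and the check is a simplification of what a real decoder does.

```python
def encode_synchsafe(value: int) -> bytes:
    """Encode a 28-bit value as four bytes with 7 valid bits each."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value does not fit in 28 bits")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def looks_like_sync(data: bytes) -> bool:
    """True if 0xFF is followed by a byte with its top three bits set."""
    return any(
        first == 0xFF and (second & 0xE0) == 0xE0
        for first, second in zip(data, data[1:])
    )


size = 0xFFE0  # 65504 bytes
plain = size.to_bytes(4, "big")  # 00 00 FF E0
safe = encode_synchsafe(size)    # 00 03 7F 60

print(plain.hex(" "), looks_like_sync(plain))  # 00 00 ff e0 True
print(safe.hex(" "), looks_like_sync(safe))    # 00 03 7f 60 False
```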