How is the char data type represented in 32-bit registers?

Question 1

If the CPU allows sub-register access (e.g. to a word or a byte), it may just use the byte sub-register. If the CPU is strictly 32-bit, your byte goes into bits 0-7. Depending on how the destination uses it, the generated code may or may not mask the remaining bits to 0 (AND reg,0x000000FF); the mask is needed if the destination code works with the register as a whole. There are too many variables, and the question is too open-ended, to give you a black-and-white answer.

For most opcodes, using 0xFF in a byte register and 0x000000FF in a dword register gives identical results, assuming the opcode has separate byte and dword counterparts. The exceptions are bit-position-specific ops such as "branch if high bit set" or bit rotation/shifting. If the value is signed, 0xFF would expand to 0xFFFFFFFF (or 0x83 to 0xFFFFFF83).
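To make the two expansions concrete, here is a minimal sketch in C. The function names zero_extend8 and sign_extend8 are made up for illustration, and the XOR/subtract form is just one portable way to write sign extension, not necessarily what any compiler emits.

```c
#include <stdint.h>

/* Zero-extend: the byte is treated as unsigned, bits 8-31 become 0. */
uint32_t zero_extend8(uint8_t b) {
    return (uint32_t)b;
}

/* Sign-extend: bit 7 is copied into bits 8-31.
   Written with XOR/subtract so it does not depend on
   implementation-defined narrowing conversions. */
uint32_t sign_extend8(uint8_t b) {
    return ((uint32_t)b ^ 0x80u) - 0x80u;
}
```

With these, zero_extend8(0xFF) gives 0x000000FF, while sign_extend8(0xFF) gives 0xFFFFFFFF and sign_extend8(0x83) gives 0xFFFFFF83.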

Edit to the update: C representing an unsigned char in a register would indeed zero the rest out. Depending on the compiler, it may zero the register first before setting bits 0-7, or it may mask as explained above. When the char is signed, the sign bit needs to be extended: for a negative value, zero the register, NOT it so that every bit is 1, and then set bits 0-7. Some CPUs even have an instruction explicitly for sign extension.
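The difference between writing only bits 0-7 and clearing the register first can be modelled in C, with a uint32_t standing in for the register. This is only a sketch; the helper names are hypothetical and do not correspond to any real instruction set or API.

```c
#include <stdint.h>

/* Write a byte into bits 0-7 only, leaving bits 8-31 untouched,
   as a byte sub-register write would. */
uint32_t write_low_byte(uint32_t reg, uint8_t b) {
    return (reg & 0xFFFFFF00u) | b;
}

/* Zero the whole register first, then set bits 0-7. */
uint32_t load_zeroed(uint8_t b) {
    uint32_t reg = 0;
    return write_low_byte(reg, b);
}

/* Clean up afterwards instead: the equivalent of AND reg,0x000000FF. */
uint32_t mask_low_byte(uint32_t reg) {
    return reg & 0x000000FFu;
}
```

Starting from a register that holds 0xDEADBEEF, write_low_byte with 0x41 leaves 0xDEADBE41, whereas load_zeroed(0x41) and mask_low_byte(0xDEADBE41) both give 0x00000041.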

Question 2

In general, unsigned quantities are padded with zeros, and signed quantities are sign-extended.

The char type in C is a special case because the standard allows it to be either signed or unsigned (and some compilers provide an option to let the developer choose). This allows the compiler to use whichever is most efficient.
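If you want to know which choice your compiler made, CHAR_MIN from <limits.h> tells you. A small sketch follows; the function names are made up, and the (char)0xFF conversion is implementation-defined when char is signed, although it yields -1 on the usual two's-complement targets.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Store the bit pattern 0xFF in a plain char and widen it to int.
   Assumption: 8-bit char on a two's-complement target. */
int widen_ff(void) {
    char c = (char)0xFF;
    return c; /* sign-extended if char is signed, zero-extended otherwise */
}
```

On a signed-char implementation widen_ff() returns -1; on an unsigned-char one it returns 255.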

Question 3

It depends on how it is put in there. Data can never be "too small" for a container.

Question 4

Chars are subject to integer promotion. They are converted to int (sign-extended if the char type is signed, zero-extended if it is unsigned) as soon as they are used in an arithmetic expression, and converted further to floating point if they are combined with floating-point operands.

It is up to you to make sure you don't use the result improperly. When you cast an int back to a char, you implicitly accept the risk of losing the upper significant bits.
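A short sketch of both effects, using unsigned char so that the narrowing conversion is fully defined. The function names are illustrative only, and the 44 result assumes 8-bit chars.

```c
#include <stddef.h>

/* The operands of + are promoted, so the sum has type int. */
size_t promoted_size(void) {
    char a = 1, b = 2;
    return sizeof(a + b);
}

/* The addition happens in int, so 200 + 100 does not wrap here. */
int sum_as_int(unsigned char a, unsigned char b) {
    return a + b;
}

/* Casting back keeps only the low 8 bits of the int result. */
unsigned char sum_as_uchar(unsigned char a, unsigned char b) {
    return (unsigned char)(a + b);
}
```

sum_as_int(200, 100) is 300, but sum_as_uchar(200, 100) is 44, because 300 does not fit in 8 bits and the upper bits are dropped.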

As for how a particular compiler handles that, it's up to the compiler designer. On monstrous architectures like the Pentium, you can use the char-sized version of the register, but on more conventional processors it might be more convenient to sign-extend the char to normalize its value if it is involved in further computations.

Question 5

The C language doesn't have registers, so there is no such representation that is visible to the programmer. If a portion of a wider register is used, the unused portion could have other data in it or it could have zeros. What is important is that a correct program that manipulates char values (or any other) is translated correctly so that it produces the correct output and any other externally visible behavior.

If 32-bit registers are used to hold 8-bit chars, and the unused bits are not cleared, then the generated machine code has to take care, for instance, not to involve the remaining 24 bits in a comparison like one resulting from (char_a == char_b), because then two equal chars would incorrectly compare unequal. The generated machine code has to tell the processor to use some byte-wide operation that only looks at the least significant 8 bits. Not all architectures have this kind of thing, and so it's probably easier to generate code which converts char representations in memory into full 32-bit values in registers (sign-extended, if they are signed).
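The comparison problem can be simulated in C by letting uint32_t values play the role of registers with leftover data in bits 8-31. This is only a model with made-up function names, not what a compiler literally generates.

```c
#include <stdint.h>

/* Compare all 32 bits: gives the wrong answer if bits 8-31 hold garbage. */
int equal_full(uint32_t reg_a, uint32_t reg_b) {
    return reg_a == reg_b;
}

/* Compare only the least significant 8 bits, like a byte-wide compare. */
int equal_low_byte(uint32_t reg_a, uint32_t reg_b) {
    return (reg_a & 0xFFu) == (reg_b & 0xFFu);
}
```

With 0xAAAAAA41 and 0x55555541 both holding the char 0x41, equal_full reports them as different while equal_low_byte reports them as equal.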

It really depends on what is convenient and efficient on the given target processor.

Question 6

On x86 the sub-registers have their own hardware names. The low byte of eax is al. You can even keep 2 chars in the same register: eax is [16 bits | ah | al]. So it is possible to handle chars via al/ah/bl/bh and so on, with garbage in the high bits. But gcc prefers to do really weird things:

char foo(char c) {
    return c+(char)1;
}

gcc -O2 -m32 -S:

foo:
    pushl   %ebp
    movl    %esp, %ebp
    movzbl  8(%ebp), %eax
    popl    %ebp
    addl    $1, %eax
    movsbl  %al,%eax
    ret

movzbl means extend with zeros, movsbl means extend with sign bits.

First it zero-extends the input, then adds 1, then sign-extends the low byte (al) of the result into the whole of eax. So it uses both zero and sign extension. Zero extension preserves the value of an unsigned char; sign extension preserves the value of a signed char.
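The three instructions that matter can be followed step by step in a C model where a uint32_t stands in for eax. The name foo_model is hypothetical, and the XOR/subtract line is just a portable way to express what movsbl does to the low byte.

```c
#include <stdint.h>

/* Step-by-step model of the listing; eax is modelled as a uint32_t. */
uint32_t foo_model(uint8_t arg_byte) {
    uint32_t eax = arg_byte;               /* movzbl 8(%ebp), %eax */
    eax += 1;                              /* addl   $1, %eax      */
    eax = ((eax & 0xFFu) ^ 0x80u) - 0x80u; /* movsbl %al, %eax     */
    return eax;
}
```

For an input byte of 0x41 the model returns 0x00000042. For 0x7F the add produces 0x80, which the final step sign-extends to 0xFFFFFF80, and for 0xFF the add produces 0x100, whose low byte is 0, so the result is 0.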