The assignment is badly worded and misleading.
Unicode defines codepoint values in the range U+0000 to U+10FFFF, which takes up to 21 bits to represent. All of the UTF encodings (UTF-8, UTF-16, and UTF-32) can represent all of those codepoints, just in different ways.
UTF-8 and UTF-16 are variable-length encodings. The number of code units needed to encode a given codepoint depends on the actual codepoint value. UTF-8 uses 1, 2, 3, or 4 8-bit code units. UTF-16 uses either 1 or 2 16-bit code units.
UTF-32 is a fixed-length encoding. It always uses 1 32-bit code unit per codepoint, since no mainstream system has a 21-bit data type; the value is simply stored in 32 bits.
Implementing UTF conversions is very easy (the encodings are designed to be interchangeable), but you first need to know which encoding the source file is actually using. If the file starts with a BOM, that is very easy to detect. However, if no BOM is present, then you need to either ask the user for the encoding, or else use heuristic analysis of the data to try to detect the encoding dynamically.
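For the BOM case, detection is just a prefix comparison on the first few bytes of the file. Here is a minimal sketch in C; the `Encoding` enum and `detect_bom` name are my own invention, not part of any standard API:

```c
#include <string.h>

/* Possible encodings detected from a BOM (hypothetical helper). */
typedef enum { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE,
               ENC_UTF32LE, ENC_UTF32BE } Encoding;

/* Inspect the first bytes of a buffer for a BOM.
   The UTF-32 checks must come before the UTF-16 checks, because a
   UTF-32LE BOM (FF FE 00 00) begins with a UTF-16LE BOM (FF FE). */
Encoding detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 4 && memcmp(buf, "\xFF\xFE\x00\x00", 4) == 0) return ENC_UTF32LE;
    if (len >= 4 && memcmp(buf, "\x00\x00\xFE\xFF", 4) == 0) return ENC_UTF32BE;
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)     return ENC_UTF8;
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)         return ENC_UTF16LE;
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)         return ENC_UTF16BE;
    return ENC_UNKNOWN;
}
```

Note that a file with no BOM falls into the `ENC_UNKNOWN` bucket, which is where the heuristics (or asking the user) come in.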
Once you know the encoding, the rest is straightforward:
If UTF-16, read the file in 16-bit chunks (1 code unit at a time), combining adjacent UTF-16 surrogate code units as needed (they are very easy to detect). For each completed sequence, extract the encoded codepoint and output it as a single UTF-32 code unit. A BMP code unit carries its 16 bits directly; a surrogate pair carries 20 bits plus an offset of 0x10000.
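That decoding step can be sketched as follows; `utf16_to_utf32` is a hypothetical helper name, and real code would want fuller error handling for malformed input:

```c
#include <stddef.h>
#include <stdint.h>

/* Combine one or two UTF-16 code units from `in` into a single UTF-32
   code unit (the codepoint). Returns the number of 16-bit units
   consumed, or 0 on a malformed sequence. */
size_t utf16_to_utf32(const uint16_t *in, size_t avail, uint32_t *out)
{
    uint16_t hi = in[0];
    if (hi < 0xD800 || hi > 0xDFFF) {  /* BMP: one unit, used as-is */
        *out = hi;
        return 1;
    }
    if (hi >= 0xDC00 || avail < 2)     /* low surrogate first, or truncated */
        return 0;
    uint16_t lo = in[1];
    if (lo < 0xDC00 || lo > 0xDFFF)    /* second unit must be a low surrogate */
        return 0;
    /* Surrogate pair: 10 bits from each half, plus the 0x10000 offset */
    *out = 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 2;
}
```

For example, the pair D83D DE00 decodes to U+1F600, while the lone unit 0041 passes through unchanged.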
If UTF-32, read the file in 32-bit chunks (1 code unit at a time), extract the codepoint, and output it as either 1 or 2 UTF-16 code units as needed.
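The reverse direction can be sketched the same way; again, `utf32_to_utf16` is just an illustrative name:

```c
#include <stddef.h>
#include <stdint.h>

/* Split one UTF-32 code unit (a codepoint) into one or two UTF-16
   code units. Returns the number of 16-bit units written, or 0 if
   the value is not a valid codepoint. */
size_t utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate range is invalid */
    if (cp > 0x10FFFF)                return 0; /* beyond the Unicode range */
    if (cp < 0x10000) {               /* BMP: fits in a single unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                    /* leaves 20 bits to split in half */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Running U+1F600 through this produces the pair D83D DE00, which round-trips with the decoder above.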
The most difficult part of the assignment is determining the encoding of the source file.