Why is the Frame Check Sequence at the end of an Ethernet frame and not somewhere else?

Question 1

You are correct: placing the CRC at the end of a frame reduces packet latency and reduces hardware buffering requirements. On the transmit side, hardware can read and transmit the bytes of the frame immediately. The transmitter calculates the CRC on the fly as the data passes through, then simply appends the CRC to the tail of the frame.
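As a rough software analogy of that transmit path, the sketch below forwards each chunk as soon as it arrives and keeps only a running 32-bit CRC. The names `transmit` and `chunks` are made up for illustration. It assumes Python's `zlib.crc32` computes the same CRC-32 as the Ethernet FCS and that the FCS goes out as that value's four bytes in little-endian order. Real hardware does this bit-serially in a shift register.

```python
import zlib

def transmit(chunks):
    """Yield frame bytes as they become available, then the 4-byte FCS.

    Hypothetical helper: 'chunks' is any iterable of bytes objects that
    together make up the frame (header + payload), without the FCS.
    """
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # update the running CRC
        yield chunk                   # forward immediately, nothing is buffered
    # Assumption: the FCS is the CRC-32 value sent least significant byte first.
    yield crc.to_bytes(4, "little")

frame = b"".join(transmit([b"header", b"payload-part-1", b"payload-part-2"]))
```

Only the 32-bit `crc` variable carries state between chunks. With the CRC in the header, the loop would have to finish before the first chunk could be sent.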

Consider the alternative, where the CRC sits somewhere in the Ethernet header. The hardware would have to read and store the entire frame in order to calculate the CRC before it could send the header. This amounts to a large look-ahead operation and adds significantly to transmit latency and hardware cost. The situation becomes more complex for the receiver as well.

Question 2

In fact, the placement has a great deal to do with what a CRC is about: polynomial division. If you move the CRC remainder to the front of the payload bitstream, you invalidate some of the CRC's properties, such as burst error detection.
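The division view can be tried out on a toy example. The sketch below assumes a small CRC-8 generator, x^8 + x^2 + x + 1, chosen only to keep the numbers short (it is not the Ethernet CRC-32), and the helper names are hypothetical. It shows that the message followed by its remainder is an exact multiple of the generator, which is the property the usual burst-error argument builds on. The same remainder placed in front does not give a multiple.

```python
GEN = 0x107   # x^8 + x^2 + x + 1, toy generator (assumption, not CRC-32)
WIDTH = 8     # degree of the generator

def poly_mod(bits):
    """Remainder of a bit sequence divided by GEN over GF(2).

    The first bit in the list is the coefficient of the highest power.
    """
    reg = 0
    for bit in bits:
        reg = (reg << 1) | bit
        if reg >> WIDTH:      # degree reached WIDTH: subtract (XOR) the generator
            reg ^= GEN
    return reg

def to_bits(value, n):
    """n-bit list of 'value', most significant bit first."""
    return [(value >> i) & 1 for i in range(n - 1, -1, -1)]

def crc_bits(message):
    """CRC = remainder of message * x^WIDTH, as WIDTH bits."""
    return to_bits(poly_mod(message + [0] * WIDTH), WIDTH)

message = to_bits(0x01, 8)
crc = crc_bits(message)

tail_residue = poly_mod(message + crc)    # CRC appended: exact multiple, residue 0
front_residue = poly_mod(crc + message)   # CRC in front: residue is non-zero here
```

A receiver of the tail-placed frame can therefore push every bit, CRC included, through the same divider and simply check the final register value.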

The key to understanding this is the fact that a CRC always operates on a stream of bits, not on bytes or on a block of payload. Sometimes you find faulty CRC implementations where the bits are transmitted little-endian but the CRC is actually calculated big-endian (in terms of bit ordering within individual bytes).
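To make the bit-ordering pitfall concrete, here is a minimal bit-serial sketch. It assumes the CRC-32 generator 0x04C11DB7 with the register preset to all ones and the result complemented. The function names are hypothetical. Feeding the same bytes least significant bit first (the order Ethernet puts bits on the wire) and most significant bit first yields two different checksums.

```python
import zlib

POLY = 0x04C11DB7  # CRC-32 generator polynomial (x^32 term implicit)

def crc32_bitstream(bits):
    """Bit-serial CRC-32 over the bits in exactly the order given."""
    reg = 0xFFFFFFFF                      # register preset to all ones
    for bit in bits:
        feedback = ((reg >> 31) & 1) ^ bit
        reg = (reg << 1) & 0xFFFFFFFF
        if feedback:
            reg ^= POLY
    return reg ^ 0xFFFFFFFF               # final complement

def lsb_first(data):
    """Bits of each byte, least significant bit first (wire order)."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]

def msb_first(data):
    """Bits of each byte, most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def reverse32(value):
    """Reverse the bit order of a 32-bit value."""
    return int(f"{value:032b}"[::-1], 2)

data = b"123456789"
wire_order = crc32_bitstream(lsb_first(data))
wrong_order = crc32_bitstream(msb_first(data))
```

In this sketch `reverse32(wire_order)` equals `zlib.crc32(data)`, since the table-driven byte-wise routine keeps its register bit-reversed. `wrong_order` is a valid CRC of a different bit stream, so it does not match what the other end computes from the bits it actually receives.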