You are correct: placing the CRC at the end of a frame reduces packet latency and reduces hardware buffering requirements. On the transmit side, hardware can read and transmit bytes of the frame immediately. The transmitter calculates the CRC on the fly as data passes through, then simply appends the CRC the tail of the frame.
Consider the alternative where the CRC comes somewhere in the Ethernet header. Hardware must read and store the entire frame in order to calculate the CRC. This amounts to a large look-ahead operation and adds significantly to transmit latency and hardware cost. The situation also becomes more complex for the receiver as well.