Why does the shortcut BitConverter takes when the start index is divisible by the size of the target type work?

StackOverflow https://stackoverflow.com/questions/22356409

13-06-2023

Question

I've recently been looking into how BitConverter works, and from reading other SO questions I've learned that it takes a 'shortcut' when the start index is divisible by the size of the type being converted to: it can just cast a pointer to the byte at that index into a pointer to the target type and dereference it.

Source for ToInt16 as an example:

public static unsafe short ToInt16(byte[] value, int startIndex) {
     if( value == null)  {
          ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
     }

     if ((uint) startIndex >= value.Length) {  // the unsigned cast also catches a negative startIndex
          ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.startIndex, ExceptionResource.ArgumentOutOfRange_Index);
     }

     if (startIndex > value.Length - 2) {  // fewer than two bytes remaining
          ThrowHelper.ThrowArgumentException(ExceptionResource.Arg_ArrayPlusOffTooSmall);
     }
     Contract.EndContractBlock();

     fixed( byte * pbyte = &value[startIndex]) {
          if( startIndex % 2 == 0) { // data is aligned 
              return *((short *) pbyte);
          }
          else { // unaligned: assemble the value byte by byte
              if( IsLittleEndian) { 
                   return (short)((*pbyte) | (*(pbyte + 1) << 8)) ;
              }
              else {
                   return (short)((*pbyte << 8) | (*(pbyte + 1)));                        
              }
          }
     }
}

My question is why does this work regardless of the endianness of the machine, and why doesn't it use the same mechanism when the data is not aligned?

An example to clarify:

I have some bytes in a buffer that I know are in big-endian format, and I want to read a short value from the array at, say, index 5. I also assume that my machine, since it is Windows, is little-endian.

I would use BitConverter like so, by switching the order of my bytes to little endian:

BitConverter.ToInt16(new byte[] { buffer[6], buffer[5] }, 0)

Assuming the code takes the shortcut, it would do what I want: just cast the bytes as they are, in the order provided, and return the value. But if it didn't have that shortcut code, wouldn't it then reverse the byte order again and give me the wrong value? Or if I instead did:

BitConverter.ToInt16(new byte[] { 0, buffer[6], buffer[5] }, 1)

wouldn't it give me the wrong value since the index is not divisible by 2?

Another situation:

Say I had an array of bytes that contained a short I want to extract, already in little-endian format but starting at an odd offset. Wouldn't the call to BitConverter reverse the order of the bytes, since BitConverter.IsLittleEndian is true and the index is not aligned, thus giving me an incorrect value?
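To make the two scenarios concrete, here is a minimal sketch (a little-endian machine and invented buffer contents are assumed); as the answers below explain, both calls return the same value, because the aligned and unaligned paths both honour the machine's byte order:

using System;

byte[] buffer = { 0, 0, 0, 0, 0, 0x12, 0x34, 0 };  // 0x1234 stored big-endian at index 5

// Swap the bytes into little-endian order before handing them to BitConverter.
short aligned   = BitConverter.ToInt16(new byte[] { buffer[6], buffer[5] }, 0);     // takes the cast shortcut
short unaligned = BitConverter.ToInt16(new byte[] { 0, buffer[6], buffer[5] }, 1);  // takes the shift/or path

Console.WriteLine(aligned);    // 4660 (0x1234)
Console.WriteLine(unaligned);  // 4660 (0x1234) as well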


Solution

The code avoids a hardware exception, a bus error, on processors that don't allow misaligned data access. A bus error is very expensive; it is usually resolved by kernel code that splits up the bus accesses and glues the bytes together. Such processors were still pretty common around the time this code was written, at the tail end of the popularity of RISC designs like MIPS. Older ARM cores and Itanium are other examples; .NET versions have been released for all of them.

It makes little difference on processors that don't have a problem with it, like Intel/AMD cores. Memory is slow.

The code consults IsLittleEndian only in the unaligned branch simply because that branch indexes the individual bytes, which of course makes the byte order matter.
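As a rough illustration of that point, the shift/or branch computes the same value the pointer cast would produce, on either kind of machine. This is only a sketch, with the two bytes below assumed as sample data:

using System;

byte[] bytes = { 0x34, 0x12 };   // sample payload

// The aligned branch reinterprets the two bytes in the machine's own order.
short viaCast = BitConverter.ToInt16(bytes, 0);

// The unaligned branch indexes the bytes individually, so it must consult
// IsLittleEndian to put them back together in that same machine order.
short viaShifts = BitConverter.IsLittleEndian
     ? (short)(bytes[0] | (bytes[1] << 8))
     : (short)((bytes[0] << 8) | bytes[1]);

Console.WriteLine(viaCast == viaShifts);   // True on either endianness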

OTHER TIPS

On most architectures there is a performance hit for accessing data that isn't aligned on the proper boundary. On x86 the CPU will let you read from an unaligned address, but with a performance penalty. On some architectures you'll get a CPU fault that the operating system has to trap.

I'd guess that the cost of letting the CPU fix up the reading of unaligned data is greater than the cost of reading the individual bytes and doing the shift/or operations. Also, the code is now portable to platforms where an unaligned read will cause a fault.
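If the goal is to read data whose byte order is fixed by a file or wire format, as in the question's big-endian buffer, a more direct route is BinaryPrimitives, available in System.Buffers.Binary on .NET Core 2.1 and later: it handles alignment and byte order explicitly, so neither IsLittleEndian checks nor manual swapping are needed. A minimal sketch, with the buffer layout below assumed for illustration:

using System;
using System.Buffers.Binary;

byte[] buffer = { 0, 0, 0, 0, 0, 0x12, 0x34, 0 };   // big-endian 0x1234 at offset 5

// Reads 4660 (0x1234) on little- and big-endian machines alike, aligned or not.
short value = BinaryPrimitives.ReadInt16BigEndian(buffer.AsSpan(5, 2));
Console.WriteLine(value);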

Why does this work regardless of the endianness of the machine?

The method reinterprets the bytes on the assumption that they were produced in an environment with the same endianness. In other words, endianness influences both the order in which the input bytes are arranged in the array and the order in which the bytes need to be arranged in the output short, and it influences them in the same way, so the two effects cancel out.
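A quick way to see the two effects cancelling is a round trip through BitConverter on any one machine (a minimal sketch):

using System;

short original = -12345;

// GetBytes lays the bytes out in the machine's own order...
byte[] raw = BitConverter.GetBytes(original);

// ...and ToInt16 reassembles them under the same assumption, so the round
// trip restores the original value on little- and big-endian hardware alike.
short roundTripped = BitConverter.ToInt16(raw, 0);
Console.WriteLine(roundTripped == original);   // True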

Why doesn't it use the same mechanism when the data is not aligned?

This is an excellent observation, and it is not immediately obvious why the authors didn't do the cast in both cases. I think the reason is that casting a pbyte with an odd value to short* makes the subsequent short access unaligned, which requires a special opcode to prevent the hard exception that some platforms generate on unaligned access.
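For comparison, on modern .NET an unaligned read can be expressed directly with Unsafe.ReadUnaligned, which emits the load with the IL unaligned. prefix so the JIT produces code that is safe even on hardware that faults on misaligned access. A minimal sketch, with made-up sample data:

using System;
using System.Runtime.CompilerServices;

byte[] data = { 0, 0x34, 0x12 };   // a short stored at the odd offset 1

// No pointer cast and no shift/or branch: the runtime takes care of alignment.
short value = Unsafe.ReadUnaligned<short>(ref data[1]);
Console.WriteLine(value);   // 4660 (0x1234) on a little-endian machine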

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow