Well, I've fixed it. In general, the last TDO bit sampled during the prologue is the first (least significant) bit of the output. For SendCommand that bit has no meaning, but for XferData and XferFastData it matters.
For XferFastData that bit is PrAcc, according to the spec. If it is zero, you are supposed to repeat the whole operation. But beware: the MCU implementation doesn't follow the spec. If you actually restart the whole FastData operation when PrAcc is zero, it won't work. Instead, just ignore the bit and proceed with writing. I eventually found this out by trial and error and by comparing my XferFastData implementation against pic32prog.
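
In case it helps anyone, here is a rough sketch of what "ignore the bit" looks like in C. It is not my actual code and not pic32prog's either. The names shift_fn, fastdata_result, xfer_fast_data and fake_shift are all made up for illustration, and I'm assuming the transport hands back the captured TDO bits LSB first, with the PrAcc bit already aligned to bit 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transport hook (an assumption, not a real pic32prog API):
 * clocks out `nbits` of TDI, LSB first, and returns the TDO bits captured
 * in the same order, so the first captured bit lands in bit 0. */
typedef uint64_t (*shift_fn)(void *ctx, uint64_t tdi, unsigned nbits);

typedef struct {
    bool     pracc; /* PrAcc bit as reported by the target (informational) */
    uint32_t data;  /* the 32 data bits that followed it */
} fastdata_result;

/* One FastData transfer: 1 PrAcc bit followed by 32 data bits.
 * The spec says to restart when PrAcc reads 0; this sketch deliberately
 * does NOT retry. It shifts exactly once and only reports the bit. */
static fastdata_result xfer_fast_data(shift_fn shift, void *ctx, uint32_t word)
{
    /* Bit 0 is the PrAcc slot (sent as 0), bits 1..32 carry the word. */
    uint64_t tdo = shift(ctx, (uint64_t)word << 1, 33);

    fastdata_result r;
    r.pracc = (tdo & 1u) != 0;      /* first captured bit */
    r.data  = (uint32_t)(tdo >> 1); /* remaining 32 bits */
    return r;
}

/* Test double standing in for real hardware: counts calls via ctx,
 * echoes the data bits back and always reports PrAcc = 0. */
static uint64_t fake_shift(void *ctx, uint64_t tdi, unsigned nbits)
{
    (void)nbits;
    ++*(int *)ctx;
    return tdi & ~(uint64_t)1;
}
```

With fake_shift always answering PrAcc = 0, a single call to xfer_fast_data still performs exactly one shift and returns the data, which is the behaviour that worked for me on real hardware. The caller can log r.pracc if it wants, but shouldn't loop on it.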