The problem is that modern CPU architectures will not let you get a clear answer to this. They hide many effects and expose many very subtle ones.
If you have the values in CPU registers and you have a spare register, then the temp-variable way is either the fastest way or the way which consumes the least power.
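For reference, a minimal sketch of the temp-variable way in C. The function name `swap_temp` and the choice of `uint32_t` are just assumptions for illustration; any type works the same way.

```c
#include <stdint.h>

/* Swap through a temporary. With both values already in registers and a
   spare register available, a compiler can usually lower this to a few
   register moves. */
static inline void swap_temp(uint32_t *a, uint32_t *b)
{
    uint32_t tmp = *a;  /* save the first value */
    *a = *b;            /* overwrite it with the second */
    *b = tmp;           /* store the saved value in the second location */
}
```

Calling `swap_temp(&x, &y)` exchanges the contents of `x` and `y`; it is also safe when both pointers refer to the same object.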
The XOR or the +/- method (very neat, by the way!) is for situations where you cannot afford an extra location (an extra memory variable or an extra register). This might seem strange, but inside a C preprocessor macro, for example, one cannot (easily) declare new variables.
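As a rough sketch, both temp-free methods could be written as macros like this. The names `SWAP_XOR` and `SWAP_ADDSUB` are made up for this example, and I am assuming unsigned integer operands: the +/- version relies on unsigned wraparound, since signed overflow is undefined behavior in C.

```c
#include <stdint.h>

/* XOR swap, no temporary needed. If a and b name the same object, the
   plain XOR sequence would zero it, so that case is skipped. */
#define SWAP_XOR(a, b)                                  \
    do {                                                \
        if (&(a) != &(b)) {                             \
            (a) ^= (b);   /* a = a0 ^ b0 */             \
            (b) ^= (a);   /* b = a0 */                  \
            (a) ^= (b);   /* a = b0 */                  \
        }                                               \
    } while (0)

/* +/- swap, no temporary needed. Assumes unsigned operands so that an
   overflowing sum wraps around instead of being undefined. */
#define SWAP_ADDSUB(a, b)                               \
    do {                                                \
        if (&(a) != &(b)) {                             \
            (a) += (b);        /* a = a0 + b0 */        \
            (b) = (a) - (b);   /* b = a0 */             \
            (a) -= (b);        /* a = b0 */             \
        }                                               \
    } while (0)
```

Both macros evaluate their arguments several times, so they should only be given plain variables (no `i++` or function calls).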
When the variables are in memory, all variants are very likely to behave the same on any high-performance CPU. Even if the compiler does not optimize the code, the CPU's caches and store-to-load forwarding will keep virtually all of these accesses away from main memory and make them nearly as fast as register accesses.
Overall I am inclined to say: don't worry about the speed of this. It is unimportant to optimize at this level. Try to avoid the swap altogether; that will be the fastest!