What is the fastest way to swap two variables?

Question 1

The problem is that modern CPU architectures will not give you a clean answer to this: they hide many effects and expose other, very subtle ones.

If the values are in CPU registers and you have a spare register, then the temporary-variable approach is either the fastest one or the one that consumes the least power.
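A minimal C sketch of the temporary-variable swap, assuming `int` values passed by pointer (the name `swap_temp` is hypothetical). With optimization enabled, compilers typically keep `tmp` in a register or remove it entirely, so the "extra" variable usually costs little or nothing.

```c
#include <assert.h>

/* Temporary-variable swap. The temporary normally lives in a
   register; an optimizing compiler may eliminate it altogether. */
static inline void swap_temp(int *a, int *b)
{
    int tmp = *a; /* save the first value */
    *a = *b;      /* overwrite the first with the second */
    *b = tmp;     /* store the saved value into the second */
}
```

Unlike the XOR and +/- tricks, this version is also correct when `a` and `b` point to the same object.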

The XOR and +/- methods (very neat, by the way!) are for situations where you cannot afford an extra storage location (an extra variable in memory or an extra register). This might seem strange, but inside a C preprocessor macro, for example, one cannot (easily) declare new variables.
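To illustrate the macro point, here is a hedged sketch with two hypothetical macros, `SWAP_TEMP` and `SWAP_XOR`. A temporary must be declared with a type, so a temp-based macro has to be told that type explicitly; the XOR macro needs no temporary and no type argument, at the cost of working only on integers and evaluating its arguments several times.

```c
#include <assert.h>

/* Temp-based swap: the caller must pass the type, and the local name
   swap_tmp_ could clash with a caller's variable of the same name. */
#define SWAP_TEMP(type, a, b) \
    do { type swap_tmp_ = (a); (a) = (b); (b) = swap_tmp_; } while (0)

/* XOR-based swap: no temporary, no type argument, integers only.
   The address check guards against a and b naming the same object. */
#define SWAP_XOR(a, b)            \
    do {                          \
        if (&(a) != &(b)) {       \
            (a) ^= (b);           \
            (b) ^= (a);           \
            (a) ^= (b);           \
        }                         \
    } while (0)
```

The `do { ... } while (0)` wrapper lets each macro be used like a single statement. GNU C's `__typeof__` (standardized as `typeof` in C23) can remove the type argument from the temp-based version.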

When the variables are in memory, all variants are very likely to behave the same on any high-performance CPU. Even if the compiler does not optimize the code, the CPU will avoid virtually all slow memory accesses and make them nearly as fast as register accesses.

All in all, I am inclined to say: don't worry about the speed of this; optimizing at this level is unimportant. If you can, avoid the swap altogether; that will be the fastest!

Question 2

http://en.wikipedia.org/wiki/XOR_swap_algorithm

Most modern compilers can optimize away the temporary variable in the naive swap, in which case the naive swap uses the same amount of memory and the same number of registers as the XOR swap and is at least as fast, and often faster. The XOR swap is also much less readable and completely opaque to anyone unfamiliar with the technique. On modern CPU architectures, the XOR technique is considerably slower than using a temporary variable to do swapping. One reason is that modern CPUs strive to execute instructions in parallel via instruction pipelines. In the XOR technique, the inputs to each operation depend on the results of the previous operation, so they must be executed in strictly sequential order.
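To make the dependency chain concrete, here is a hedged C sketch of the XOR swap on `unsigned` values (the name `swap_xor` is illustrative). Each statement consumes the value produced by the one before it, whereas in the temporary-variable swap the two loads are independent of each other.

```c
#include <assert.h>

/* XOR swap through pointers. The three XORs form a serial
   dependency chain: each one needs the result of the previous one. */
static void swap_xor(unsigned *a, unsigned *b)
{
    if (a == b)   /* same object: XOR-ing it with itself would zero it */
        return;
    *a ^= *b;     /* a = a0 ^ b0 */
    *b ^= *a;     /* b = b0 ^ (a0 ^ b0) = a0, needs the new *a */
    *a ^= *b;     /* a = (a0 ^ b0) ^ a0 = b0, needs the new *b */
}
```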

Also see this question:

How fast is std::swap for integer types?

It's important to note that the XOR swap requires you to first check that the two variables do not refer to the same memory location. If they do, the swap sets that location to zero.
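A small C sketch of the failure, using a deliberately unchecked XOR swap (the name `swap_xor_unchecked` is hypothetical, and the function exists only to demonstrate the bug):

```c
#include <assert.h>

/* Deliberately unchecked XOR swap: do not use as-is. */
static void swap_xor_unchecked(unsigned *a, unsigned *b)
{
    *a ^= *b;   /* if a == b, the shared value becomes v ^ v == 0 */
    *b ^= *a;   /* 0 ^ 0 == 0 */
    *a ^= *b;   /* still 0: the original value is lost */
}
```

Calling it with two distinct objects swaps them correctly; calling it with the same address twice, as in `swap_xor_unchecked(&v, &v)`, leaves `v` equal to 0.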

Question 3

The XOR swap isn't always the most efficient, since most modern CPU architectures try to execute instructions in parallel, but in the XOR swap each line depends on the previous result (so it cannot be parallelized). For the temp-variable swap, most compilers will optimize the temporary variable away, which ends up with the naive way running as fast as or faster than the XOR swap while using the same amount of memory.

Another swap alternative is:

x = x + y;
y = x - y;
x = x - y;

Similarly, the efficiency and speed arguments made for the XOR swap apply here too.

EDIT: As hatchet said, the +/- approach can also cause overflow if not done carefully (in C, overflow of a signed integer is undefined behavior).
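One way to do it "carefully" in C, sketched here under the assumption that the values can be treated as `unsigned int` (the name `swap_add_sub` is hypothetical): unsigned arithmetic wraps modulo 2^N, so an intermediate `x + y` that exceeds the range still yields the correct final values. Like the XOR swap, it also zeroes the value when both pointers refer to the same object, hence the check.

```c
#include <assert.h>

/* Add/subtract swap on unsigned integers. Wraparound is well defined
   for unsigned types, so the intermediate sum may "overflow" safely. */
static void swap_add_sub(unsigned *a, unsigned *b)
{
    if (a == b)       /* same object: the sequence below would zero it */
        return;
    *a = *a + *b;     /* a = a0 + b0 (mod 2^N) */
    *b = *a - *b;     /* b = (a0 + b0) - b0 = a0 */
    *a = *a - *b;     /* a = (a0 + b0) - a0 = b0 */
}
```

For example, swapping `UINT_MAX` and `1` wraps the intermediate sum to 0 and still ends with the two values exchanged.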