On x86 and x86_64, ordinary loads already have acquire semantics and ordinary stores already have release semantics, even without using atomics, so for plain atomic loads and stores every memory order except seq_cst
requires no special instructions at all.
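As a rough illustration of the acquire/release case, here is a minimal sketch of one thread publishing a value to another. The function name message_passing_demo and the value 42 are made up for the example. On x86 I would expect the release store and the acquire load to compile to plain mov instructions, though the exact output depends on the compiler and flags.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: publishes a value from one thread to another
// using a release store paired with an acquire load.
int message_passing_demo() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that guards the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release store: earlier writes cannot be moved after it.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire load: later reads cannot be moved before it.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // must observe the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling message_passing_demo() should always return 42, because the acquire load that sees true synchronises with the release store.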
To get full sequential consistency, the compiler can insert an mfence
instruction after a seq_cst store to stop that store being reordered with a later load from a different memory location, but I don't think any other special instructions are needed.
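The case where seq_cst matters is the classic store-buffering pattern, sketched below. The names both_read_zero_once and count_forbidden_outcomes are hypothetical. With seq_cst on every operation the outcome where both threads read 0 is forbidden. With weaker orderings x86 could produce it, since each store may still sit in a store buffer when the following load executes. Compilers commonly implement the seq_cst store as either a mov followed by mfence or an xchg, but treat that as an assumption about typical code generation rather than a guarantee.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering test once.
// Returns true if both threads read 0, which seq_cst forbids.
bool both_read_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread a([&] {
        // The store is where a full barrier is needed on x86.
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appears.
int count_forbidden_outcomes(int iterations) {
    int count = 0;
    for (int i = 0; i < iterations; ++i) {
        if (both_read_zero_once()) {
            ++count;
        }
    }
    return count;
}
```

count_forbidden_outcomes(1000) should return 0 on a conforming implementation. Changing the stores to memory_order_release and the loads to memory_order_acquire removes that guarantee.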
Compilers need to avoid moving loads and stores across atomic operations, but that's purely a restriction on the compiler's optimiser and requires no CPU instructions to be issued.
See http://www.stdthread.co.uk/forum/index.php?topic=72.0 for some good information.