This remark of Charles Moore can easily be misunderstood, because it is in the context of his Forth processors. A SWAP is not a machine instruction for a hardware Forth processor. In general in Forth some definitions are in terms of other definitions, but this ends with certain so called primitives. In a Forth processor those are implemented in hardware, but in all Forth implementations on e.g. host systems or single board computers they are implemented by a sequence of machine instruction, e.g. for Intel :
CODE SWAP pop, ax pop, bx push, ax push, bx END-CODE
He also uses the term "convenient", because SWAP is often avoidable. It is a situation where you need to handle two data items, but they are not in the order you want them. SWAP means a mental burden because you must imagine the stack content changed. One can often keep the stack straight by using an auxiliary stack to temporarily hold an item you don't need right now. Or if you need an item twice OVER is preferable. Or a word can be defined differently, with its parameters in a different order.
Going out of your way to implement SWAP in terms of 4 FORTH words instead of 4 machine instructions is clearly counterproductive, because those FORTH words each must have been implemented by a couple of machine instructions themselves.