Efficient to apply Run-Length Transform after Move to Front Transform and BWT?

Question 1

It's not efficient.

It's probably an old paper on BWT. Back then, using RLE (run-length Encoder) was a common trick to improve speed. RLE, btw, was not only used after BWT, but also before, in order to speed up BWT stage. The same logic could apply to the entropy coder, but it is unlikely to provide as much benefit.

Nowadays, this trick are completely out dated. Before BWT, you might find some kind of LZP pre-processing, especially for long range matches (beyond block size), with about the same intend as RLE regarding speed improvement, but more powerful. MTF is also completely replaced, since it is too CPU intensive and not as efficient for its cost.

Question 2

The goal of the MTF after the BWT is to shrink the range of symbols which is good for entropy coding. MTF replaces symbols with their rank which is usually small due to the repetitions generated by the BWT. A Zero Length Coding may be applied after that since a lot of 0s may be produced by the MTF (it is just a specific case of RLE where only runs of 0s are encoded by length).

Take a look at https://github.com/flanglet/kanzi-cpp for implementation examples. You can run the BlockCompressor with the Block codec with or without MTF+ZLE.