In keeping with the nature of the Nand2Tetris course, I've tried to walk a line in this answer, giving examples of Hack assembly coding techniques and general algorithms, but leaving the final code as an exercise.
The Hack ALU does not have any data paths that connect bit N with bit N-1. This means that right-shifts and rotates must be implemented using left rotates. (Note: left = most significant bits, right = least significant bits)
A left-shift is easy, since it's just multiplication by 2, which is itself just self-addition. For example:
// left-shift variable someVar 1 bit
@someVar // A = address of someVar
D = M // D = Memory[A]
M = D + M // Memory[A] = Memory[A] * 2
Left-rotate is a bit more difficult. You need to keep a copy of the leftmost bit, and move it into the rightmost bit after doing the multiply. Note however that you have a copy of the original value of "someVar" in the D register, and you can test and jump based on its value -- if the leftmost bit of D is 1, then D will be less than zero. Furthermore, note that after you multiply "someVar" by 2, its rightmost bit will always be 0, which makes it easy to set without changing any of the other bits.
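To make the idea concrete without giving away the assembly, here is a minimal Python model of a one-bit left-rotate on a 16-bit word. It uses only the operations described above (self-addition, a sign test, and adding 1). The names `rotl1` and `is_negative` are my own, purely for illustration.

```python
WORD = 0xFFFF  # Hack words are 16 bits wide


def is_negative(x):
    """True when bit 15 is set, which is what a 'less than zero' jump tests."""
    return (x & 0x8000) != 0


def rotl1(x):
    """Left-rotate a 16-bit word by one bit using only add and a sign test."""
    doubled = (x + x) & WORD  # self-addition: shifts left, bit 0 becomes 0
    if is_negative(x):        # the original leftmost bit was 1
        doubled += 1          # set bit 0; cannot carry because bit 0 was 0
    return doubled
```

Translating this to Hack is mostly a matter of turning the `if` into a conditional jump on D.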
Once you have left-rotate, right-rotate is straightforward: if you want to right-rotate N bits, you instead left-rotate 16-N bits. Note that this assumes N is in the range 0-15.
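The rotate identity can be checked with a small Python model. This is only a sketch with hypothetical helper names (`rotl1`, `rotl`, `rotr`), and it assumes N is in the range 0-15.

```python
WORD = 0xFFFF  # 16-bit word


def rotl1(x):
    """One-bit left-rotate built from self-addition and a sign test."""
    doubled = (x + x) & WORD
    return doubled + 1 if x & 0x8000 else doubled


def rotl(x, n):
    """Left-rotate by n bits by repeating the one-bit rotate."""
    for _ in range(n):
        x = rotl1(x)
    return x


def rotr(x, n):
    """Right-rotate by n bits (0-15), expressed as a left-rotate by 16-n."""
    return rotl(x, 16 - n)  # n == 0 gives 16 rotations, which is the identity
```

For example, `rotr(0x1234, 4)` moves the low nibble to the top and yields `0x4123`.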
Right-shift is the most complicated operation. In this instance, you need to first do the right-rotate, then generate a mask that has the upper N bits set to zero. You AND the result of the right-rotate with the mask.
The basic way to generate the mask is to start with -1 (all bits set) and double it N times by adding it to itself; this makes the rightmost N bits of the mask 0. Then left-rotate this 16-N times to move all the 0 bits to the leftmost N bits.
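Here is a Python sketch of the whole right-shift recipe: rotate, build the mask, then AND. The helper names are hypothetical, and the result is a logical (zero-filling) right shift for N in the range 0-15.

```python
WORD = 0xFFFF  # 16-bit word


def rotl(x, n):
    """Left-rotate by n bits using repeated self-addition plus a sign test."""
    for _ in range(n):
        carry = 1 if x & 0x8000 else 0
        x = ((x + x) & WORD) + carry
    return x


def make_mask(n):
    """Mask with the upper n bits clear, built as described above."""
    mask = WORD                     # -1: all bits set
    for _ in range(n):
        mask = (mask + mask) & WORD  # doubling clears one more low bit
    return rotl(mask, 16 - n)        # move the n zero bits to the top


def shr(x, n):
    """Logical right shift: right-rotate (as left-rotate by 16-n), then mask."""
    return rotl(x, 16 - n) & make_mask(n)
```

For instance, `make_mask(4)` is `0x0FFF`, and `shr(0x8000, 15)` is `1`.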
However, this is a lot of cycles, and when programming in assembly language, saving cycles is what it's all about. There are a couple of techniques you can use.
The first is using address arithmetic to implement the equivalent of a case statement. For each of the 16 possible rotate values, you need to load a 16-bit mask value into the D register, then jump to the end of the case. You have to be careful because you can only load 15-bit constants using the @ instruction, but you can do the load and unconditional jump in 6 instructions (4 to load the full 16-bit constant, and 2 to jump).
So if you have 16 of these starting at location (CASE), you just need to multiply N by 6, add it to @CASE, and jump to that location. When thinking about how to multiply by 6, keep in mind one of the really cute features of the Hack instruction set: you can store the result of an ALU operation in multiple registers simultaneously.
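As a rough sketch of the address arithmetic only, here is one way to get 6N using nothing but additions, modeled in Python. `case_target` and `case_base` are hypothetical names; `case_base` stands for whatever ROM address the assembler assigns to (CASE).

```python
ENTRY_SIZE = 6  # instructions per case entry, as described above


def case_target(case_base, n):
    """ROM address of the n-th 6-instruction case entry, using only adds."""
    n2 = n + n    # 2N
    n3 = n2 + n   # 3N
    n6 = n3 + n3  # 6N
    return case_base + n6
```

How to map those three additions onto the D, A and M registers is where the multiple-destination feature comes in.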
The most efficient solution, however, is to precompute a mask table. During your program initialization, you generate all 16 masks and store them at some fixed location in memory. Then you can just add N to the address of the start of the table and read the mask.
Since the Hack CPU can't access the program ROM other than to fetch instructions, you can't store the table in ROM; you have to use several instructions per table entry to load the value into the D register and then save it into RAM. I ended up writing a simple Python script that generates the code to initialize tables.
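This is not my original script, just a minimal sketch of what such a generator might look like. It assumes the table lives at a fixed RAM address (the default of 1000 is an arbitrary placeholder) and that entry N holds the right-shift mask with the upper N bits clear. Constants that do not fit in 15 bits are loaded by putting the bitwise complement in A and using `D=!A`.

```python
def load_constant(value):
    """Return Hack instructions that leave the 16-bit value in D."""
    value &= 0xFFFF
    if value < 0x8000:
        return [f"@{value}", "D=A"]        # fits in a 15-bit @ constant
    return [f"@{value ^ 0xFFFF}", "D=!A"]  # load the complement, then invert


def mask_table_init(base=1000):
    """Emit Hack code that stores the 16 right-shift masks at RAM[base..base+15].

    `base` is a hypothetical address; pick one that is free in your program.
    """
    lines = []
    for n in range(16):
        mask = 0xFFFF >> n  # upper n bits clear
        lines.append(f"// mask for right-shift by {n}: 0x{mask:04X}")
        lines += load_constant(mask)
        lines += [f"@{base + n}", "M=D"]
    return lines


if __name__ == "__main__":
    print("\n".join(mask_table_init()))
```

Each entry costs four instructions at initialization time. After that, looking up a mask is just a matter of adding N to the base address and reading that memory word.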