Disclaimer: I am the author of some of the tutorials/software I am going to list.
Your steps seem fine. They are very similar, for example, to the method used by Lee and Kim (minus the threshold-model part).
Perhaps you can follow this guide on HMMs in C#. I wrote it, and it has some examples of how to perform gesture recognition. However, the guide creates continuous-density HMMs rather than discrete HMMs. Your vector quantization makes your features discrete, so you will probably be more interested in discrete-density HMMs. The only change you need to make for it to work with discrete models is to remove the generic arguments in the model creation/learning code.
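To make the discrete case concrete, here is a minimal sketch in plain Python (not the C# framework from the guide) of how a sequence of codebook indices can be scored against several discrete HMMs with the forward algorithm, picking the model with the highest likelihood. The function names, the two toy gesture models and their probabilities are all made up for illustration; in practice the parameters would come from training (e.g. Baum-Welch) on your quantized data.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs   -- list of codebook indices (output of vector quantization)
    start -- start[i]    = P(first state is i)
    trans -- trans[i][j] = P(next state is j | current state is i)
    emit  -- emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission of symbol obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow; the log of the scales sums to log P(obs).
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob

def classify(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical two-state, left-to-right models over a 3-symbol codebook.
start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "swipe_right": (start, trans, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "swipe_left": (start, trans, [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

print(classify([0, 0, 1, 2, 2], models))
```

This is the same "one model per gesture, pick the most likely" scheme that an HMM-based classifier applies; a library implementation would add training and typically class priors on top of it.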
To recognize gestures live, you can either use some kind of specific marker to signal the start and end of a gesture, or you can use a technique similar to Lee and Kim's threshold models. The framework described in the aforementioned guide has support for those.
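As a rough illustration of both options, the sketch below (plain Python, hypothetical names, not the framework's API) first cuts a frame stream into gestures using an explicit marker frame, and then shows the accept/reject decision used with a threshold model: a gesture is accepted only if its best model scores higher than the threshold model does. How the threshold score is obtained is deliberately left out; here it is just a number passed in, which is a simplification of what Lee and Kim describe.

```python
def segment_stream(frames, is_marker):
    """Yield the runs of frames found between an opening and a closing marker.

    Frames outside a marker pair are ignored, and a gesture that is still
    open when the stream ends is dropped.
    """
    current = None  # None means "not inside a gesture"
    for frame in frames:
        if is_marker(frame):
            if current is None:
                current = []          # marker opens a gesture
            else:
                if current:
                    yield current     # marker closes a non-empty gesture
                current = None
        elif current is not None:
            current.append(frame)

def accept(gesture_scores, threshold_score):
    """Return the best-scoring gesture label, or None if it is rejected.

    gesture_scores  -- dict mapping label -> log-likelihood of the segment
    threshold_score -- log-likelihood from the threshold (rejection) model
    """
    best = max(gesture_scores, key=gesture_scores.get)
    return best if gesture_scores[best] > threshold_score else None

# Hypothetical stream: "M" is the marker, integers are codebook symbols.
stream = [5, "M", 0, 0, 2, "M", 7, "M", 1, "M"]
segments = list(segment_stream(stream, lambda f: f == "M"))
print(segments)

# Made-up scores for one segment: "wave" beats the threshold, so it is kept.
print(accept({"wave": -10.0, "circle": -12.5}, threshold_score=-11.0))
```

The marker approach is simpler but needs user cooperation; the threshold approach lets you reject non-gesture movement without any explicit signal.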
I have worked on a similar project using the same technology and I can say HMMs work just fine for the task.