For a sequence tagging task like speech recognition, you need to use a combination of SVM and HMM, not just an SVM:
- Align the feature matrix to states with a GMM-HMM to get the features corresponding to each HMM state
- Train an SVM on the features belonging to each state
- Implement an SVM-HMM instead of the GMM-HMM
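
To make the last step more concrete, here is a minimal sketch of the decoding side only. It assumes the per-state SVMs have already been trained and that their outputs have been turned into per-frame probabilities (for example by some calibration step), which is an assumption on my part and not something the steps above spell out. The function name `viterbi` and all the toy numbers are made up for illustration. The HMM then just runs ordinary Viterbi with those scores in place of the GMM likelihoods:

```python
import math

def viterbi(log_emit, log_trans, log_init):
    """Most likely state path given per-frame, per-state log scores.

    log_emit[t][s]  : log score of state s at frame t (assumed here to
                      come from the per-state SVMs, not from GMMs)
    log_trans[i][j] : log probability of moving from state i to state j
    log_init[s]     : log probability of starting in state s
    """
    n_states = len(log_init)
    # delta[s] = best log score of any path that ends in state s so far
    delta = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_emit[1:]:
        new_delta, ptrs = [], []
        for j in range(n_states):
            # best predecessor state for landing in state j at this frame
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptrs.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + frame[j])
        delta = new_delta
        backptrs.append(ptrs)
    # trace back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path

def to_log(rows):
    """Element-wise log of a list of probability rows."""
    return [[math.log(p) for p in row] for row in rows]

# Toy numbers, purely illustrative: 2 states, 4 frames.
svm_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
trans = [[0.7, 0.3], [0.1, 0.9]]
init = [0.9, 0.1]

path = viterbi(to_log(svm_probs), to_log(trans), to_log([init])[0])
print(path)  # [0, 0, 1, 1]
```

The point of keeping the HMM is that the transition model smooths the frame-level decisions: a single frame where the SVM prefers the wrong state does not necessarily flip the decoded sequence.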
To learn more, read:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.442
To make it fast, use existing toolkits like: