It doesn't seem like anybody is going to provide a concrete, practical answer, so let me try to do that.
First of all, even a fairly simple and straightforward approach makes it clear that you do not need to process all the data coming from the sensors. Moreover, humans are not that fast, so there is no need to process 10,000 values per second in order to identify any specific movement.
What you actually need is to identify key points and base your decision on them. Does that sound like a tangent to you?
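The "key points" idea above can be sketched very simply: instead of feeding every raw sample to the recognizer, keep only the samples that deviate noticeably from the last one you kept. This is a minimal illustration; the `key_points` name and the threshold value are made up, not from any real framework:

```python
def key_points(samples, threshold=0.5):
    """Return the subset of samples worth feeding to a recognizer.

    A sample is kept only if it differs from the previously kept
    sample by at least `threshold` (an illustrative value, not tuned
    for any real sensor).
    """
    if not samples:
        return []
    kept = [samples[0]]
    for value in samples[1:]:
        if abs(value - kept[-1]) >= threshold:
            kept.append(value)
    return kept

stream = [0.0, 0.1, 0.2, 1.0, 1.1, 0.3, 0.2]
print(key_points(stream))  # → [0.0, 1.0, 0.3]
```

Even this naive filter discards most of a slowly varying stream while preserving the shape of the motion, which is all the recognizer needs.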
What I'm actually suggesting is to test your solution using an ordinary mouse and an available gesture recognition framework, because the underlying idea is pretty much the same. So please check:

That way it might be easier to develop a proper solution.
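To make the mouse-prototyping idea concrete, here is a toy sketch of what such frameworks do under the hood: record a 2D pointer path and compare it to a stored template by average point-wise distance. The template, candidate path, and threshold below are all invented for illustration, and real frameworks do more (resampling, rotation and scale normalization):

```python
import math

def path_distance(a, b):
    """Average point-wise distance between two equally sampled 2D paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

template = [(0, 0), (1, 0), (2, 0)]          # stored "swipe right" gesture
candidate = [(0, 0.1), (1, -0.1), (2, 0.0)]  # freshly recorded mouse path

# Illustrative threshold: close enough to the template counts as a match.
if path_distance(template, candidate) < 0.5:
    print("recognized: swipe right")
```

Once the matching logic works with mouse paths, swapping the input source for key points extracted from sensor data is a much smaller step.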
Update
Let's imagine I'm holding my phone and I need to rotate it 90 degrees counterclockwise and then 180 degrees clockwise. I hope you do not expect me to draw complex 3D shapes in the air (that would hurt usability, and frankly I do not want to lose my phone), so it is fair to say there is a point we can track, or one we can easily simulate.
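That rotation sequence can be detected without any shape matching at all: integrate the angular velocity around the screen axis and watch the accumulated angle pass through the expected checkpoints (about +90° for the counterclockwise turn, then about -90° after the 180° clockwise turn). This is a hedged sketch with made-up sample rates and tolerance; a real app would read the device's gyroscope instead:

```python
def accumulated_angle(rates, dt):
    """Integrate angular velocity (deg/s) sampled every dt seconds."""
    angle, angles = 0.0, []
    for rate in rates:
        angle += rate * dt
        angles.append(angle)
    return angles

def detect_sequence(angles, tolerance=10.0):
    """True if the trace reaches about +90° and later about -90°."""
    hit_ccw = False
    for a in angles:
        if not hit_ccw and abs(a - 90.0) <= tolerance:
            hit_ccw = True          # reached ~90° counterclockwise
        elif hit_ccw and abs(a + 90.0) <= tolerance:
            return True             # 180° clockwise from there ends near -90°
    return False

# Simulated gyroscope readings: 1 s at +90 deg/s, then 2 s at -90 deg/s.
trace = accumulated_angle([90.0] * 10 + [-90.0] * 20, dt=0.1)
print(detect_sequence(trace))  # → True
```

Note that only two checkpoints are tracked per the example gesture, which matches the earlier point: a handful of key values, not thousands of raw samples, is enough to make the decision.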
Please see my other answer for a simple but working solution to a similar problem: