The standard procedure for what you're describing would be very similar to method 1:
- Train two instances of the same classifier, one on feature set A and one on feature set B.
- Evaluate each with some form of cross-validation, say 10-fold cross-validation, or leave-one-out as you have been using.
That said, if you're not strictly restricted to feature set A xor B, you may achieve better results by deriving a new set C using a method similar to what you described in 2.
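Since your method 2 isn't shown here, one common way to derive such a set C (an assumption, not necessarily what you described) is to concatenate A and B and let a feature-selection step pick the useful columns, keeping the selection inside the cross-validation via a pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for feature sets A and B
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_A, X_B = X[:, :10], X[:, 10:]

# Candidate set C: all features from A and B combined
X_C = np.hstack([X_A, X_B])

# Feature selection happens inside each CV fold, avoiding selection leakage
pipe = make_pipeline(SelectKBest(f_classif, k=8),
                     LogisticRegression(max_iter=1000))
scores_C = cross_val_score(pipe, X_C, y, cv=10)
print(f"C: {scores_C.mean():.3f} +/- {scores_C.std():.3f}")
```

Putting the selector in the pipeline matters: selecting features on the full data before cross-validating would leak information and inflate the scores.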
It is difficult to trust results from leave-one-out cross-validation, so 10-fold would probably be better. This may be one of those situations where more data would help greatly if you can get it; if not, you may not be able to perform your analysis reliably.
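Part of the issue is that each leave-one-out fold scores a single sample, so every fold score is exactly 0 or 1, whereas 10-fold gives per-fold accuracies you can actually inspect. A small illustration (scikit-learn assumed, toy data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=100, n_features=20, random_state=0)
clf = LogisticRegression(max_iter=1000)

# One score per sample: each is 0 or 1, so fold scores are maximally noisy
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())

# Ten fold-level accuracies: their spread gives a sense of the variance
kfold_scores = cross_val_score(clf, X, y, cv=10)

print(sorted(set(loo_scores)))   # only values 0.0 and/or 1.0 appear
print(kfold_scores.round(2))
```

The leave-one-out mean can still be a reasonable point estimate, but it gives you little feel for how stable that estimate is on a small dataset.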