The -n 2 option that you're referring to for mahout seq2sparse
actually specifies the L_p norm to use for length normalization[1]. So mahout seq2sparse ... -n 2
uses L_2 length normalization of the TF-IDF vectors. Alternatively, you could use -lnorm
for log normalization. This is part of the preprocessing step used for both Complement and Standard Naive Bayes[2].
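As a rough illustration of what L_p length normalization does to a single vector, here is a minimal sketch in plain Python. It is not Mahout's code; the function name lp_normalize and the sample values are made up for the example.

```python
def lp_normalize(vector, p=2.0):
    """Scale a vector so that its L_p norm is 1 (hypothetical helper, not a Mahout API)."""
    # L_p norm: (sum of |x_i|^p) raised to the power 1/p
    norm = sum(abs(x) ** p for x in vector) ** (1.0 / p)
    if norm == 0.0:
        # An all-zero vector has no direction to preserve; return it unchanged
        return list(vector)
    return [x / norm for x in vector]


# Made-up TF-IDF weights for one document
tfidf = [3.0, 4.0, 0.0]

# p = 2 corresponds to the "-n 2" case discussed above
unit = lp_normalize(tfidf, p=2.0)

# After normalization the L_2 norm of the vector is 1 (up to rounding)
l2_norm = sum(x ** 2 for x in unit) ** 0.5
```

With p = 2 every document vector ends up with the same Euclidean length, so long documents do not dominate just because they contain more terms.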
Weight normalization is different from length normalization and is not used in Mahout 0.7.
Weight normalization is used in the upcoming 1.0 release, so to get the best comparison of Standard and Complement Naive Bayes you should check out and build a copy of the latest trunk: http://mahout.apache.org/developers/buildingmahout.html
You should see a significant difference between Standard and Complement Naive Bayes if you upgrade to the latest trunk.
[1] http://mahout.apache.org/users/basics/creating-vectors-from-text.html
[2] http://mahout.apache.org/users/classification/bayesian.html