Class type
First, I would suggest that you change your class type to numeric if you would like your output to be continuous. Otherwise, I'd suggest keeping the class label type as nominal (as you have it now) but changing your ratings to {1, 2, 3, 4, 5}.
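In the ARFF header, that is the difference between a nominal and a numeric declaration for the class attribute. A minimal sketch, assuming your class attribute is called rating (adjust the name to match your file); only one of the two declarations would appear:

```
% nominal class: predictions are restricted to the listed labels
@attribute rating {1,2,3,4,5}

% numeric class: predictions can be any value, e.g. 4.5
@attribute rating numeric
```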
If you change to a numeric output (so you can give a prediction of 4.5 stars, for example), then you'll need to use a classifier that is capable of a numeric class.
Using Weka
I'd suggest taking a look at the documentation to learn more about using Weka, possibly by going through some tutorials. For example, after double-clicking your ARFF file, you should be doing most of your work in the Classify tab. Select a classifier, then click Start.
Classifiers
Regression
Regression, and in particular linear regression, is nice because it is easy to interpret: it simply assigns a weight to each of your attributes and combines them with multiplication and addition to produce an output.
I used your example file and tested it with LinearRegression; however, with so few samples it determined that the best model was simply to always output 3.9667 as the rating, which gives a mean absolute error of 0.4722.
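Mean absolute error is just the average distance between predicted and actual ratings. A quick sketch with invented ratings (not your data) to show how such a figure is computed in principle:

```python
# Mean absolute error (MAE): average of |actual - predicted|.
# The ratings below are made up for illustration, not taken from your file.
actual    = [4, 5, 3, 4]
predicted = [3.9667] * 4   # a constant model, like LinearRegression above

mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```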
Not satisfied, I next tried SimpleLinearRegression, which gives the model -0.02 * activites + 4.13 and a mean absolute error of 0.472.
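Such a model is trivial to evaluate by hand. A sketch of the SimpleLinearRegression formula above (activites is the attribute name from your file):

```python
def predict_rating(activites):
    # SimpleLinearRegression model from Weka: -0.02 * activites + 4.13
    return -0.02 * activites + 4.13

# e.g. an app with 5 activities is predicted roughly 4.03 stars
```

The tiny weight on activites shows how flat this model is: the prediction barely moves no matter how many activities an app has.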
SMOreg gives the following model:
```
weights:
+ 0.1147 * (normalized) lloc
- 0.0404 * (normalized) nid
- 0.1662 * (normalized) nle
- 0.0647 * (normalized) nel
+ 0.3385 * (normalized) nip
- 0.1352 * (normalized) activites
- 0.019  * (normalized) inside-permissions
- 0.0464 * (normalized) outside-permissions
+ 0.1602 * (normalized) all-permissions
+ 0.5921
```
It has a mean absolute error of 0.3859. But at this point, I think that with so few data points you are overfitting your data.
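Note that the SMOreg weights apply to normalized attribute values (Weka's default for this classifier min-max scales each attribute to [0, 1]), so you cannot plug raw values in directly; Weka also maps the result back to the original class range, so treat this purely as an illustration of how normalized weights combine. A rough sketch using just two of the nine attributes, with invented values and ranges:

```python
def normalize(x, lo, hi):
    # Min-max scale a raw value into [0, 1], matching the "(normalized)"
    # weights in the SMOreg output above.
    return (x - lo) / (hi - lo)

# Made-up raw values and attribute ranges, purely for illustration.
raw     = {"lloc": 120, "nip": 3}
ranges  = {"lloc": (10, 500), "nip": (0, 10)}
weights = {"lloc": 0.1147, "nip": 0.3385}   # taken from the model above
bias    = 0.5921

weighted_sum = bias + sum(
    w * normalize(raw[name], *ranges[name]) for name, w in weights.items()
)
```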
Nearest neighbor
Using k-nearest neighbors might be a viable approach if you have more data (in Weka it is called IBk; the related instance-based KStar is also worth trying).
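The idea behind nearest-neighbor prediction is simple enough to sketch: predict the average rating of the k most similar apps, where similarity is e.g. Euclidean distance over the attributes. A toy example with invented one-dimensional feature vectors:

```python
def knn_predict(query, data, k=3):
    """Predict by averaging the ratings of the k nearest examples.

    data is a list of (feature_vector, rating) pairs; distances are
    plain Euclidean. Weka's instance-based learners are more refined
    (KStar, for instance, uses an entropy-based distance).
    """
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(data, key=lambda pair: dist(query, pair[0]))[:k]
    return sum(rating for _, rating in nearest) / k

# Four invented apps, each with a single attribute, and their ratings:
apps = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 5.0)]
```

For example, `knn_predict((0.5,), apps, k=2)` averages the two closest apps' ratings.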
Decision trees
The DecisionStump algorithm outputs the model below, with a mean absolute error of 0.3424, but again it is probably overfitting.
```
inside-permissions <= 1.5 : 2.6
inside-permissions > 1.5 : 4.090909090909091
inside-permissions is missing : 3.966666666666667
```
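A decision stump is a one-split decision tree, so the model above is equivalent to a single if/else. A sketch with the attribute and values copied from the output (None stands in for a missing value):

```python
def stump_predict(inside_permissions):
    # DecisionStump model from Weka, written out as plain code.
    if inside_permissions is None:        # missing value
        return 3.966666666666667          # the overall mean rating
    if inside_permissions <= 1.5:
        return 2.6
    return 4.090909090909091
```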
More data
As you can see, the models and error rates are not great, which is no surprise given that you only have 12 data points. To build a really good model, you'll need more data. And to get an accurate idea of how well the model is doing, you need not only enough data to train on, but also enough data to set aside as a separate test set that you use only for evaluating the model's performance.
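Once you have more data, keep part of it aside. A minimal sketch of a holdout split (Weka's Classify tab can also do this for you via the "Percentage split" test option):

```python
import random

random.seed(0)                    # fixed seed so the split is reproducible
examples = list(range(100))       # stand-ins for 100 labeled instances
random.shuffle(examples)

cut = int(0.8 * len(examples))    # 80% for training, 20% held out
train, held_out = examples[:cut], examples[cut:]
```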