Question

I am a newbie with Weka 3.7.9. I have an ARFF file, which contains these attributes, class and data: http://pastebin.com/s8hivv0U

This file represents Android projects, so attributes 1-9 are different kinds of metrics:

  1. lloc - Logical Lines Of Code
  2. nid - Number Of IDs
  3. nle - Nesting Level
  4. nel - Number of Elements
  5. nip - Number of Input elements
  6. activites - number of activities from AndroidManifest
  7. inside-permissions - number of inside permissions from AndroidManifest
  8. outside-permissions - number of outside permissions from AndroidManifest
  9. all-permissions - number of permissions from AndroidManifest
  10. class {4, 4.6, 3.8, 2.6, 5, 3.2, 3.6, 4.2, 4.1}

The last attribute is the class, which contains the Google Play rating of each project.

So each line is an Android project. (Naturally, the original *.arff file contains more projects...)

I would like to analyze the data with learning algorithms. The predictors are attributes 1 to 9. I would like to determine which predictors most influence the Google Play ratings.

How can I do that? And what are the best methods to do it? I would like to ask you to explain it to me, if possible.

Thanks in advance, Peter

Solution

Class type

First, I would suggest that you change your class type to numeric if you would like your output to be continuous. Otherwise, I'd suggest keeping the class label type as nominal (as you have it now) but changing your ratings to {1, 2, 3, 4, 5}.

If you change to a numeric output (so you can give a prediction of 4.5 stars, for example), then you'll need to use a classifier that is capable of handling a numeric class.
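
For reference, here is a minimal sketch using the Weka Java API, assuming your data is saved as projects.arff (a placeholder name I made up), that loads the file and reports whether the class attribute is numeric or nominal; the ARFF declarations in the comments show the two options discussed above:

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ClassTypeCheck {
        public static void main(String[] args) throws Exception {
            // "projects.arff" is a placeholder for your own file. In its header the
            // class would be declared either as
            //   @attribute class numeric            (continuous ratings)
            // or as
            //   @attribute class {1,2,3,4,5}        (nominal star ratings)
            Instances data = new DataSource("projects.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);  // the class is the last attribute

            System.out.println("Numeric class: " + data.classAttribute().isNumeric());
        }
    }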

Using Weka

I'd suggest taking a look at the documentation to learn more about using Weka, possibly by going through some tutorials. For example, after double-clicking your ARFF file, you should be doing most of your work in the Classify tab of the Explorer. Select a classifier, then click Start.

Classifiers

Regression

Regression, and in particular linear regression, is nice because it is easy to interpret: it simply assigns a weight to each of your attributes and combines them with multiplication and addition to produce an output.

I used your example file and tested it with LinearRegression; however, with so few samples it determined that the best model was simply to output 3.9667 as the rating, which gives a mean absolute error of 0.4722.

Not satisfied, I next tried SimpleLinearRegression, which gives the model -0.02 * activites + 4.13 and a mean absolute error of 0.472.
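
If you prefer to run this programmatically rather than in the Explorer, a sketch along these lines should reproduce the LinearRegression experiment; the file name and the 10-fold cross-validation settings are assumptions on my part:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RunLinearRegression {
        public static void main(String[] args) throws Exception {
            // Placeholder path; assumes the class attribute is already numeric.
            Instances data = new DataSource("projects.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            LinearRegression lr = new LinearRegression();
            lr.buildClassifier(data);
            System.out.println(lr);  // prints the weight assigned to each attribute

            // 10-fold cross-validation; with only 12 instances the estimate is very noisy
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new LinearRegression(), data, 10, new Random(1));
            System.out.println("MAE: " + eval.meanAbsoluteError());
        }
    }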

SMOreg gives the following model

weights:
 +       0.1147 * (normalized) lloc
 -       0.0404 * (normalized) nid
 -       0.1662 * (normalized) nle
 -       0.0647 * (normalized) nel
 +       0.3385 * (normalized) nip
 -       0.1352 * (normalized) activites
 -       0.019  * (normalized) inside-permissions
 -       0.0464 * (normalized) outside-permissions
 +       0.1602 * (normalized) all-permissions
 +       0.5921

and has a mean absolute error of 0.3859. But at this point, I think that with so few data points you are overfitting your data.
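
One way to gauge how much of that is overfitting is to cross-validate SMOreg instead of evaluating on the training data; here is a sketch using leave-one-out-style folds (the file name is again a placeholder):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.SMOreg;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RunSMOreg {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("projects.arff").getDataSet();  // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            // Leave-one-out cross-validation: as many folds as instances, so every
            // prediction is made on a project the model has never seen.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new SMOreg(), data, data.numInstances(), new Random(1));
            System.out.println("Cross-validated MAE: " + eval.meanAbsoluteError());
        }
    }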

Nearest neighbor

Using k-nearest neighbors might be a viable approach if you have more data (in Weka the classic k-NN classifier is IBk; KStar is a related instance-based learner you could also try).
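
Here is a sketch of how that could look via the API; the choice of k = 3 and the file name are arbitrary assumptions, and you could swap weka.classifiers.lazy.KStar in place of IBk:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RunKnn {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("projects.arff").getDataSet();  // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            // With a numeric class, IBk predicts by averaging the neighbours' ratings.
            IBk knn = new IBk();
            knn.setKNN(3);  // k = 3 is an arbitrary choice for illustration

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(knn, data, 10, new Random(1));
            System.out.println("MAE: " + eval.meanAbsoluteError());
        }
    }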

Decision trees

The DecisionStump algorithm outputs the following model with a mean absolute error of 0.3424, but again it is probably overfitting:

inside-permissions <= 1.5 : 2.6
inside-permissions > 1.5 : 4.090909090909091
inside-permissions is missing : 3.966666666666667
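
For completeness, a sketch that builds and prints such a stump via the API (placeholder file name again):

    import weka.classifiers.trees.DecisionStump;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RunDecisionStump {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("projects.arff").getDataSet();  // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            // A stump is a one-level tree: it picks the single attribute and threshold
            // that best split the ratings.
            DecisionStump stump = new DecisionStump();
            stump.buildClassifier(data);
            System.out.println(stump);  // prints a model like the one shown above
        }
    }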

More data

As you can see, the models and error rates are not so great, considering you only have 12 data points. To build a really good model, you'll need more data. To get an accurate idea of how well the model is doing, you not only need enough data to train with, but also enough data to keep aside as a separate test set that you use only for evaluating the model's performance.
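
When you do have more data, you can hold out an explicit test set; here is a sketch with an arbitrary 66/34 split and a placeholder file name:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainTestSplit {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("projects.arff").getDataSet();  // placeholder path
            data.setClassIndex(data.numAttributes() - 1);
            data.randomize(new Random(1));

            // Roughly 2/3 for training, 1/3 held out for testing (an arbitrary split)
            int trainSize = (int) Math.round(data.numInstances() * 0.66);
            Instances train = new Instances(data, 0, trainSize);
            Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);

            LinearRegression lr = new LinearRegression();
            lr.buildClassifier(train);

            // Evaluate only on the held-out projects
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(lr, test);
            System.out.println("Test MAE: " + eval.meanAbsoluteError());
        }
    }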

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow