Question

I need a suggestion on how to analyze this type of data. I want to apply a machine learning method to it, such as sentiment analysis or linear regression. The target variable (the value to predict) is score.

color   type    make    new score

red     truck   ford    y   2
black   sedan   chevy   n   4
silver  sedan   nissan  y   5
silver  truck   nissan  n   2
black   coupe   toyota  y   1
blue    van     honda   y   1
red     truck   toyota  n   4
red     coupe   ford    n   2
black   sedan   ford    y   1
blue    truck   toyota  y   4
white   coupe   chevy   y   3
white   van     toyota  n   5
red     van     ford    y   2
silver  truck   nissan  n   3
black   sedan   honda   n   1
silver  truck   chevy   y   4
red     truck   chevy   y   5
white   coupe   honda   n   5
blue    sedan   chevy   n   2
blue    van     nissan  y   3

I can run a LinearRegression classifier in WEKA which yields:

score =  1.6 ( color=red,silver,white) + 1.8 (make=honda,nissan,toyota,chevy) + 0.55

However, I would like to implement this in Django for a web app. Is there another way to process this data and obtain a linear regression equation without using WEKA? Are there any other suggestions on how to analyze it besides linear regression? I've already implemented a decision tree.


Solution

You can use scikit-learn as your machine learning library, and particularly its linear regression capability. This example might also be useful.
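As a rough illustration of the scikit-learn route, here is a minimal sketch that one-hot encodes the four categorical columns from the question and fits a linear regression on score. The variable names (DATA, rows, model and so on) are my own, and treating every column as a plain unordered category is an assumption. The coefficients will not necessarily match WEKA's output, since WEKA applies its own attribute selection and merges category values.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# The table from the question: color, type, make, new, score.
DATA = """\
red truck ford y 2
black sedan chevy n 4
silver sedan nissan y 5
silver truck nissan n 2
black coupe toyota y 1
blue van honda y 1
red truck toyota n 4
red coupe ford n 2
black sedan ford y 1
blue truck toyota y 4
white coupe chevy y 3
white van toyota n 5
red van ford y 2
silver truck nissan n 3
black sedan honda n 1
silver truck chevy y 4
red truck chevy y 5
white coupe honda n 5
blue sedan chevy n 2
blue van nissan y 3
"""

FIELDS = ["color", "type", "make", "new"]

rows = [line.split() for line in DATA.strip().splitlines()]
X = [row[:4] for row in rows]          # categorical inputs
y = [float(row[4]) for row in rows]    # score, the value to predict

# One-hot encode each category, then fit ordinary least squares.
# handle_unknown="ignore" keeps predict() from failing on unseen values.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), LinearRegression())
model.fit(X, y)

encoder = model.named_steps["onehotencoder"]
regressor = model.named_steps["linearregression"]

# Print the fitted equation, one term per (field, value) indicator.
names = [
    f"{field}={value}"
    for field, values in zip(FIELDS, encoder.categories_)
    for value in values
]
terms = [f"{coef:+.3f} ({name})" for coef, name in zip(regressor.coef_, names)]
print("score =", " ".join(terms), f"{regressor.intercept_:+.3f}")

# Predict the score for one hypothetical car.
prediction = model.predict([["red", "truck", "chevy", "y"]])
print(prediction[0])
```

In a Django app, one option would be to fit the model once (for example at startup or in an offline job) and call model.predict from a view. With only 20 rows and 16 indicator columns the fit is heavily overparameterized, so the coefficients should be read with caution.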

Also, you can always bind the Weka Java API to your application or, alternatively, implement linear regression on your own; it is a fairly easy algorithm to implement given a matrix algebra library.
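To give an idea of what implementing it yourself could look like, here is a minimal sketch that uses NumPy as the matrix algebra library (my choice, not something the question specifies). The helper names one_hot and fit_ols are hypothetical. It builds 0/1 indicator columns by hand and solves the least-squares problem with numpy.linalg.lstsq, which also copes with the redundant indicator columns by returning the minimum-norm solution.

```python
import numpy as np

# The table from the question: color, type, make, new, score.
DATA = """\
red truck ford y 2
black sedan chevy n 4
silver sedan nissan y 5
silver truck nissan n 2
black coupe toyota y 1
blue van honda y 1
red truck toyota n 4
red coupe ford n 2
black sedan ford y 1
blue truck toyota y 4
white coupe chevy y 3
white van toyota n 5
red van ford y 2
silver truck nissan n 3
black sedan honda n 1
silver truck chevy y 4
red truck chevy y 5
white coupe honda n 5
blue sedan chevy n 2
blue van nissan y 3
"""


def one_hot(rows):
    """Return a 0/1 matrix with one column per (field index, value) pair."""
    columns = [
        (j, value)
        for j in range(len(rows[0]))
        for value in sorted({row[j] for row in rows})
    ]
    matrix = np.array(
        [[1.0 if row[j] == value else 0.0 for j, value in columns] for row in rows]
    )
    return matrix, columns


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])      # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes ||A @ beta - y||
    return beta[0], beta[1:]


records = [line.split() for line in DATA.strip().splitlines()]
features = [r[:4] for r in records]
scores = np.array([float(r[4]) for r in records])

X, columns = one_hot(features)
intercept, coefs = fit_ols(X, scores)

# Fitted values for the training rows.
fitted = intercept + X @ coefs
print(np.round(fitted, 2))
```

The columns list records which field index and value each coefficient belongs to, so the equation can be printed in the same style as the WEKA output.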

Licensed under: CC-BY-SA with attribution