Question

I am working on the Titanic dataset. So far my submission scores 0.78, using soft majority voting over logistic regression and a random forest. For features I used Pclass, Age, SibSp, Parch, Fare, Sex, and Embarked.

My question is how to further boost the score for this classification problem?

One thing I tried is adding more classifiers to the majority vote, but it does not help; it even worsens the result. How should I understand this worsening effect?

Thanks for your insight.


Solution

Big question.

OK, so here are a few things I'd look at if I were you:

  1. Have you tried any feature engineering? (It sounds like you've just used the features as they come in the training set, but I can't be 100% sure.)
  2. Random forests should do pretty well, but maybe try XGBoost too? It's quite good at everything on Kaggle. SVMs could be worth a go as well if you're thinking of stacking/ensembling (see the sketch after this list).
  3. Check out some of the tutorials around this competition. There are hundreds of them, and most of them are great.
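On your worsening question, by the way: soft voting averages the members' predicted probabilities, so adding a model that is weaker or poorly calibrated pulls the average toward its own mistakes. Here is a minimal sketch of point 2 (assuming `X` and `y` are your already-encoded feature matrix and labels; the hyperparameters are placeholders, not tuned values) that adds XGBoost to your current pair and uses cross-validation to check whether each addition actually helps:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Assumed: X (numeric feature matrix) and y (labels) already prepared.
lr = LogisticRegression(max_iter=1000)
rf = RandomForestClassifier(n_estimators=300, random_state=0)
xgb = XGBClassifier(n_estimators=300, learning_rate=0.05, eval_metric="logloss")

pair = VotingClassifier([("lr", lr), ("rf", rf)], voting="soft")
trio = VotingClassifier([("lr", lr), ("rf", rf), ("xgb", xgb)], voting="soft")

# Judge each ensemble by cross-validation, not by the public leaderboard.
for name, clf in [("lr+rf", pair), ("lr+rf+xgb", trio)]:
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```

If the bigger vote scores worse in CV, drop the weakest member or give the stronger models more say via `VotingClassifier`'s `weights` parameter, rather than piling on more classifiers.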

Links:

R #1 (my favourite)

R #2

Python #1

Python #2

Hopefully this helps!

OTHER TIPS

Okay, I am currently at 0.81340 in the competition, so let me clear up a few things. I would suggest trying feature engineering before you go for ensemble methods. There are quite a few decent tutorials, as mentioned before. You can score at least 0.82 relying on nothing more than feature engineering and a ten-fold cross-validated random forest. Some things to ponder (a sketch follows the list):

  • Look at Age: what other information does it give you?
  • Do SibSp and Parch actually represent different things?
  • Can you get something out of the Name of the passengers?
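To make those bullets concrete, here is a minimal pandas sketch (column names follow the standard Kaggle Titanic CSV; the title groupings and the per-title age imputation are one reasonable choice, not the only one):

```python
import pandas as pd

df = pd.read_csv("train.csv")  # standard Kaggle Titanic training file

# Name encodes a title (Mr, Mrs, Miss, Master, ...) that carries age,
# sex, and social-status information beyond the raw columns.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
df["Title"] = df["Title"].replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
rare = df["Title"].value_counts()[lambda s: s < 10].index
df["Title"] = df["Title"].replace(rare, "Rare")

# SibSp and Parch both count family on board; a single FamilySize
# (plus an IsAlone flag) is often more informative than either alone.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# Age is missing for many rows; imputing the median per title is a
# common trick that uses the information Age shares with Title.
df["Age"] = df.groupby("Title")["Age"].transform(lambda s: s.fillna(s.median()))
```

From there, one-hot encode Title (along with Sex and Embarked), feed the expanded frame to the ten-fold cross-validated random forest mentioned above, and compare CV scores against your original feature set.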

All the Best.
Cheers.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange