How to further improve the kaggle titanic submission accuracy?
16-10-2019
Question
I am working on the Titanic dataset. So far my submission has 0.78 score using soft majority voting with logistic regression and random forest. As for the features, I used Pclass, Age, SibSp, Parch, Fare, Sex, Embarked.
My question is how to further boost the score for this classification problem?
One thing I tried is adding more classifiers to the majority vote, but it does not help; it even worsens the result. How should I understand this worsening effect?
Thanks for your insight.
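For reference, the setup described above (soft majority voting over logistic regression and a random forest) might be sketched like this with scikit-learn. The synthetic data is a stand-in for the seven Titanic features; the exact model parameters are illustrative assumptions, not the asker's actual code.

```python
# Sketch of soft majority voting with logistic regression + random forest.
# make_classification stands in for the preprocessed Titanic features
# (Pclass, Age, SibSp, Parch, Fare, Sex, Embarked).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=7, random_state=0)

clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities instead of hard labels
)

scores = cross_val_score(clf, X, y, cv=5)
print(round(scores.mean(), 3))
```

With `voting="soft"`, a weak or poorly calibrated extra classifier drags the averaged probabilities down, which is one way adding more models can hurt the ensemble.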
Solution
Big question! Okay, here are a few things I'd look at if I were you:
- Have you tried any feature engineering? (It sounds like you've just used the features in the training set, but I can't be 100% sure.)
- Random forests should do pretty well, but maybe try XGBoost too? It tends to do well on just about everything on Kaggle. SVMs could be worth a go as well if you're thinking of stacking/ensembling.
- Check out some of the tutorials around this competition. There are hundreds of them, and most are great.
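One quick way to act on the second point is to cross-validate a few candidates before deciding what to ensemble. A hedged sketch, again on synthetic stand-in data; `GradientBoostingClassifier` is used here because it ships with scikit-learn, and `xgboost.XGBClassifier` is a near drop-in alternative if you have that library installed:

```python
# Compare candidate models with cross-validation before stacking/ensembling.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=7, random_state=0)

candidates = {
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "gb": GradientBoostingClassifier(random_state=0),  # swap for XGBClassifier
    # SVMs are scale-sensitive, so standardize inside a pipeline.
    "svm": make_pipeline(StandardScaler(), SVC(probability=True)),
}

for name, model in candidates.items():
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```

Only keep models whose cross-validated scores are competitive; a clearly weaker model usually hurts a soft-voting ensemble rather than helping it.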
Links:
...Hopefully this helps
OTHER TIPS
Okay, I am currently at 0.81340 in the competition, so let me clear up a few things. I would suggest trying feature engineering before you go for ensemble methods. There are quite a few decent tutorials, as mentioned before. You can score at least 0.82 relying only on feature engineering and a ten-fold cross-validated random forest. A few things to ponder:
- Look at Age: what other information does it give you?
- Do SibSp and Parch actually represent different things?
- Can you get something out of the Name of the passengers?
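The last two bullets can be sketched concretely with pandas. The column names follow the Titanic data, but the rows here are made up, and combining SibSp/Parch into a family size and extracting a title from Name are common community tricks, not the only answers to those questions:

```python
# Sketch of two feature-engineering ideas on a tiny hand-made frame.
import pandas as pd

df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley",
        "Heikkinen, Miss. Laina",
    ],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# SibSp and Parch both count relatives aboard, so they can be combined
# into a single family-size feature (+1 for the passenger themselves).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# The Name column encodes a title (Mr, Mrs, Miss, ...) between the comma
# and the period, which correlates with age, sex and social status.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

print(df[["FamilySize", "Title"]])
```

Titles are especially handy for imputing missing ages: the median age per title is a much better fill value than a global median.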
All the Best.
Cheers.