Question

I am working on the Titanic dataset. So far my submission scores 0.78, using soft majority voting over logistic regression and a random forest. For features I used Pclass, Age, SibSp, Parch, Fare, Sex, and Embarked.

My question is how to further boost the score for this classification problem?

One thing I tried is adding more classifiers to the majority vote, but it does not help; it even worsens the result. How should I understand this worsening effect?

Thanks for your insight.


Solution

Big question.

OK, so here are a few things I'd look at if I were you:

  1. Have you tried any feature engineering? (It sounds like you've just used the features as they come in the training set, but I can't be 100% sure.)
  2. Random forests should do pretty well, but maybe try XGBoost too? It's quite good at everything on Kaggle. SVMs could be worth a go as well if you're thinking of stacking/ensembling (see the sketch after this list).
  3. Check out some of the tutorials around this competition. There are hundreds of them, and most of them are great.
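On your worsening question, by the way: soft voting averages the members' predicted probabilities, so adding a model that is weaker or poorly calibrated pulls the average toward its own mistakes. Here is a minimal sketch of point 2 (assuming `X` and `y` are your already-encoded feature matrix and labels; the hyperparameters are placeholders, not tuned values) that adds XGBoost to your current pair and uses cross-validation to check whether each addition actually helps:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Assumed: X (numeric feature matrix) and y (labels) already prepared.
lr = LogisticRegression(max_iter=1000)
rf = RandomForestClassifier(n_estimators=300, random_state=0)
xgb = XGBClassifier(n_estimators=300, learning_rate=0.05, eval_metric="logloss")

pair = VotingClassifier([("lr", lr), ("rf", rf)], voting="soft")
trio = VotingClassifier([("lr", lr), ("rf", rf), ("xgb", xgb)], voting="soft")

# Judge each ensemble by cross-validation, not by the public leaderboard.
for name, clf in [("lr+rf", pair), ("lr+rf+xgb", trio)]:
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```

If the bigger vote scores worse in CV, drop the weakest member or give the stronger models more say via `VotingClassifier`'s `weights` parameter, rather than piling on more classifiers.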

Links:

R #1 (my favourite)

R #2

Python #1

Python #2

Hopefully this helps!

OTHER TIPS

Okay, I am currently at 0.81340 in the competition, so let me clear up a few things. I would suggest trying feature engineering before you go for ensemble methods. There are quite a few decent tutorials, as mentioned before. You can score at least 0.82 relying on nothing more than feature engineering and a ten-fold cross-validated random forest. Some things to ponder (a sketch follows the list):

  • Look at Age: what other information does it give you?
  • Do SibSp and Parch actually represent different things?
  • Can you get something out of the Name of the passengers?
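To make those bullets concrete, here is a minimal pandas sketch (column names follow the standard Kaggle Titanic CSV; the title groupings and the per-title age imputation are one reasonable choice, not the only one):

```python
import pandas as pd

df = pd.read_csv("train.csv")  # standard Kaggle Titanic training file

# Name encodes a title (Mr, Mrs, Miss, Master, ...) that carries age,
# sex, and social-status information beyond the raw columns.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
df["Title"] = df["Title"].replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
rare = df["Title"].value_counts()[lambda s: s < 10].index
df["Title"] = df["Title"].replace(rare, "Rare")

# SibSp and Parch both count family on board; a single FamilySize
# (plus an IsAlone flag) is often more informative than either alone.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# Age is missing for many rows; imputing the median per title is a
# common trick that uses the information Age shares with Title.
df["Age"] = df.groupby("Title")["Age"].transform(lambda s: s.fillna(s.median()))
```

From there, one-hot encode Title (along with Sex and Embarked), feed the expanded frame to the ten-fold cross-validated random forest mentioned above, and compare CV scores against your original feature set.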

All the Best.
Cheers.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange