I am trying to select important features (or at least understand which features explain more of the variability) in a given dataset. To do this I use both ExtraTreesClassifier and GradientBoostingRegressor, and then:
clf = ExtraTreesClassifier(n_estimators=10, max_features='auto', random_state=0)  # builds a forest of 10 trees, right?
clf.fit(x_train, y_train)
feature_importance=clf.feature_importances_ # does NOT work - returns NoneType for feature_importance
After this I am really interested in plotting them (for a visual representation), or, as a first step, just looking at the relative order of importance and the corresponding indices:
# Both of these fail because feature_importance is NoneType
import numpy
feature_importance = 100.0 * (feature_importance / feature_importance.max())
indices = numpy.argsort(feature_importance)[::-1]
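For comparison, here is a minimal, self-contained sketch (the synthetic data from sklearn.datasets.make_classification is my own stand-in for x_train/y_train) showing that in scikit-learn, feature_importances_ on a fitted ExtraTreesClassifier returns a NumPy array, never None, so a NoneType suggests fit() did not actually run on the estimator being inspected, or a very old scikit-learn version:

```python
import numpy
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for x_train / y_train: 12 features, 3 label values
x_train, y_train = make_classification(n_samples=200, n_features=12,
                                       n_informative=4, n_classes=3,
                                       n_clusters_per_class=1, random_state=0)

clf = ExtraTreesClassifier(n_estimators=10, random_state=0)
clf.fit(x_train, y_train)

feature_importance = clf.feature_importances_   # numpy.ndarray, not None
feature_importance = 100.0 * (feature_importance / feature_importance.max())
indices = numpy.argsort(feature_importance)[::-1]
print(indices)  # feature indices, most important first
```

Note that max_features='auto' is dropped here: it has been deprecated and then removed in recent scikit-learn releases, and the default works fine for this purpose.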
What I found puzzling is that if I use GradientBoostingRegressor as below, I do get the feature importances and the indices derived from them. What am I doing wrong?
# Works with GradientBoostingRegressor
params = {'n_estimators': 100, 'max_depth': 3, 'learning_rate': 0.1, 'loss': 'lad'}
clf = GradientBoostingRegressor(**params).fit(x_train, y_train)  # already fitted here, no second fit() needed
feature_importance = clf.feature_importances_
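Once feature_importances_ comes back as an array, the plotting part is straightforward. A sketch with matplotlib (the random importances and the 'var %d' labels are invented stand-ins for real values and feature names; the Agg backend is used so it runs headless):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; writes to file instead of a window
import matplotlib.pyplot as plt
import numpy

# Pretend importances for 12 features (stand-in for clf.feature_importances_)
rng = numpy.random.RandomState(0)
feature_importance = rng.rand(12)
feature_importance = 100.0 * (feature_importance / feature_importance.max())

indices = numpy.argsort(feature_importance)  # ascending, so the top bar is the most important
pos = numpy.arange(len(indices)) + 0.5

plt.barh(pos, feature_importance[indices], align='center')
plt.yticks(pos, ['var %d' % i for i in indices])
plt.xlabel('Relative importance (%)')
plt.title('Feature importance')
plt.savefig('feature_importance.png')
```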
Other info: I have 12 independent variables (x_train) and one label variable (y_train) with multiple values (say 4, 5, 7); type(x_train) is and type(feature_importance) is
Acknowledgments: some elements are borrowed from this post: http://www.tonicebrian.com/2012/11/05/training-gradient-boosting-trees-with-python/