So this is from my own code that I used last year to make predictions on StackOverflow data:
```python
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import GradientBoostingClassifier

basic_feature_names = ['BodyLength',
                       'NumTags',
                       'OwnerUndeletedAnswerCountAtPostTime',
                       'ReputationAtPostCreation',
                       'TitleLength',
                       'UserAge']

fea = ...  # extract the features - removed for brevity
# orig_data (the training labels) and num_estimators are defined elsewhere in the script (not shown)

# construct our classifier
clf = GradientBoostingClassifier(n_estimators=num_estimators, random_state=0)
# now fit using only the columns named in basic_feature_names
clf.fit(fea[basic_feature_names], orig_data['OpenStatusMod'].values)

priv_fea = ...  # this was my test dataset - also removed for brevity
# now calculate the predicted classes, selecting the same columns in the same order
pred = clf.predict(priv_fea[basic_feature_names])
```
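To try the pattern end to end, here is a small self-contained sketch. It is not the original competition code: the data are random numbers with a made-up label, and `train`, `test` and `n_estimators=50` are hypothetical stand-ins for `fea`, `priv_fea` and `num_estimators`. The point is only that the same list of column names is used to select features for both `fit` and `predict`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

basic_feature_names = ['BodyLength', 'NumTags', 'OwnerUndeletedAnswerCountAtPostTime',
                       'ReputationAtPostCreation', 'TitleLength', 'UserAge']

# Hypothetical toy data: random values standing in for the real StackOverflow features.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.random((200, len(basic_feature_names))), columns=basic_feature_names)
# Hypothetical binary label, tied to one feature so the model has something to learn.
train['OpenStatusMod'] = (train['NumTags'] > 0.5).astype(int)
test = pd.DataFrame(rng.random((50, len(basic_feature_names))), columns=basic_feature_names)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
# Selecting with the same list for fit and predict keeps the columns in the same order.
clf.fit(train[basic_feature_names], train['OpenStatusMod'].values)
pred = clf.predict(test[basic_feature_names])
```

Reusing one list matters because scikit-learn models see the features by position; recent scikit-learn versions also record the column names at fit time and complain if `predict` is given different ones.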
So if I wanted to train on only a subset of the features, I could have done this:
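```python
# want to train using fewer features, so remove 'BodyLength'
# (list.remove modifies basic_feature_names in place, so any later
# predict call that uses this list will also get the reduced feature set)
basic_feature_names.remove('BodyLength')
clf.fit(fea[basic_feature_names], orig_data['OpenStatusMod'].values)
```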
So the idea here is that a list of column names can be used to select a subset of the columns of a pandas DataFrame. To change which features are used, you can either construct a new list or remove a value from the existing one, and then use that list for selection.
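Because `list.remove` mutates the list in place, it can be handy to keep the full list around and derive the subset instead. A minimal sketch of three ways to do that, assuming a hypothetical one-row `fea` frame that exists only to show which columns survive each selection:

```python
import pandas as pd

basic_feature_names = ['BodyLength', 'NumTags', 'OwnerUndeletedAnswerCountAtPostTime',
                       'ReputationAtPostCreation', 'TitleLength', 'UserAge']

# Hypothetical one-row frame, only to show which columns each selection keeps.
fea = pd.DataFrame([list(range(len(basic_feature_names)))], columns=basic_feature_names)

# Option 1: build a new list, leaving basic_feature_names untouched.
reduced_names = [name for name in basic_feature_names if name != 'BodyLength']
subset_a = fea[reduced_names]

# Option 2: let pandas drop the column instead of editing the list.
subset_b = fea[basic_feature_names].drop(columns=['BodyLength'])

# Option 3: copy the list first, then remove, so the original is not mutated.
names_copy = list(basic_feature_names)
names_copy.remove('BodyLength')
subset_c = fea[names_copy]
```

All three give the same five columns in the original order, and `basic_feature_names` still holds all six names afterwards.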
I'm not sure how you could do this as easily with NumPy arrays, because they are indexed by column position rather than by column name.
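One way to get the same effect with a plain NumPy array is to map names to positions yourself. This is a sketch under the assumption that the array's columns are in the same order as `basic_feature_names`; the small `X` array is hypothetical and only stands in for the feature matrix.

```python
import numpy as np

basic_feature_names = ['BodyLength', 'NumTags', 'OwnerUndeletedAnswerCountAtPostTime',
                       'ReputationAtPostCreation', 'TitleLength', 'UserAge']

# Hypothetical 3x6 array; column j holds the feature basic_feature_names[j].
X = np.arange(18).reshape(3, 6)

# Integer (fancy) indexing: translate the wanted names into column positions.
wanted = [name for name in basic_feature_names if name != 'BodyLength']
idx = [basic_feature_names.index(name) for name in wanted]
X_a = X[:, idx]

# Boolean mask over the columns.
mask = np.array([name != 'BodyLength' for name in basic_feature_names])
X_b = X[:, mask]

# np.delete returns a new array without the given column position.
X_c = np.delete(X, basic_feature_names.index('BodyLength'), axis=1)
```

The bookkeeping (keeping the name list and the column order in sync) is exactly what a DataFrame does for you, which is why the pandas version reads more naturally.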