The only workaround I found for pandas before 0.15 is as follows:
- The column must be converted to a Categorical for the classifier, but numpy immediately coerces the levels back to int, losing the factor information.
- So store the factor in a global variable outside the dataframe.
import pandas as pd
train_LocationNFactor = pd.Categorical.from_array(train['LocationNormalized'])  # factor kept outside the dataframe; default level order: alphabetical
train['LocationNFactor'] = train_LocationNFactor.labels  # insert the integer codes into the dataframe
[UPDATE: pandas 0.15+ added decent support for Categorical, so the workaround above should no longer be needed there]
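For pandas 0.15 or later, here is a minimal sketch of how the same thing might look with the built-in `category` dtype. The sample data is made up for illustration; only the column names `LocationNormalized` and `LocationNFactor` come from the snippet above. It assumes string values, for which the levels should default to sorted order.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
train = pd.DataFrame({"LocationNormalized": ["London", "Leeds", "London", "Bath"]})

# The category dtype keeps the levels attached to the column itself,
# so no separate global variable is needed.
train["LocationNormalized"] = train["LocationNormalized"].astype("category")

# Integer codes to feed the classifier.
train["LocationNFactor"] = train["LocationNormalized"].cat.codes

# The levels stay recoverable from the dataframe.
levels = train["LocationNormalized"].cat.categories.tolist()
codes = train["LocationNFactor"].tolist()
```

The codes index into `levels`, so a predicted code can be mapped back to its location name with `levels[code]`.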