The term "multiclass" refers only to the target variable: for the random forest in scikit-learn the target is either categorical with an integer coding (multiclass classification) or continuous (regression).
"Greater-than" rules apply to the input variables independently of the kind of target variable. If you have categorical input variables with a low dimensionality (e.g. less than a couple of tens of possible values) then it might be beneficial to use a one-hot-encoding for those. See:
- OneHotEncoder if your categories are encoded as integers,
- DictVectorizer if your categories are encoded as string labels in a list of Python dicts.
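A minimal sketch of both encoders (the feature names and values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction import DictVectorizer

# Integer-coded categories: OneHotEncoder expands each column
# into one binary indicator column per observed category.
X_int = np.array([[0], [1], [2], [1]])
X_onehot = OneHotEncoder().fit_transform(X_int).toarray()
print(X_onehot.shape)  # 4 samples, 3 observed categories

# String-labeled categories in a list of dicts: DictVectorizer
# does the same expansion from {feature_name: string_value} records.
records = [{"color": "red"}, {"color": "blue"}, {"color": "green"}]
vec = DictVectorizer(sparse=False)
X_dict = vec.fit_transform(records)
print(X_dict.shape)
```

Both transformers remember the category-to-column mapping from `fit`, so the same expansion can be applied to held-out data with `transform`.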
If some of the categorical variables have a high cardinality (e.g. thousands of possible values or more), it has been shown experimentally that DecisionTreeClassifiers, and ensemble models built on them such as RandomForestClassifiers, can be trained directly on the raw integer coding, without converting it to a one-hot encoding that would waste memory and inflate the model size.
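A toy sketch of this: the data, the number of categories, and the target rule below are all invented for illustration. Note that with a raw integer coding the trees treat the categories as ordered, so the splits found are "greater-than" thresholds on the codes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Hypothetical high-cardinality categorical feature,
# kept as raw integer codes (no one-hot expansion).
n_samples, n_categories = 1000, 5000
X = rng.randint(0, n_categories, size=(n_samples, 2))

# Toy target: depends on whether the first category code
# falls above a threshold (easy for "greater-than" splits).
y = (X[:, 0] > n_categories // 2).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```

The memory saving is the point: a one-hot encoding here would turn 2 input columns into roughly 10000, while the raw integer coding keeps the design matrix at its original width.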