Question

From the documentation, it appears that DecisionTreeClassifier supports multiclass features:

DecisionTreeClassifier is capable of both binary (where the labels are [-1, 1]) classification and multiclass (where the labels are [0, ..., K-1]) classification.

But it appears that the decision rule in each node is based on "greater than".

I'm trying to build trees with enum features (where the absolute value of a feature carries no meaning; only equal / not equal matters).

Is this supported in scikit-learn decision trees?

My current solution is to split each such feature into a set of binary features, one per possible value, but I'm looking for a cleaner and more efficient solution.
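For reference, a minimal sketch of that workaround (the column names and values are made up for illustration; assumes pandas is available):

    import pandas as pd

    # Hypothetical enum-valued features; only equality between values matters.
    df = pd.DataFrame({"color": ["red", "blue", "red"],
                       "shape": ["box", "box", "ball"]})

    # One binary indicator column per possible value.
    X = pd.get_dummies(df, columns=["color", "shape"])
    print(list(X.columns))
    # ['color_blue', 'color_red', 'shape_ball', 'shape_box']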


Solution

The term multiclass only affects the target variable: for the decision trees and random forests in scikit-learn, it is either categorical with an integer coding (multiclass classification) or continuous (regression).

"Greater-than" rules apply to the input variables independently of the kind of target variable. If you have categorical input variables with a low dimensionality (e.g. less than a couple of tens of possible values) then it might be beneficial to use a one-hot-encoding for those. See:

  • OneHotEncoder if your categories are encoded as integers,
  • DictVectorizer if your categories are encoded as string labels in a list of Python dicts.
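A minimal sketch of both encoders on invented data (note the name of the dense/sparse flag for OneHotEncoder has changed across scikit-learn versions; sparse_output is the recent spelling):

    from sklearn.preprocessing import OneHotEncoder
    from sklearn.feature_extraction import DictVectorizer

    # Integer-coded categories: one-hot-encode them with OneHotEncoder.
    X_int = [[0, 1], [1, 0], [2, 1]]              # two categorical columns
    enc = OneHotEncoder(sparse_output=False)      # dense output for readability
    print(enc.fit_transform(X_int))

    # String-labelled categories in dicts: use DictVectorizer.
    X_dict = [{"color": "red", "shape": "box"},
              {"color": "blue", "shape": "ball"}]
    vec = DictVectorizer(sparse=False)
    print(vec.fit_transform(X_dict))
    print(vec.get_feature_names_out())            # one column per (key, value)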

If some of the categorical variables have a high cardinality (e.g. thousands of possible values or more), it has been shown experimentally that DecisionTreeClassifier, and stronger models built on it such as RandomForestClassifier, can be trained directly on the raw integer coding, without converting it to a one-hot encoding that would waste memory and inflate the model size.
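A sketch of that raw-integer approach (the OrdinalEncoder step and the toy data are my own illustration, not from the original answer):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import OrdinalEncoder

    # Hypothetical high-cardinality categorical column (e.g. user IDs).
    X_raw = [["user_17"], ["user_42"], ["user_17"], ["user_99"]]
    y = [0, 1, 0, 1]

    # Map each category to an arbitrary integer and train on it directly;
    # no one-hot expansion, so the feature matrix stays a single column.
    X = OrdinalEncoder().fit_transform(X_raw)
    clf = RandomForestClassifier(n_estimators=100).fit(X, y)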

OTHER TIPS

DecisionTreeClassifier is certainly capable of multiclass classification. The "greater than" rule just happens to be what is illustrated at that link; the rule at each node is arrived at through its effect on the information gain or the Gini impurity (see later on that page). Decision tree nodes generally hold binary rules, so they typically take the form of one value being greater than another. The trick is transforming your data so that it has good predictive values to compare.
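To see this concretely, here is a small sketch of my own (not from the answer) showing that on 0/1 one-hot columns the learned threshold sits between the two values, so the "greater than" rule behaves exactly as an equal / not-equal test:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Two one-hot columns, e.g. color_red and color_blue.
    X = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
    y = np.array([0, 1, 0, 1])

    tree = DecisionTreeClassifier().fit(X, y)
    # The root split compares one column against ~0.5, i.e. "is this value set?"
    print(tree.tree_.feature[0], tree.tree_.threshold[0])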

To be clear, multiclass means each datum (say, a document) is to be classified as exactly one of a set of possible classes. This is different from multilabel classification, where a document may be assigned several classes out of the set of possible classes. Most scikit-learn classifiers support multiclass, and the library provides a few meta-estimators to accomplish multilabeling. You can also use probabilities (models with a predict_proba method) or decision-function distances (models with a decision_function method) for multilabeling.
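For example, one such meta-estimator is OneVsRestClassifier, which accepts a binary indicator matrix as a multilabel target (the data below is invented):

    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    # Each sample carries a *set* of labels, not a single class.
    X = [[0.1, 2.0], [1.3, 0.4], [0.2, 1.8]]
    y_sets = [["red", "fast"], ["sport"], ["red", "sport", "fast"]]

    Y = MultiLabelBinarizer().fit_transform(y_sets)   # one 0/1 column per label
    clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
    print(clf.predict(X))                             # indicator matrix back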

If you are saying you need to apply multiple labels to each datum (like ['red', 'sport', 'fast'] to cars), then to use trees/forests you need to create a unique label for each possible combination, and that becomes your [0...K-1] set of classes. However, this implies that there is some predictive correlation in the data for each combination (of color, type, and speed in the cars example). There may be for red or yellow fast sports cars, but it is unlikely for the other three-way combinations: the data may be strongly predictive for those few and very weak for the rest. In that case you may be better off using an SVM such as LinearSVC, and/or wrapping the classifier with OneVsRestClassifier or similar.
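A sketch of that "unique label per combination" idea (sometimes called a label powerset; the data is made up):

    from sklearn.ensemble import RandomForestClassifier

    # Collapse each label set into one combined class.
    y_sets = [("red", "sport", "fast"), ("blue", "sedan", "slow"),
              ("red", "sport", "fast")]
    combos = sorted(set(y_sets))                # every observed combination
    y = [combos.index(s) for s in y_sets]       # classes 0 .. K-1

    X = [[0.9, 1.2], [0.1, 3.4], [1.0, 1.1]]
    clf = RandomForestClassifier().fit(X, y)
    print([combos[k] for k in clf.predict(X)])  # decode back to label sets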

There is a Python package called DecisionTree (https://engineering.purdue.edu/kak/distDT/DecisionTree-2.2.2.html) which I find very helpful.

This is not directly related to your scikit-learn problem, but it may be helpful to others. Also, I always go to pyindex when I am looking for Python tools: https://pypi.python.org/pypi/pyindex

