Question

SPSS Modeler has an implementation of QUEST, along with C&RT, C5.0 and CHAID. QUEST is relatively rarely covered in textbooks - what are its pros and cons compared to other decision tree algorithms? How does it make splits? Why is it (apparently) not in as widespread use as C&RT or C5.0?

Solution

QUEST stands for Quick, Unbiased and Efficient Statistical Tree.

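It selects the split variable with statistical tests: an ANOVA F-test for continuous predictors and a contingency-table chi-square test for categorical ones. When the target has more than two classes, they are merged into two superclasses so that every split is binary, and the split point is then found with quadratic discriminant analysis (QDA). The tree can be pruned with the same cost-complexity pruning that CART uses. Note that QUEST is a classification method: it requires a categorical target and does not handle regression.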
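The variable-selection step can be sketched roughly as follows. This is a simplified illustration, not the QUEST implementation: it assumes SciPy is available, the function name `select_split_variable` and the toy data are made up for the example, and refinements of the full algorithm (such as adjusting for multiple comparisons) are left out. It only shows the core idea of picking the predictor with the smallest p-value, using an F-test for numeric predictors and a chi-square test for categorical ones.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway


def select_split_variable(columns, y, categorical):
    """Pick the predictor with the smallest association p-value.

    columns     -- dict mapping a predictor name to a 1-D sequence of values
    y           -- 1-D sequence of class labels
    categorical -- set of predictor names to treat as categorical
    (All names here are hypothetical, chosen for this sketch.)
    """
    y = np.asarray(y)
    classes = np.unique(y)
    p_values = {}
    for name, values in columns.items():
        x = np.asarray(values)
        if name in categorical:
            # Contingency table: predictor category (rows) by class (columns).
            table = np.array(
                [[np.sum((x == c) & (y == k)) for k in classes]
                 for c in np.unique(x)]
            )
            _, p, _, _ = chi2_contingency(table)
        else:
            # One-way ANOVA of the predictor across the target classes.
            _, p = f_oneway(*[x[y == k] for k in classes])
        p_values[name] = p
    best = min(p_values, key=p_values.get)
    return best, p_values


# Toy data: "size" separates the two classes, "colour" is unrelated to them.
y = [0] * 10 + [1] * 10
columns = {
    "size": list(range(1, 21)),
    "colour": ["a", "b"] * 10,
}
best, p_values = select_split_variable(columns, y, categorical={"colour"})
print(best, p_values)
```

Because every predictor is judged by a p-value rather than by searching over all of its possible splits, a predictor does not gain an advantage merely from having many distinct values.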

QUEST first transforms categorical (symbolic) predictors into continuous ones by assigning discriminant coordinates to the categories of the predictor. It then applies quadratic discriminant analysis (QDA) to determine the split point. Note that QDA usually produces two cut-off points; QUEST chooses the one that is closer to the sample mean of the first superclass.
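To make the "two cut-off points" concrete, here is a minimal sketch of a one-dimensional QDA split between two superclasses, using only the Python standard library. It assumes each superclass is modelled as a normal distribution with its own mean and variance and with priors proportional to the class sizes; the function name `qda_split_point` and the fallback to the midpoint of the means when no real root exists are assumptions made for this illustration. Setting the two weighted densities equal gives a quadratic in x, hence up to two roots.

```python
import math
from statistics import mean, variance


def qda_split_point(xa, xb):
    """Split point between two groups of values on a single predictor.

    xa, xb -- values of the predictor in the first and second superclass.
    Returns the root of the QDA quadratic closer to the mean of xa.
    """
    mu_a, mu_b = mean(xa), mean(xb)
    var_a, var_b = variance(xa), variance(xb)  # sample variances
    prior_a = len(xa) / (len(xa) + len(xb))
    prior_b = 1.0 - prior_a

    # Equating prior_a * N(x; mu_a, var_a) and prior_b * N(x; mu_b, var_b)
    # and taking logs gives a*x^2 + b*x + c = 0 with:
    a = var_a - var_b
    b = 2.0 * (mu_a * var_b - mu_b * var_a)
    c = (mu_b ** 2 * var_a - mu_a ** 2 * var_b
         + 2.0 * var_a * var_b
         * math.log((prior_a * math.sqrt(var_b)) / (prior_b * math.sqrt(var_a))))

    if a == 0:
        # Equal variances: the boundary is linear, so there is one cut-off.
        return -c / b

    disc = b * b - 4.0 * a * c
    if disc < 0:
        # No real root; fall back to the midpoint (an assumption of this sketch).
        return (mu_a + mu_b) / 2.0

    roots = [(-b + math.sqrt(disc)) / (2.0 * a),
             (-b - math.sqrt(disc)) / (2.0 * a)]
    # Two cut-offs: keep the one closer to the mean of the first superclass.
    return min(roots, key=lambda r: abs(r - mu_a))


equal_var = qda_split_point([1, 2, 3], [7, 8, 9])
unequal_var = qda_split_point([0, 1, 2, 3, 4], [9, 10, 11])
print(equal_var, unequal_var)
```

With equal variances and equal class sizes the quadratic term vanishes and the split falls halfway between the two means. With unequal variances there are two roots, and the one kept here lies between the two class means.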

An advantage of QUEST is that its split-variable selection is unbiased. CART, by contrast, is biased towards split variables that allow more splits and towards those with more missing values.

I am not sure why it is not used as widely as CART or C5.0. It may simply be that CART and C5.0 receive more coverage in the literature than other methods.

References:

  1. Quest reference manual: http://www.stat.wisc.edu/~loh/treeprogs/guide/guideman.pdf
  2. http://www.stat.wisc.edu/~loh/quest.html
Licensed under: CC-BY-SA with attribution