Data structure for representing Decision Tree Induction

https://stackoverflow.com/questions/14290656

15-01-2022

Question

I am currently working on some data mining projects in which I have to classify given data sets (.csv format) into different classes, using decision tree induction with GINIsplit as the splitting criterion. I am doing all of this in plain Java, without tools such as WEKA or ORANGE.

My question is: what is the best data structure for representing the decision tree so that classification is fast and efficient? Also, are there any attribute-specific optimization techniques, i.e. techniques that depend on whether the attributes are nominal, numeric, or ordinal?

Thanks in advance!

Solution

Well, if you really want optimal classification speed, output your decision tree to... .class. That is, generate a code snippet for the tree and compile it. This way, evaluation runs at the native speed of your Java HotSpot JRE.

This works because a decision tree can be encoded directly in program logic:

if (attribute_x < 0.1) {
    switch(attribute_c) {
        case BANANA: {
            ...
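
To make the truncated fragment above concrete, here is a minimal sketch of what a fully written-out tree could look like as a compilable Java class. Everything beyond `attribute_x < 0.1` and `BANANA` is an assumption made up for illustration: the other enum values, the class labels, and the names `CompiledTree` and `classify` do not come from the original answer.

```java
public class CompiledTree {
    // Hypothetical nominal attribute values and class labels, for illustration only.
    enum Fruit { BANANA, APPLE, CHERRY }
    enum Label { CLASS_A, CLASS_B, CLASS_C }

    // Each internal node of the tree becomes a branch; each leaf becomes a return.
    static Label classify(double attributeX, Fruit attributeC) {
        if (attributeX < 0.1) {                 // numeric attribute: threshold test
            switch (attributeC) {               // nominal attribute: one case per value
                case BANANA: return Label.CLASS_A;
                case APPLE:  return Label.CLASS_B;
                default:     return Label.CLASS_C;
            }
        } else {
            return Label.CLASS_B;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(0.05, Fruit.BANANA));
        System.out.println(classify(0.50, Fruit.BANANA));
    }
}
```

In this form there is no tree object to traverse at classification time: a numeric split is a single comparison and a nominal split is a `switch` over an enum.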

The main question is how far you want to take these optimizations.
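
For the "generate a code snippet for the tree" step, one possible approach is to walk an in-memory tree and emit Java source text. The sketch below is only an illustration under stated assumptions: the `Node`, `Leaf`, and `NumericSplit` types and the `emit` method are hypothetical names, only numeric threshold splits are handled, and class labels are assumed to contain no characters that need escaping in a string literal.

```java
public class TreeCodeGen {
    // Hypothetical minimal tree model: a node is either a leaf or a numeric split.
    interface Node {}

    static final class Leaf implements Node {
        final String label;
        Leaf(String label) { this.label = label; }
    }

    static final class NumericSplit implements Node {
        final String attribute;   // name of the variable tested in the generated code
        final double threshold;
        final Node below;         // subtree taken when attribute < threshold
        final Node atOrAbove;     // subtree taken otherwise
        NumericSplit(String attribute, double threshold, Node below, Node atOrAbove) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.below = below;
            this.atOrAbove = atOrAbove;
        }
    }

    // Recursively turns the tree into nested if/else source text.
    static String emit(Node node, String indent) {
        if (node instanceof Leaf) {
            return indent + "return \"" + ((Leaf) node).label + "\";\n";
        }
        NumericSplit s = (NumericSplit) node;
        return indent + "if (" + s.attribute + " < " + s.threshold + ") {\n"
             + emit(s.below, indent + "    ")
             + indent + "} else {\n"
             + emit(s.atOrAbove, indent + "    ")
             + indent + "}\n";
    }

    public static void main(String[] args) {
        Node tree = new NumericSplit("attribute_x", 0.1, new Leaf("A"), new Leaf("B"));
        System.out.print(emit(tree, ""));
    }
}
```

The emitted text would still need to be wrapped in a method and class, written to a `.java` file, and compiled, for example with `javac` or the standard `javax.tools.JavaCompiler` API.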

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow