Question

I have a heuristic of the following format:

if a == 1 and b == 1 and c == 1:
    do something
if a == 1 and b == 1 and c == 2:
    do something
if a == 3 and b == 2 and c == 1:
    do something
if a == 2 and b == 2 and c == 1:
    do something
if a == 3 and b == 1 and c == 3:
    do something
etc.

Of course, this makes many unnecessary comparisons which could be avoided if the statements were nested like this:

if a == 1:
    if b == 1:
        if c == 1:
            do something
        if c == 2:
            do something
etc.

Assuming the set of tuples (a, b, c) for which I have a case is finite, and each one has the same likelihood of being received by the algorithm, how can I generate a decision tree which is optimal, i.e., that it makes the least number of comparisons for the general case, or that minimizes the total number of comparisons when all the inputs are run through it?

I imagine something like this:

In: optimal_tree([(1, 1, 1), (1, 1, 2)])
Out: "if a == 1:
          if b == 1:
              if c == 1:
                  do something
              if c == 2:
                  do something"

In: optimal_tree([(1, 1), (2, 1), (2, 2)])
Out: "if a == 2:
          if b == 1:
              do something
          if b == 2:
              do something
      if a == 1:
          do something"

     OR

     "if b == 1:
          if a == 1:
              do something
          if a == 2:
              do something
      if b == 2:
          do something"
Was it helpful?

Solution

Rule engines and database queries regularly deal with this problem too. You might want to look into the implementation behind those.

They have several ways to deal with this (although none is perfect):

  • Do the comparisons in the order defined in the Rule/Query.
  • Reorder the comparison based on statistical information. Databases will for example to first handle the indexed columns with the most value variance.

If you want to make your algorithm faster, you might want to look into hashing & indexing if you're not doing that already. That brings in a much bigger scale advantage.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top