Question

I have a tree of foods, which I'd like to search to find relevant nodes. What's the best way to rank the results?

Here's an example sub-set after a search for "apple":

|-- apple pie
|-+ apples
| |-- cooking
| |-+ eating
|   |-- average
|   |-- Granny Smith
|   |-- Golden Delicious
|-- pork and apple casserole

Currently, I'm ranking based on which results come earliest in the full name of each leaf, so a result set would look like:

0 Apple pie
0 Apples, cooking
0 Apples, eating, average
0 Apples, eating, Granny Smith
0 Apples, eating, Golden Delicious
2 Pork and apple casserole

Where the rank is the index of the first token match, lower the rank the better.

I'd like to aggregate the matches, so that I don't display the full sub-tree on the initial search, for example:

Apples... (4)
Apple pie
Pork and apple casserole

The obvious way to rank these is to count the number of matching leaves, then the higher the rank, the better the match.

But, I'm not sure how to combine these rankings, since one is larger-is-better and one is smaller-is-better. Are there standard ways to combine rankings like this? (I'm not sure what to search for, so Google is giving me results about search engine optimisation, and searching web pages, which doesn't seem to apply.)

Was it helpful?

Solution

Although i dont really know how a result is better. I guess what you are doing is fine (And perhaps the best possibly you can get). Just one note to be more precise, you should be ranking a node $u$ based on the size of the subtree rooted at $u$ (not on the number of leaves in that subtree, or children of $u$). This is better (and more general).

But one way to solve issues like this in general is to give score to each node in your tree (a weight). The higher the score, the more it will be ranked above. Then, you will be looking for the subtree with highest weights. You compute this by summing the weights of all nodes in a subtree.

Example of weights (just to help you a bet):

  • closeness to search term (use hamming distance for example, or other metrics [e.g. look how spelling suggestion algorithms work.]
  • If this is a website of recipes for example, the nodes may be recipes and the weight of a node is how many "likes" it received from the website.

In conclusion, I guess what you are doing is the best approximation to have. Adding the weights is a heuristic to give you perhaps better results.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top