Question

I want to use K-nearest neighbor approach to retrieve closest tree to input tree from data set. The node in tree have value but branches in each tree do not have label.

for example:

Tree 1: (S (V c) (N (P y)) (V (V o) (N (D t) (N d))))

Tree 2: (S (V (V p) (P (R o)) (N (D t) (N d))))

I want to use k-nearest neighbor for this problem. Are you have any idea to use this approach for this problem?

Was it helpful?

Solution

You have to define a distance measure for trees to apply KNN algorithm. There are many possible metrices for trees to use, one of the most popular choices is tree edit distance ( How do I calculate tree edit distance? )

KNN is not a search algorithm. It is not used for finding the closest object, but rather - classifing object to one of the predefined labels. It simply searches for K nearest neighbours of given point X, and returns a label which have most of the neighbours.

For finding a closest tree, assuming that you already have defined the TED (tree edit distance) would be simply iterate through all tree in trees and select the one that minimizes TED(tree,X).

Very good resource regarding Tree Edit Distance is also located here: http://www.inf.unibz.it/dis/projects/tree-edit-distance/tree-edit-distance.php

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top