Question

I have implemented the algorithm by Zhang and Shasha to calculate the minimal edit distance between two trees. Everything is working fine and I am very pleased with the current running times.

Now I would also like to generate a diff that highlights the changed / deleted / inserted nodes. According to their paper, it is very natural to ask for the mapping that yielded the computed distance and according to the last slides of this presentation it seems that the mapping can be easily extracted from the last forest distance table and the tree distance table. Unfortunately I haven't been able to figure out the exact rules yet.

Any additional description would be highly appreciated. Many thanks!

Was it helpful?

Solution

Ok, I have finally figured it out on my own. In order to generate the optimal mapping for the nodes of two trees, with m and n nodes respectively, you need to do some backtracking in the forest tables.

For each field (x, y) starting with (m, n) of the (m, n) forest distance table, do the following: if the minimum was obtained by summing up the field (x', y') and the edit / delete / insert costs, then write down the mapping and go to the field (x', y') of the current forest distance table. On the other hand, if the minimum was obtained by summing up the field (x', y') from the current forest distance table and the field (tx, ty) from the tree distance table, then go to the field (x', y') from the current forest distance table AND to the field (tx, ty) from the forest table corresponding to the tree (tx, ty). You now need to continue the backtracking in both forest tables individually and collect the mappings from both.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top