Question

Say, there's a tree data structure, each leaf of which defines a set of keys for lookup:

*
|- A = 1, B = 2
|- *
   |- C = 4
   |- *
      |- D = 5
      |- D = 6, E = 7

I need a way of finding the value of a given key for any given leaf while the tree is being traversed depth-first.

There are two approaches I have thought of:

  1. If the key is not found in the current leaf's dictionary, check its parent's dictionary, and so on up to the root of the tree.

  2. There is a global dictionary, and each leaf inserts/removes its keys as it is traversed. The lookup is performed in this global dictionary.

Most likely there will be many leaves, each with only a few keys, and about 3-4 lookups for each key.
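For concreteness, here is a rough sketch of approach (2) in Haskell; the Node type and all names are purely illustrative, not existing code:

    import qualified Data.Map as M
    import Data.IORef

    -- A node defines some key/value pairs and has child nodes.
    data Node = Node [(String, Int)] [Node]

    -- Approach (2): one shared dictionary; each node inserts its keys on entry
    -- and removes them again on the way back up the depth-first traversal.
    visit :: IORef (M.Map String Int) -> Node -> IO ()
    visit dict (Node defs children) = do
      modifyIORef dict (\m -> foldr (uncurry M.insert) m defs)
      -- ... lookups needed at this node would happen here via readIORef dict ...
      mapM_ (visit dict) children
      modifyIORef dict (\m -> foldr (M.delete . fst) m defs)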

Which approach is more efficient? Or, maybe, there's another way of doing it that is better than both?


Solution

The programming language you are implementing will almost certainly define exact rules for name resolution, and I don't think they will amount to a plain depth-first search. Name resolution rules very frequently look something like this:

  1. search the current scope, frequently considering only what has been declared “up” from the current position in the source code;
  2. when certain specific rules are met, e.g. there is some form of a using / import or another construct linking to some other scope, perform a search in that other scope (in all such scopes, sequentially), and recurse within it:
    1. search the given scope,
    2. recurse into any relevant nested scopes;
  3. go to the immediately enclosing scope;
  4. repeat from (1), where the 'current' scope is now the one determined in (3).

In other words, you gradually move up the tree of enclosing scopes and decide along the way whether to search any 'foreign' referenced scopes. Statements such as using / import introduce references among scopes, which turn what is nominally a tree of scopes into a directed graph.
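A minimal sketch of that walk, assuming a hypothetical Scope record (the field and function names are made up for illustration):

    import qualified Data.Map as M

    data Scope = Scope
      { bindings :: M.Map String Int   -- names declared directly in this scope
      , imports  :: [Scope]            -- scopes referenced via using / import
      , parent   :: Maybe Scope        -- the immediately enclosing scope
      }

    -- Step (1): search the current scope; step (2): search imported scopes
    -- (only their own bindings here; real rules may recurse further);
    -- steps (3)/(4): fall back to the enclosing scope.
    resolve :: String -> Scope -> Maybe Int
    resolve key s =
      case M.lookup key (bindings s) of
        Just v  -> Just v
        Nothing ->
          case [ v | i <- imports s, Just v <- [M.lookup key (bindings i)] ] of
            (v:_) -> Just v
            []    -> parent s >>= resolve key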

Regarding the lookup table construction, I'd start with a simple hashtable. Prefix trees (tries) work well for these scenarios, too.
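If a trie is preferred, a minimal one over String keys could look like this (again only a sketch, not tuned for anything):

    import qualified Data.Map as M

    -- A prefix tree: an optional value at this node plus children per character.
    data Trie a = Trie (Maybe a) (M.Map Char (Trie a))

    empty :: Trie a
    empty = Trie Nothing M.empty

    insert :: String -> a -> Trie a -> Trie a
    insert []     v (Trie _  cs) = Trie (Just v) cs
    insert (c:ks) v (Trie mv cs) =
      Trie mv (M.insert c (insert ks v (M.findWithDefault empty c cs)) cs)

    lookup' :: String -> Trie a -> Maybe a
    lookup' []     (Trie mv _ ) = mv
    lookup' (c:ks) (Trie _  cs) = M.lookup c cs >>= lookup' ks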

Last but not least, I wouldn't worry much about lookup performance unless I actually ran into a real performance problem when compiling tens or maybe hundreds of thousands of lines of code.

Other tips

For a reasonably performant solution, use functional/persistent data structures.

They will have signatures like

    insert :: Map -> key -> value -> Map
    delete :: Map -> key -> Map

and so on. That is, each operation returns a new map with the change applied, but the old map also remains valid. For tree-based maps this can be done with only a constant-factor overhead, so operations still run in O(log n) time. (The main technique is path copying, FWIW.)
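Haskell's Data.Map happens to be exactly such a persistent, tree-based map (its argument order differs from the signatures above), so the sharing behaviour can be seen directly:

    import qualified Data.Map as M

    -- insert builds a new map by path copying; the original stays valid.
    example :: (Maybe Int, Maybe Int)
    example =
      let outer = M.fromList [("A", 1), ("B", 2)]
          inner = M.insert "C" 4 outer          -- new map; 'outer' is untouched
      in (M.lookup "C" outer, M.lookup "C" inner)   -- (Nothing, Just 4)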

The way to gainfully use them is this: every time you enter a sub-scope, keep the parent scope's map around while evolving the child scope's map as you encounter variables. Once you're done with the sub-scope, revert to the map that corresponds to the parent scope. (Always do lookups in the most current, child-most map.)
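A sketch of that pattern on a tree like the one in the question, with a made-up Node type carrying the keys each node defines and the keys it needs to look up:

    import qualified Data.Map as M

    data Node = Node
      { defines  :: [(String, Int)]   -- keys this node introduces
      , needs    :: [String]          -- keys this node wants to look up
      , children :: [Node]
      }

    -- Depth-first traversal: extending 'inherited' yields the child scope's map;
    -- when the recursion returns, the parent's map is reused unchanged.
    visit :: M.Map String Int -> Node -> [(String, Maybe Int)]
    visit inherited node =
      let scope = foldr (uncurry M.insert) inherited (defines node)
      in  [ (k, M.lookup k scope) | k <- needs node ]
          ++ concatMap (visit scope) (children node)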

The branch of research to look into for better answers is persistent data structures. Here's Erik Demaine giving a lecture about the topic, in case you want to familiarize yourself with it: https://www.youtube.com/watch?v=T0yzrZL1py0&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf&index=1

With O(log n) map operations, I figure the whole thing will run in O(n log n) time. I wish I knew of a way to make it run in linear time, but I don't; I do not, for example, know of a persistent array-based hash map.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow