Question

I'm attempting to determine a suitable way to accomplish the following.

I would like to have range -> set lookup within a particular range (say [0x0 - 0xffffffff]). Values are inserted into the range at ranges (so if we are working with T = unique strings), I might insert("beef3490", [0x1234, 0xFFFF]); and a single id may have multiple inserts, which may or may not overlap. Additionally, other values may be inserted which overlap, and where they overlap I should receive the set of unique strings as a result. Lastly, values can also be removed from ranges (not necessarily matching, but typically contained within their initial insert range).

Here is a simplified example usage:

insert("beef3490", [0x1234, 0xFFFF])
insert("beef3490", [0x11111, 0x22222])
insert("flam1456", [0x8000, 0xA0000])
remove("beef3490", [0x21000, 0x22000])
lookup(0x0) -> set<>
lookup(0x2000) -> set<beef3490>
lookup(0x9456) -> set<beef3490, flam1456>
lookup(0x21212) -> set<flam1456>
lookup(0x99789) -> set<flam1456>

This leads to a number of questions for me:

  1. Is there a generalized name for this sort of problem, or a similar type of problem that I could find insight from?

  2. What is an ideal implementation given the following constraints:

    • Fast/very-fast lookup time
    • Memory usage does not bloat (ie. the following operations have equivalent footprint)
      • Insert [10,20], Insert[20,30], Remove[14,24]
      • Insert [10,15], Insert[25,30]
    • Similar to last, the data structure should have stability on a long running system
    • Insert/delete times are not absolutely terrible (they don't need to be as fast as lookup though)
  3. Given an ideal implementation, do you have advice for using it in C++

Thanks for all responses and help.

Was it helpful?

Solution

Is there a generalized name for this sort of problem, or a similar type of problem that I could find insight from?

This problem is an interval-tree or segment-tree problem. Specifically, the tree/data-structure needs to perform aggregate on overlap operations. This means that when two intersecting ranges are inserted into the structure, they lookup value for a point within both ranges is equivalent to val(range A) + val(range B). The '+' operation could be addition if the values were integers, or in the case of sets it would be a union operation.

What is an ideal implementation given the constraints

A self-balancing Binary Search Tree (such as a Red-black tree, or Scapegoat tree) built to optimize searches based on ranges. Insert operations would additionally intersecting ranges as necessary to produce the correct return values. The ways in which the ranges are split varies by your requirements but is typically either through "joining" which discards the information about the original ranges but has a smaller footprint or "splitting" from which the original ranges can still be deduced.

Given an ideal implementation, do you have advice for using it in C++

See boost::icl. (Boost interval container library)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top