Calculate overlapping times inside a map C++

Question 1

I would use a naive approach first, using a side-table; it will be close to a sweeping line algorithm.

The idea:

iterate through the connections in order (ordered by start time here)
keeping a set of the currently involved connections

The trick is to keep track of the connections efficiently, and especially of when to remove them; a priority-queue is great for this.

Some help:

struct Connection {
    uint32_t start;
    uint32_t end;
}; // struct Connection

// WATCH OUT:
// std::priority_queue only let you access the MAXIMUM element, so the predicate
// is the OPPOSITE of what we usually write...
struct OrderByEnd {
    bool operator()(Connection const& left, Connection const& right) const {
        if (left.end > right.end) { return true; }
        if (left.end < right.end) { return false; }
        return left.start > right.start;
    }
}; // struct OrderByEnd

using CurrentlyOverlappingType = std::priority_queue<Connection, std::deque<Connection>, OrderByEnd>;

And then, we sweep:

size_t countMaxNumberOfOverlaps(std::map<uint32_t, uint32_t> const& connections) {
    if (connections.empty()) { return 0; }

    size_t max = 0;
    CurrentlyOverlappingType currentlyOverlapping;

    for (auto const& con: connections) {
        // Purge no longer overlapping connections
        while (not currentlyOverlapping.empty() and currentlyOverlapping.top().end < con.first) {
            currentlyOverlapping.pop();
        }

        // Debug:
        if (not currentlyOverlapping.empty()) {
            std::cout << "[" << con.first << ", " << con.second <<
                "] is overlapping with: " << currentlyOverlapping.size() << " connections\n";
        }

        // The current connection is necessarily overlapping with itself
        currentlyOverlapping.push(Connection{con.first, con.second});

        max = std::max(max, currentlyOverlapping.size());
    }

    return max;
} // countMaxNumberOfOverlaps

And it works as expected:

[120, 200] is overlapping with: 1 connections
[250, 400] is overlapping with: 1 connections
[600, 800] is overlapping with: 1 connections
Max number of overlaps: 2

The complexity is harder to fathom. You have to sweep through the whole set of connections, but at each step the amount of work is proportional to the current number of overlapping connections...

Worst case complexity: O(N * log(N)) I would say (because inserting in the priority queue is logarithmic)
Average case complexity: O(N * log(O)) where O is the number of overlapping connections

Note: in the algorithmic analysis, I considered that the complexity of the purge part is amortized constant; we know items are going to be purged, so we can account for their purge in the cost of their insertion. N items are inserted, N items are going to be purged. The number of comparisons executed as part of the purge process is (I think) also amortized constant (the total for all items being bounded by a maximum of 2N) although it's intuition and not computation. And therefore, the cost of the purge is dwarfed by the cost of inserting the items in the priority queue, which log(O) per item.

Question 2

I think your problem is one of line segment intersection (http://en.wikipedia.org/wiki/Bentley-Ottmann_algorithm), if you view time intervals as 1-dimensional line segments (rather than the usual 2-dimensional line segments in Euclidean plane).

Algorithmically, you can use one of the sweep-line algorithms in computational geometry to achieve complexity that is less than O(n^2) associated with the naive approach. The Bentley–Ottmann algorithm above is one of them. Conceptually, you maintain a sweep line scanning from left to right, keep a record of your scan-line entering or leaving intervals. Details can be found in any good computational geometry text books.

I don't think std::map support sweep line algorithms, and you will need to implement the necessary data structure associated with it.

Question 3

You could use Boost's Date/Time library. It has a time_period class which supports an intersects method as shown in these examples. This removes the need for you to write code for 'do these two intervals overlap?' although you will still have to decide how to traverse your collection (avoiding an O(n^2) traverse seems difficult).

EDIT : you need the maximum number of concurrent intervals at any time. In which case, Ting L's suggestion of a sweep-line algorithm seems best. Sort the intervals by start time and iterate over them in order. Maintain a stack as you iterate along. For each new interval, pop any intervals on the stack that have expired before the new interval starts. Keeping track of the maximum size of the stack will give you the maximum number of concurrent intervals (if you need to know which intervals they are, you will have to maintain a separate container 'current longest set' updated as you iterate)

Question 4

Map (std map) is a tree sorted by key. Check this link. So if you iterate through, then you would get it by order of start-time. However, beyond that to check for specific overlapping time, there would be no additional support from the std::map api.

Now to check the overlap time, you have to do classical iterator approach as the worse case method. Check the end time of a element i, is always less than the start time of i to N .

HTH!

Question 5

I think the nature of the problem is connected with sets and sets intersections. You can try STL set container and set_intersection function which effectively finds intersections of two sorted sequences. (Really, set is just a map with only keys (without values) so you can work with maps, they are sorted too.)

Say, you can represent time interval as a pair(start, end) and build an intersection of all intervals intersecting sets. You can define your own Compare function. The exact logic would follow from the details of your task, which are not fully clear.

(There are also set_difference, set_union, and set_symmetric_difference functions.)