Graph data structure memory management

Question 1

You will need to perform a search to figure out which node is still connected to the input edge, when you remove a component. If you end up with more than one connected group, you will need to figure out which one of these contains the entry node and remove all others.

No greedy (local) algorithm for this can exist, which can be shown by a simple thought experiment:

Let A, B be subgraphs connected only through the node n, which shall be removed. We are left with two unconnected subgraphs. There is no way of knowing (without a whole bunch of state per node) if we have just removed the only route to the entry node for A or B. And, it is necessary to figure that out, so that the appropriate choice of removing either A or B can be made.

Even if every node stored every single route to the entry node, it would mean you have to clean up all routes in all nodes whenever you remove a single node.

Solution Sketch

Let us talk about a graphical representation of what we need to do:

First, Color the node that is being deleted black. Then perform the following for every node we encounter:

For uncolored nodes:

If the node we came from is black, give this node a new color
If the node we came from is colored, give this node the same color
Travel through every outgoing edge

For colored nodes:

If the node we came from is black, just return
If the node we came from is the same color, just return
If the node we came from has a different color, merge the two colors (e.g. by remembering that green and blue are the same, or by painting every green node blue)
Travel through every outgoing edge

At the end we will know which connected components will exist after we delete the current node. All connected components (plus our original to be deleted node) which do not contain the entry node must be deleted (Note: This may delete every single node, if our to-be-deleted node was the entry node...)

Implementation

You will need a data structure like the following:

struct cleanup {
    vector<set<node*>> colors;
    node* to_be_deleted;
    size_t entry_component;
};

The index into the vector of lists will be your "color". The "color black" will be represented by usage of to_be_deleted. Finally, the entry_component will contain the index of the color that has the entry node.

Now, the previous algorithm can be implemented. There are quite a few things to consider, and the implementation may end up being different, depending on what kind of support structures you already keep for other operations.

Question 2

The answer depends on the complexity of the graph:

If the graph is a tree, each parent can own its children and delete them in its destructor.
If the graph is a directed acyclic graph, an easy and performant way to handle it is to do reference counting on the nodes.
If the graph can be cyclic, you are out of luck. You will need to keep track of each and every node in your graph, and then do garbage collection. Depending on your use case, you can either do the collection by
- cleaning up everything when you are done with the complete graph, or by
- repeatedly marking all connected nodes and cleaning up all the unreachable ones.

If there is any possibility to get away with option 1 or 2 (possibly tweaking the problem to ensure that the graph fulfills the constraint), you should do so; option 3 implies significant overheads in terms of code complexity and runtime.

Question 3

There are a couple of ways. One way is to make your nodes know what other nodes have edges to it. So, if you delete C from B, C will need to remove the edge to it from A. So later when you remove/delete A, it won't try to delete C.

std::shared_ptr or some other type of reference counting may also work for you.

Question 4

Here's a simple way to avoiding memory problems when implementing a graph: Don't use pointers to represent edges.

Instead, give each node a unique ID number (an incrementing integer counter will suffice). Keep a global unordered_map<int, shared_ptr<Node> > so that you can quickly look up any Node by its ID number. Then each Node can represent its edges as a set of integer Node IDs.

After you delete a Node (i.e. remove it from the global map of Nodes), it's possible that some other Nodes will now have "dangling edges", but that will be easy to detect and handle because when you go to look up the now-removed Node's ID in your global map, the lookup will fail. You can then gracefully respond by ignoring that edge, or by removing that edge its the source Node, or etc.

The advantages of doing it this way: The code remains very simple, and there is no need to worry about reference-cycles, memory leaks, or double-frees.

The disadvantages: It's a little bit less efficient to traverse the graph (since doing a map lookup takes more cycles than a simple pointer dereference) and (depending on what you are doing) the 'dangling edges' might require occasional cleanup sweeps (but those are easy enough to do... just iterate over the global map, and for each node, check each edge in its edge-set and remove the ones with IDs that aren't present in the global map)

Update: If you don't like doing a lot of unordered_map lookups, you could alternatively get very similar functionality by representing your edges using weak_ptr instead. A weak_ptr will automagically become NULL/invalid when the object it is pointing at goes away.