So here's the original paper, a neat lil' six pages, only two of which are about the design & implementation. Here are the cliff notes:
- For a partition of a given graph, the authors define the modularity, `Q`, of the partition to be the fraction of the graph's edges that fall within communities, minus the fraction you'd expect from a completely random partition.
- So it's effectively "how much better is this partition at defining communities than a completely random one?"
- Given two communities `i` and `j` of a partition, they then define `deltaQ_ij` to be how much the modularity of the partition would change if communities `i` and `j` were merged. So if `deltaQ_ij > 0`, merging `i` and `j` will improve the modularity of the partition.
- Which leads to a simple greedy algorithm: start with every node in its own community. Calculate `deltaQ_ij` for every pair of communities. Whichever two communities `i, j` have the largest `deltaQ_ij`, merge those two. Repeat.
- You'll get maximum modularity when the `deltaQ_ij` all turn negative, but in the paper the authors let the algorithm run until there's only one community left.
That's pretty much it for understanding the algorithm. The details are in how to compute `deltaQ_ij` quickly and store the information efficiently.
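To make the greedy loop concrete, here's a deliberately naive Python sketch that recomputes `Q` from scratch for every candidate merge rather than maintaining `deltaQ` incrementally. The function names are mine, and the whole point of the paper is to avoid exactly this brute force:

```python
from itertools import combinations

def modularity(edges, comm):
    """Q = sum over communities c of (e_c/m - (d_c/2m)^2), where e_c is the
    number of edges inside c and d_c is the total degree of c's nodes."""
    m = len(edges)
    e, d = {}, {}
    for u, v in edges:
        d[comm[u]] = d.get(comm[u], 0) + 1
        d[comm[v]] = d.get(comm[v], 0) + 1
        if comm[u] == comm[v]:
            e[comm[u]] = e.get(comm[u], 0) + 1
    return sum(e.get(c, 0) / m - (dc / (2 * m)) ** 2 for c, dc in d.items())

def greedy_modularity(edges):
    comm = {u: u for edge in edges for u in edge}  # every node in its own community
    best_q, best_comm = modularity(edges, comm), dict(comm)
    while len(set(comm.values())) > 1:
        # try every pair of communities, keep the merge with the best resulting Q
        q, i, j = max(
            (modularity(edges, {u: j if c == i else c for u, c in comm.items()}), i, j)
            for i, j in combinations(sorted(set(comm.values())), 2))
        comm = {u: j if c == i else c for u, c in comm.items()}
        if q > best_q:  # remember the best partition seen on the way down
            best_q, best_comm = q, dict(comm)
    return best_q, best_comm
```

On two triangles joined by a single bridge edge, for instance, this recovers the two triangles as the best-scoring partition.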
Edit: Data structure time!
So first off, I think the implementation you're referencing does things a different way to the paper. I'm not quite sure how, because the code is impenetrable, but it seems to use union-find and hashsets in place of the authors' binary trees and multiple heaps. Not a clue why they do it a different way. You might want to email the guy who wrote it and ask.
Anyway, the algorithm in the paper needs several things from the format `deltaQ` is stored in:

- First, it needs to be able to recover the largest value in `deltaQ` quickly.
- Second, it needs to be able to remove all `deltaQ_ik` and `deltaQ_ki` for a fixed `i` quickly.
- Third, it needs to be able to update all `deltaQ_kj` and `deltaQ_jk` for a fixed `j` quickly.
The solution the authors come up with for this is as follows:
- For each community `i`, each non-zero `deltaQ_ik` is stored in a balanced binary tree, indexed by `k` (so elements can be found easily), and in a heap (so the maximum for that community can be found easily).
- The maximum `deltaQ_ik` from each community `i`'s heap is then stored in another heap, so that the overall maximum can be found easily.
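A cheap way to approximate that heap-of-heaps in Python is a single global heap of `(-deltaQ_ij, i, j)` entries with lazy invalidation: stale entries are discarded as they surface, rather than updated in place. To be clear, this is a standard workaround for `heapq` having no decrease-key, not what the paper itself does:

```python
import heapq

def best_merge(dQ, global_heap):
    """Return (deltaQ_ij, i, j) for the current best merge, where dQ[i] maps a
    neighbouring community j to deltaQ_ij, and global_heap holds entries of the
    form (-deltaQ_ij, i, j), some of which may be stale."""
    while global_heap:
        negq, i, j = global_heap[0]
        if i in dQ and dQ[i].get(j) == -negq:
            return -negq, i, j          # entry still matches dQ: it's valid
        heapq.heappop(global_heap)      # stale: community merged or value changed
    return None                         # no merges left
```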
When community `i` is merged with community `j`, several things happen to the binary trees:

- First, each element from the `i`th community's tree is added to the `j`th community's binary tree. If an element with the same index `k` already exists, you sum the old and new values.
- Second, we update all the remaining "old" values in the `j`th community's binary tree to reflect the fact that the `j`th community has just increased in size.
- Third, in each other community `k`'s binary tree, we update any `deltaQ_kj`.
- Finally, the tree for community `i` is thrown away.
And similarly, several things must happen to the heaps:

- First, the heap for community `i` is thrown away.
- Then the heap for community `j` is rebuilt from scratch using the elements from that community's balanced binary tree.
- Next, in each other community `k`'s heap, the position of the entry `deltaQ_kj` is updated.
- Finally, the entry for community `i` in the overall heap is thrown away (causing bubbling) and the entries for community `j` and each community `k` connected to `i` or `j` are updated.
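Here's a rough Python sketch of those merge steps, using a plain dict per community as a stand-in for the balanced binary tree, and lazily-cleaned `heapq` heaps instead of the paper's in-place updatable ones. The `deltaQ` update formulas in the three branches are my reading of the paper's merge rules, so treat them with suspicion:

```python
import heapq

def merge_communities(i, j, dQ, heaps, a):
    """Fold community i into community j.

    dQ[c]    : dict mapping a neighbouring community k to deltaQ_ck
               (stand-in for the paper's balanced binary tree),
    heaps[c] : heap of (-deltaQ_ck, k) pairs, cleaned up lazily,
    a[c]     : fraction of edge ends attached to community c (the paper's a_c).
    """
    for k in set(dQ[i]) | set(dQ[j]):
        if k in (i, j):
            continue
        if k in dQ[i] and k in dQ[j]:        # k was connected to both i and j
            new = dQ[i][k] + dQ[j][k]
        elif k in dQ[i]:                     # k was connected to i only
            new = dQ[i][k] - 2 * a[j] * a[k]
        else:                                # k was connected to j only
            new = dQ[j][k] - 2 * a[i] * a[k]
        dQ[j][k] = dQ[k][j] = new
        dQ[k].pop(i, None)                   # drop the stale deltaQ_ki entry
        heapq.heappush(heaps[k], (-new, j))  # old entries get skipped lazily later
    dQ[j].pop(i, None)
    del dQ[i]                                # throw community i's "tree" away
    heaps[j] = [(-v, k) for k, v in dQ[j].items()]
    heapq.heapify(heaps[j])                  # rebuild j's heap from scratch
    del heaps[i]                             # throw community i's heap away
    a[j] += a[i]                             # j absorbs i's edge ends...
    a[i] = 0                                 # ...and a_i is zeroed out
```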
Strangely, when two communities are merged, there's no mention in the paper of removing the `deltaQ_ki` values from the `k`th community's heap or tree. I think this might be dealt with by the setting of `a_i = 0`, but I don't understand the algorithm well enough to be sure.
Edit: Trying to decipher the implementation you linked. Their primary data structures are:

- `CmtyIdUF`, a union-find structure that keeps track of which nodes are in which community (something that's neglected in the paper, but seems necessary unless you want to reconstruct community membership from a trace of the merges or something).
- `MxQHeap`, a heap to keep track of which `deltaQ_ij` is largest overall. Strangely, when they update the value of a `TFltIntIntTr` in the heap, they don't ask the heap to re-heapify itself. This is worrying. Does it do it automatically or something?
- `CmtyQH`, a hashmap that maps a community ID `i` to a structure `TCmtyDat`, which holds what looks like a heap of the `deltaQ_ik` for that community. I think. Strangely though, the `UpdateMaxQ` method of the `TCmtyDat` structure takes linear time, obviating any need for a heap. What's more, the `UpdateMaxQ` method only appears to be called when an element of the heap is deleted. It should definitely also be getting called when the value of any element in the heap is updated.
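That re-heapify worry is easy to demonstrate in miniature with Python's `heapq`. This is a generic illustration of the heap invariant breaking, not a claim about what their heap class actually does:

```python
import heapq

h = [3, 5, 7]
heapq.heapify(h)    # a valid min-heap
h[2] = 1            # update an entry in place, without telling the heap
popped = heapq.heappop(h)
# popped is 3: with the invariant broken, the true minimum (1) is passed over
```

If their heap doesn't re-heapify on update, it can return a pair whose `deltaQ_ij` is no longer the maximum in exactly this way.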