Question

Using iGraph's community detection function community_fastgreedy() in Python, I got a VertexDendrogram object, let's call it V. Then using V.as_clustering() I got a VertexClustering object from the dendrogram. I know that the communities are clustered so that modularity is maximized, but I think the dendogram object keeps merging all the way, so it's tough to see the communities on the dendrogram.

My question is: How does the output for the V.as_clustering() correspond to distance between the communities in the dendrogram?

In other words, The communities each come with a representative number ( Community [0], Community [2] for example), so how does the that number correspond to position on the dendrogram? Are community 0 and Community 1 adjacent to each other when the dendrogram is merged up to the level that maximizes modularity, and is community 12 farther from community 0 than 3 is?

If not, then how is the order (the numbers for each community) of the output decided by the as_clustering function?

Thanks in advance.

Was it helpful?

Solution

When an algorithm in igraph produces a VertexDendrogram, it may optionally produce a "hint" as well that tells us where to cut the dendrogram (i.e. after how many merges) to obtain a VertexClustering that is in some sense optimal. For instance, the VertexDendrogram produced by community_fastgreedy() proposes that the dendrogram should be cut at the point where the modularity is maximized. Running as_clustering() on a VertexDendrogram simply uses the hint produced by the clustering algorithm to flatten the dendrogram into a clustering, but you may override this by specifying the desired number of clusters as an argument to as_clustering().

As for the "distance" between two communities: it's a complicated thing because most community detection methods don't give you that information. They simply produce a sequence of merges from individual vertices up to a mega-community encapsulating everyone, and there is no "distance" information encoded in the dendrogram; in other words, the branches of the dendrogram have no "length". The best you can do is probably to go back to your graph and check the edge density between the communities; this could be a good indication of closeness. For example:

cl = g.community_fastgreedy().as_clustering()
comm1 = cl[0]
comm2 = cl[1]
edges_between = g.es.select(_between=(comm1, comm2))
print 2.0 * len(edges_between) / len(comm1) * len(comm2)

If your graph is directed, use a multiplier of 1.0 instead of 2.0 in the last line.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top