Question

I have an enormous graph dataset - let's say it is like this, but on a much bigger level:

1 -> 2
3 -> 4

1,2,3,4 are nodes and the arrows are directed edges. Let's say that they are all in a single graph object:

import networkx as nx
G = nx.DiGraph()
G.add_nodes_from([1,2,3,4])
G.add_edge(1,2)
G.add_edge(3,4)

Given an object like this, which has two mini graphs within a graph, how can we pull out each mini graph? I feel like there must be some word for this? My end result would look like:

for mini_graph in G:
    print mini_graph.nodes()

...
[1,2]
[3,4]
Was it helpful?

Solution 2

If the parts of the graph are truly disjoint (as per your small example), then consider extracting the subgraphs with connected_component_subgraphs().

This only works on an undirected graph, so if you are using a directed graph then you'll need to convert to undirected first.

import networkx as nx
G = nx.DiGraph()
G.add_nodes_from([1,2,3,4])
G.add_edge(1,2)
G.add_edge(3,4)

# make an undirected copy of the digraph
UG = G.to_undirected()

# extract subgraphs
sub_graphs = nx.connected_component_subgraphs(UG)

for i, sg in enumerate(sub_graphs):
    print "subgraph {} has {} nodes".format(i, sg.number_of_nodes())
    print "\tNodes:", sg.nodes(data=True)
    print "\tEdges:", sg.edges()

which yields:

subgraph 1 has 2 nodes
    Nodes: [(1, {}), (2, {})]
    Edges: [(1, 2)]
subgraph 1 has 2 nodes
    Nodes: [(3, {}), (4, {})]
    Edges: [(3, 4)]

and you could use the subgraph node labels to operate on your data in the initial graph,

sg.nodes()[0] in G
>>>  True

Reading the answer linked by EdChum, it appears that weakly_connected_component_subgraphs() operates on a directed graph but treats it as undirected, so saving the copy might be crucial. However, the docs on this and the related function weakly_connected_components() are a bit thin at present.

OTHER TIPS

As of 2018 the above answer is deprecated (link to docs). You are advised to use:

(G.subgraph(c) for c in connected_components(G))

or

(G.subgraph(c).copy() for c in connected_components(G))

As previous answers are made for undirected graphs, we will lose vital information of the direction, because of the conversion to an undirected graph. I've had the same issue and finally the method weakly_connected_components did it.

>>> G = nx.DiGraph()
>>> G.add_nodes_from([1,2,3,4])
>>> G.add_edge(1,2)
>>> G.add_edge(3,4)
>>> list(nx.weakly_connected_components(G))
[{1, 2}, {3, 4}]

It works with directed graphs and its performance is quite decent. If you like to split your graph and continue computation (like me), then you can also build subgraphs of the result above with:

h = nx.weakly_connected_component_subgraphs(G)

j = []
for foo in h:
    j.append(foo)

(Very explicit, to show how this can be accessed). For whatever reason, h seems to be destroyed by listing it?! The above way is stable instead.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top