Question

I am attempting to do some graph analysis on the Dewey Decimal Classification, so that I can make a distance between two books. The DDC has several relations: "hierarchy", "see-also", "class-elsewhere", here I represent them with different colours. Since these relations are not symmetric you will notice we have a Directed Graph. Below is a picture of the graph of all vertices maximum 4 edges away from 394.1.

Sample Graph

The distance metric between classifications A and B, should be the shortest path between A and B. However the colours have no inherent weighted value or preference. But the user will give provide one. So given a dictionary of weights, example:

weights_dict_test = {'notational_hiearchy':1,
                'see_reference':0.5, 
                'class_elsewhere':2}

I would like to return the weighted shortest path. I thought that would not be a problem if I could preprocess all simple paths between the two nodes, and then find which was the shortest given the weights dict. However since the graph contains >50,000 nodes. Computing nx.all_simple_paths(G, a, b) has not returned after 24 hours of computation. Are there any suggestions for parallelizing finding all_simple_paths. Or a technique to compute the shortest path given the weights_dict that does not involve computing all_simple_paths?

Was it helpful?

Solution

Thanks to @CorleyBrigman. The solution is to create a new Graph W from G, replacing the colours of G with weights you desire. Then you can efficiently use the nx.shortest_path and nx.shortest_path_length with their typically fast runtimes.

In [23]:

def update_weights(G, weights_dict):    
    W = nx.DiGraph()

    for m in G.nodes():
        for n in G[m].iterkeys():
            relation = G[m][n]['rel']
            weight = weights_dict[relation]    
            W.add_edge(m, n, rel=weights_dict[relation])            
    return W

In [41]:

weights_dict_test = {'notational_hiearchy':50,
                'see_reference':0.6, 
                'class_elsewhere':1}

In [42]:

W = update_weights(G, weights_dict_test)

In [43]:

print len(W)
print len(G)

43241    
43241

In [45]:

nx.shortest_path_length(W, '394.1', '341.33',weight='rel')

Out[45]:

52.2
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top