Question

I'm trying to compute and plot the out- and in-degree distributions for the Wikipedia vote network (part of the SNAP collection of network datasets). This is a directed graph, represented as an edge list.

To read and store the graph data:

%Read the data file (tab-delimited, 4 header lines).
G = importdata('Wiki-Vote.txt', '\t', 4);

%G is a structure that contains:
% - data: a <num_of_edges,2> matrix filled with node (wiki users) ids
% - textdata: a cell matrix that contains the header strings (first 4
%   lines).
% - colheaders: a cell matrix that contains the last descriptive string
%   (fourth line).
%All the useful information is contained in the data matrix.

%Split directed edge list into 'from' and 'to' nodes lists.
Nfrom = G.data(:,1); %Will be used to compute out-degree
Nto = G.data(:,2);   %Will be used to compute in-degree

Motivated by another question, I computed the out-degree as follows:

%Remove duplicate entries from Nfrom and Nto lists.
Nfrom = unique(Nfrom); %Will be used to compute the outdegree distribution.
Nto = unique(Nto);     %Will be used to compute the indegree distribution.

%Out-degree: count the number of occurrences in G.data(:,1) of each
%element (node-user id) contained in Nfrom.
outdegNsG = histc(G.data(:,1), Nfrom);
odG = hist(outdegNsG, 1:size(Nfrom));

figure;
plot(odG)
title('linear-linear scale plot: outdegree distribution');
figure;
loglog(odG)
title('log-log scale plot: outdegree distribution');

The in-degree is computed the same way. But the linear plot I get is far from satisfying, which made me wonder whether my approach is correct.

In linear scale:

[Image: out-degree distribution, linear scale]

In log-log scale:

[Image: out-degree distribution, log-log scale]

Zooming into the linear-scale plot makes it clear that the distribution is close to a power law:

[Image: zoomed view of the out-degree distribution, linear scale]

My question is whether my approach to computing the degree distribution is correct, as I have no way to verify it. Specifically, I want to know whether a smaller number of bins in histc will give a clearer graph without losing any valuable information.


Solution

Okay... My previous approach would be correct if I wanted to plot the out- (or in-) degree of each node, not the degree distribution...
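
To make the distinction concrete, here is a minimal sketch on a made-up six-edge list (the matrix E and the variable names are assumptions for illustration, not part of the Wiki-Vote data). The first histc call gives the out-degree of each node; only a second counting step over those degrees gives the degree distribution.

```matlab
% Toy directed edge list (hypothetical data, not taken from Wiki-Vote).
% Each row is one edge: [from, to].
E = [1 2; 1 3; 1 4; 2 3; 3 1; 4 1];

% Step 1: out-degree of each source node.
src    = unique(E(:,1));        % distinct source node ids: 1, 2, 3, 4
outdeg = histc(E(:,1), src);    % out-degree per node: 3, 1, 1, 1

% Step 2: degree distribution, i.e. how many nodes have each degree value.
deg   = unique(outdeg);         % distinct degree values: 1, 3
count = histc(outdeg, deg);     % nodes per degree value: 3, 1
```

Plotting outdeg shows one value per node, while plotting count against deg shows the distribution: three nodes have out-degree 1 and one node has out-degree 3.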

For out-degree distribution:

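Nfrom = unique(G.data(:,1));            %Distinct source nodes (out-degree >= 1).
outdegNsG = histc(G.data(:,1), Nfrom);  %Out-degree of each source node.
outdegVals = unique(outdegNsG);         %Distinct out-degree values that occur.
outdd = histc(outdegNsG, outdegVals);   %Number of nodes with each out-degree value.

So I should plot the counts against the degree values themselves, not against the bin index, since the degree values need not be consecutive:

loglog(outdegVals, outdd);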

The in-degree distribution is computed the same way, using G.data(:,2) in place of G.data(:,1).
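
As a rough sketch of the in-degree case, the same two counting steps can be applied to the 'to' column. The toy matrix E below is a hypothetical stand-in for G.data, and the variable names are my own.

```matlab
% Toy directed edge list (hypothetical data standing in for G.data).
E = [1 2; 1 3; 1 4; 2 3; 3 1; 4 1];

% In-degree: count how often each node id appears in the 'to' column.
dst   = unique(E(:,2));            % nodes with at least one incoming edge
indeg = histc(E(:,2), dst);        % in-degree of each such node

% In-degree distribution: number of nodes per distinct in-degree value.
inDegVals  = unique(indeg);
inDegCount = histc(indeg, inDegVals);

% Plot the counts against the actual degree values on log-log axes.
figure;
loglog(inDegVals, inDegCount, 'o');
xlabel('in-degree');
ylabel('number of nodes');
title('log-log scale plot: indegree distribution');
```

Note that nodes with no incoming edges never appear in the 'to' column, so degree 0 is not represented in this count. On log-log axes that makes no difference to the plot, since zero cannot be drawn there anyway.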

Licensed under: CC-BY-SA with attribution