سؤال

I have read that the degree of nodes in a "knowledge" graph of people roughly follows a power law distribution, and more exactly can be approximated with a Pareto-Lognormal distribution.

Where can I find a kind of algorithm that will produce a random graph with this distribution?

See for example the paper Revisiting Degree Distribution Models for Social Graph Analysis (page 4, equation 1) for a mathematical description (distribution function) of the kind of distribution I'm interested in.

هل كانت مفيدة؟

المحلول

To generate a graph from a prescribed degree sequence (regardless of the degree sequence), one can use an algorithm by Blitzstein and Diaconis:

Unfortunately their algorithm is potentially $O(n^2 \bar{d})$, for $n$ vertices with average degree $\bar{d}$, so this might not be fast enough depending on the size of your graph and the application involved.

Molloy and Reed introduced the "Erased Configuration Model" which pairs vertices based on a degree sequence chosen proportional to the remaining free slots.

A paper by Britton, Deijfen and Martin-Lof show that the Erased Configuration Model asymptotically approaches the prescribed degree distribution chosen. While not guaranteeing the exact degree sequence input, I would recommend the erased configuration model for its simplicity and speed.

Briefly, the erased configuration model graph generation algorithm creates "hooks" for each vertex, where the number of hooks for each vertex is initially given by the degree sequence. Going from the vertex with the most hooks first, another hook is chosen uniformly at random from the pool still unattached. A connection is made for this pairing. Once the pool of hooks is exhausted, self loops are removed and multi-edges are collapsed into one.

Since self loops are allowed and multi-edges collapsed, the resulting degree sequence could potentially be different than the input sequence, so this method is not exact. If exactness is preferred, the algorithm by Blitzstein and Diaconus can be used at a higher computational cost. Note that the algorithm of Blitzstein and Diaconus checks whether a prescribed degree sequence is feasible (or "graphical" in their parlance), so degree sequences might need to be regenerated should they not be realizable.

Degree sequences can be chosen under any distribution you want. In your case, you would want to choose them based on a Pareto-Lognormal distribution.

I would also like to briefly note that generating graphs with prescribed degree sequences that follow a particular distribution, the Pareto-Lognormal distribution say, might not generate the type of "knowledge" graphs you want. The degree distribution is just a profile of a graph and says nothing about other features which might be present in the original graph ensemble you would be trying to model. As Raphael points out in the comments, Wikipedia's page on scale-free networks is a good starting point for some review.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى cs.stackexchange
scroll top