Вопрос

I'm working with a very large undirected graph (a social network from a telecomunication company).

I'm applying a clustering algorithm on this graph to find it’s most relevant communities. The problem is that the algorithm is very slow and I need to test it with a smaller graph to tune some parameters.

Recently, I came with the idea of getting a sample from this large graph. This sample has to be representative of the original graph.

Is there someone who knows the best algorithm to get that sample? And what should be the minimum size of the sample (is it 15%?)?

I already read some articles about sampling on large directed graphs (Estimating and Sampling Graphs with Multidimensional Random Walks, and Sampling from Large Graphs), and there doesn't seem to be any article about undirected graphs.

If you need any more information, please tell me.

Thank you very much,

DC

Это было полезно?

Решение

You can treat the undirected graph like a directed graph for the purposes of sampling. Any sampling strategy for a directed graph should work assuming it allows cycles. You simply want to sample the nodes and edges, so any edges that become part of the sample just accept the edge in both directions.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top