Notation: m = number of edges in the old graph
- Sort the edge list
E
(version two of your proposed formats) by users (takes O(m log(m)) - Go through
E
and determine all the runs of consecutive edges with same user (O(m)) - Each pair of edges within a run gives you an edge in the new graph -> Add it to the edge list
F
of the new graph (O(|F| = n_user x max_n_items_per_user^2)) - Contract all duplicates in
F
into single edges with the edge weights given by the number of duplicates (O(|F|))
If your graph is sparse, i.e. max_n_items_per_user is small, the above should be fairly efficient. Otherwise, this algorithm has the problem that it enumerates all edges in the new graph before contracting them, so one should think about how to get the edge weight directly.