Streaming algorithm for counting triangles in a graph

https://cs.stackexchange.com/questions/127839

29-09-2020
|

Question

As described in the reference, the algorithm (see at the bottom) supposes to output an estimator $\hat T$ for the # of triangles in a given graph $G = (V, E)$, denoted $T$. It is written that "it can easily be shown" that $$ E[\hat T] = T $$ But unfortunately, I'm not seeing it. Trying to analyze $E[\hat T]$, I think as follows:

At line 1, denote the probability to randomly (and uniformly) choose an edge which is part of a triangle as $p$. Since triangles can share edges, $$ \frac T m \le p \le \frac {3T} m $$ For example, consider the following case:

The central triangle doesn't add new edges to the # of possibilities to choose an edge which is part of a triangle. You can imagine a different configuration, in which there are only the 3 outer triangles and they don't touch each other (in this configuration, we won't see the central 4th triangle). In both cases ((case i) 4 triangles as seen in the image; (case ii) 3 disjoint triangles), the probability to choose an edge which is part of a triangle is 1 (although the # of triangles is different).
At line 2, the probability to choose uniformly at random a vertex which "closes a triangle" with the edge from the previous step is exactly $\frac 1 {n-2}$.

Therefore I only see that

$$ T \le E[\hat T] \le 3T $$

What am I missing?

Another question I have is regarding line 3. The stream is ordered, and we first pick a random edge $(u, v)$ (line 1), then a random vertex $w$ from $V \backslash \{u, v\}$ (line 2). I feel that the analysis should take into account that at line 3 we check whether $(u, w)$ and $(v, w)$ appear after $(u, v)$ in the stream. Maybe after I'll understand the answer to my first question, it will be clearer.

Algorithm:

Pick an edge $(u, v)$ uniformly at random from the stream.
Pick a vertex $w$ uniformly at random from $V \backslash \{u, v\}$
If $(u, w)$ and $(v, w)$ appear after $(u, v)$ in the stream, then output $m(n-2)$. Else, output $0$.

Also, although I didn't see it written, I believe there's an assumption that $V$ is known ahead.

Reference: Data streams lecture notes by Prof. Amit Chakrabarti, section "15.3 Triangle Counting", https://www.cs.dartmouth.edu/~ac/Teach/data-streams-lecnotes.pdf

Best regards

Solution

Let $(u,v,w)$ be a particular triangle in the stream, and suppose that the edge $(u,v)$ appears first. The probability that we chose $(u,v)$ in the first step is $1/m$. The probability that we chose $w$ in the second step is $1/(n-2)$. Hence the probability that we chose the triangle $(u,v,w)$ is $1/[m(n-2)]$. Let us denote this event by $E_{u,v,w}$.

If $(u_1,v_1,w_1)$ and $(u_2,v_2,w_2)$ are two different triangles that the events $E_{u_1,v_1,w_1}$ and $E_{u_2,v_2,w_2}$ are disjoint (note that the triangles don't have to be disjoint). Therefore if there are $T$ triangles, then the probability that we chose one of them is exactly $T/[m(n-2)]$. Therefore the expected output of the algorithm is exactly $T$.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange