Question

Consider $n$ sets, $X_i$, each having $n$ elements or fewer, drawn among a set of at most $m \gt n$ elements. In other words $$\forall i \in [1 \ldots n],~|X_i| \le n~\wedge~\left|\bigcup_{i=1}^n X_i\right| \le m$$

Consider the complete graph $G$ formed by taking every $X_i$ as a node, and weighing every edge $(i,j)$ by the cardinal of the symmetric difference $X_i \triangle X_j$.

An immediate bound on the weight of the minimal spanning tree is $\mathcal{O}(n^2)$, since each edge is at most $2 n$, but can we refine this to $\mathcal{O}(m)$?

For illustration, consider $2 p$ sets, $p$ of which contain the integers between $1$ and $p$ and $p$ of which contain the integers of between $p+1$ and $2p$. A minimal spanning tree has weight $p$ but a poorly chose tree on this graph would have weight $(p-1)p$. Intuitively, if there are only $m$ values to chose from, the sets can't all be that different from one another.

EDIT: Contributor Dmitry gives a nice counterexample below in which $m$ is nearly but not quite $n^2$.

A counterexample or proof would still be of interest in the case where $m = \mathcal{O}(k n)$. Can the weight of spanning-tree be bound by $\mathcal{O}(f(k) n)$? By $\mathcal{O}(f(k) n \log^c n)$?

Was it helpful?

Solution

Interesting question.

The right intuition should probably be along the guideline that two random subsets of cardinality $n$ drawn from some $cn$ elements for some constant $c$ differ from each other significantly with a probability very close to 1 and, hence, the weight of the minimum spanning tree of the graph $G$ should be $\mathcal\Theta(n^2)$ on average. I cannot prove that guideline is correct, however.

Instead, I will present one series of such examples. More specifically, from some $n$ (that can be arbitrary large), there are $n$ sets, each having $(n-1)/2$ elements drawn from a set of $n$ elements, such that the cardinality of the symmetric difference between any two sets is no less than $(n-1)/2$. So the weight of the minimum spanning tree is no less than $(n-1)^2/2=\mathcal\Theta(n^2)$.


Here is the construction, using quadratic residue.

Example. Let $n=p$ be an odd prime. Let $X_0$ be the set of all non-zero quadratic residues of $p$ between 0 and $p-1$ inclusive. In other words, $$X_0=\{0\le k\lt p: \left(\frac {k}p\right)=1\}$$ where $\left(\frac{\cdot}p\right)$ is the Legendre symbol. For $0\le i\lt p$, let $X_i$ be "$X_0$ plus $i$", i.e., $$X_i=\{0\le k\lt p: \left(\frac {k-i}p\right)=1\}.$$ Then $|X_i|=\frac{p-1}2$ for all $i$ and $|X_i \triangle X_j|\ge \frac{p-1}2$ for all $i\not=j$.

Proof: Since $\left(\frac{\cdot}p\right)$ is either $-1$, $0$, or $1$, we have $1+\left(\frac{\cdot}p\right)\ge0$. Hence, $$\begin{aligned} &\quad\quad \sum_{0\le k\lt p}\left(1+\left(\frac {k-i}p\right)\right)\left(1+\left(\frac {k-j}p\right)\right)\\ &\ge\sum_{0\le k\lt p\,\land\,\left(\frac {k-i}p\right)=1\,\land\, \left(\frac {k-j}p\right)=1}\left(1+\left(\frac {k-i}p\right)\right)\left(1+\left(\frac {k-j}p\right)\right)\\ &=\sum_{0\le k\lt p\,\land\,\left(\frac {k-i}p\right)=1\,\land\, \left(\frac {k-j}p\right)=1}4\\ &=4\,|X_i\cap X_j| \end{aligned}$$ On the other hand, we have $$\begin{aligned} &\quad\quad \sum_{0\le k\lt p}\left(1+\left(\frac {k-i}p\right)\right)\left(1+\left(\frac {k-j}p\right)\right)\\ &=\sum_{0\le k\lt p}\left(1 + \left(\frac {k-i}p\right) + \left(\frac {k-j}p\right)+ \left(\frac {k-i}p\right)\left(\frac {k-j}p\right)\right)\\ &=p + 0 + 0 + \sum_{0\le k\lt p} \frac {k^2-(i+j)k+ij}p\\ &=p-1 \end{aligned}$$ Since $p\nmid(-(i+j))^2-4ij=(i-j)^2$, the last equality above comes from the case of $p\nmid b^2-4ac$, theorem 1 in the paper On Certain Sums with Quadratic Expressions Involving the Legendre Symbol. So we have $|X_i\cap X_j|\le \frac{p-1}4.$

Since $|X_i|=|X_j|=\frac{p-1}2$, $\ |X_i \triangle X_j|=|X_i|+|X_j|-2|X_i\cap X_j|\ge \frac{p-1}2.$ $\quad\checkmark$


For people who appreciate concrete examples, here are the sets when $n=17$, where $|X_i \triangle X_j|\ge 8$. $$\begin{aligned} X_{0}&=\{\phantom{1}1, \phantom{1}2, \phantom{1}4, \phantom{1}8, \phantom{1}9, 13, 15, 16 \}\\ X_{1}&=\{\phantom{1}2, \phantom{1}3, \phantom{1}5, \phantom{1}9, 10, 14, 16, \phantom{1}0 \}\\ X_{2}&=\{\phantom{1}3, \phantom{1}4, \phantom{1}6, 10, 11, 15, \phantom{1}0, \phantom{1}1 \}\\ X_{3}&=\{\phantom{1}4, \phantom{1}5, \phantom{1}7, 11, 12, 16, \phantom{1}1, \phantom{1}2 \}\\ X_{4}&=\{\phantom{1}5, \phantom{1}6, \phantom{1}8, 12, 13, \phantom{1}0, \phantom{1}2, \phantom{1}3 \}\\ X_{5}&=\{\phantom{1}6, \phantom{1}7, \phantom{1}9, 13, 14, \phantom{1}1, \phantom{1}3, \phantom{1}4 \}\\ X_{6}&=\{\phantom{1}7, \phantom{1}8, 10, 14, 15, \phantom{1}2, \phantom{1}4, \phantom{1}5 \}\\ X_{7}&=\{\phantom{1}8, \phantom{1}9, 11, 15, 16, \phantom{1}3, \phantom{1}5, \phantom{1}6 \}\\ X_{8}&=\{\phantom{1}9, 10, 12, 16, \phantom{1}0, \phantom{1}4, \phantom{1}6, \phantom{1}7 \}\\ X_{9}&=\{10, 11, 13, \phantom{1}0, \phantom{1}1, \phantom{1}5, \phantom{1}7, \phantom{1}8 \}\\ X_{10}&=\{11, 12, 14, \phantom{1}1, \phantom{1}2, \phantom{1}6, \phantom{1}8, \phantom{1}9 \}\\ X_{11}&=\{12, 13, 15, \phantom{1}2, \phantom{1}3, \phantom{1}7, \phantom{1}9, 10 \}\\ X_{12}&=\{13, 14, 16, \phantom{1}3, \phantom{1}4, \phantom{1}8, 10, 11 \}\\ X_{13}&=\{14, 15, \phantom{1}0, \phantom{1}4, \phantom{1}5, \phantom{1}9, 11, 12 \}\\ X_{14}&=\{15, 16, \phantom{1}1, \phantom{1}5, \phantom{1}6, 10, 12, 13 \}\\ X_{15}&=\{16, \phantom{1}0, \phantom{1}2, \phantom{1}6, \phantom{1}7, 11, 13, 14 \}\\ X_{16}&=\{\phantom{1}0, \phantom{1}1, \phantom{1}3, \phantom{1}7, \phantom{1}8, 12, 14, 15 \}\\ \end{aligned}$$

OTHER TIPS

You can't. Consider the following sets for some $k$, with $m=k^2$ (they both are powers of $2$):

  • $\{1..k\}$, $\{k+1..2k\}$, $\ldots$, $\{m-k+1..m\}$
  • $\{1, 3, 5, \ldots, 2k-1\}$, $\{2, 4, 6, \ldots, 2k\}$, $\{2k+1, 2k+3, \ldots, 4k - 1\}$, $\{2k+2, 2k+4, \ldots, 4k\}$, $\ldots$
  • $\{1, 5, 9, \ldots, 4k - 3\}$, $\{2, 6, 10, \ldots, 4k-2\}$ $\ldots$.

Each symmetric difference is at least $\frac k2$. Each level has $\frac mk$ sets, and there are $1 + \log \frac mk$ levels. Therefore, there are $\frac mk(1 + \log \frac mk)$ sets. Since each set must have cardinality at most the number of sets, we must have $k \le \frac mk (1 + \log \frac mk)$, and it's satisfied when $m = k^2$.

The size of the minimum spanning tree is at least $\frac k 2 \cdot \frac mk (1 + \log \frac mk) = \Omega(m \log m)$.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top