Proof of uniform convergence if VC dimension is finite

https://cs.stackexchange.com/questions/123440

29-09-2020
|

Question

In the book »Understanding Machine Learning: From Theory to Algorithms«, written by Ben-David and Shalev-Shwartz, there is a proof which I do not understand. The context is proving that if a hypothesis class $\mathcal{H}$ has finite VC dimension, then it enjoys the uniform convergence property.

The proof of this implication involves Sauer's Lemma (hypothesis classes with finite VC dimension have »small effective size«) and a theorem stating that hypothesis classes with »small effective size« enjoy the uniform convergence property. The proof of the latter sub-implication involves this theorem (Theorem 6.11 in the book):

Let $\mathcal{H}$ be a hypethesis class of binary classifiers over a domain $\mathcal{X}$ and let $\tau_{\mathcal{H}}$ be its growth function (from Sauer's Lemma). Then, for all $m \in \mathbb{N}$, for every distribution $\mathcal{D}$ over $\mathcal{X} \times \{0, 1\}$ and every $\delta \in (0, 1)$: $$ \forall h \in \mathcal{H}: \mathbb{P}_{S \sim \mathcal{D}^m}\left[ | L_{\mathcal{D}}(h) - L_S(h) | \leq \frac{4 + \sqrt{\log(\tau_{\mathcal{H}}(2m))}}{\delta\sqrt{2m}} \right] \geq 1 - \delta $$

In the proof for this theorem it is stated, that it follows immediately from the inequality $$ \mathbb{E}_{S \sim \mathcal{D}^m}\left[ \sup_{h \in \mathcal{H}} | L_{\mathcal{D}}(h) - L_S(h) | \right] \leq \frac{4 + \sqrt{\log(\tau_{\mathcal{H}}(2m))}}{\sqrt{2m}}, $$ the fact that the random variable $\theta := \sup_{h \in \mathcal{H}} | L_{\mathcal{D}}(h) - L_S(h) |$ is nonnegative and Markov's inequality.

However, I do not see how it follows immediately from those facts. What am I missing?

Solution

Let $X = |L_\mathcal{D}(h) - L_S(h)|$. The statement on the expectation of the supremum of $X$ implies, in particular, that for some $M$, $$ \mathbb{E}[X] \leq M. $$ Since $X \geq 0$, Markov's inequality implies that $$ \Pr\left[X \geq \frac{M}{\delta}\right] \leq \delta. $$ This implies that $$ \Pr\left[X \leq \frac{M}{\delta}\right] \geq \Pr\left[X < \frac{M}{\delta}\right] = 1 - \Pr\left[X \geq \frac{M}{\delta}\right] \geq 1 - \delta. $$

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange