Question

The class of functions $\mathcal{F}$ is PAC-learnable if there exists an algorithm $A$ such that for every distribution $D$, every unknown target function $f \in \mathcal{F}$, and every $\epsilon, \delta > 0$, there exists an $m$ such that, on an input of $m$ i.i.d. samples $(x, f(x))$ with $x \sim D$, $A$ returns, with probability at least $1-\delta$, a function which is $\epsilon$-close to $f$ (with respect to $D$). The class $\mathcal{F}$ is efficiently PAC learnable if it is PAC learnable and $A$ runs in time $\text{poly}(1/\epsilon, 1/\delta, n, \text{size}(f))$, where $|x| = n$ and $\text{size}(f)$ is the length of the description of $f$.
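
In other words, writing $h = A\big((x_1, f(x_1)), \dots, (x_m, f(x_m))\big)$ for the hypothesis returned by the algorithm, the requirement is

$$\Pr_{x_1, \dots, x_m \sim D}\Big[\ \Pr_{x \sim D}\big[h(x) \neq f(x)\big] \le \epsilon\ \Big] \ \ge\ 1 - \delta.$$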

Is there a case where a class $\mathcal{F}$ is not efficiently PAC learnable, yet it is efficiently learnable on the uniform distribution?


Solution

Is there a case where a class $\mathcal{F}$ is not efficiently PAC learnable, yet it is efficiently learnable on the uniform distribution?

This has been asked on TCS.SE. It looks like the short answer is yes -- Aaron Roth gives the example of width-$k$ conjunctions for $k \gg \log n$. And in the comments, Avrim Blum is quoted as giving the answer of $2$-term DNF.

I don't totally understand the examples there -- it must take a bit more work to really derive the result (I will update this answer if I find a self-contained proof). But the general intuition for the uniform distribution is this: if the target function $f$ is "sparse", meaning it labels at most an $\epsilon$ fraction of the inputs with $1$, then it is easy to learn by simply guessing $0$, because the constant-$0$ hypothesis is already $\epsilon$-close to $f$. (Presumably this is the relevant property of the width-$k$ example: under the uniform distribution, a conjunction of $k$ literals is satisfied by only a $2^{-k} \ll 1/n$ fraction of inputs when $k \gg \log n$.) A minimal sketch of a learner built on this observation is given below.
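
Here is a toy Python sketch of a uniform-distribution learner for (monotone) conjunctions based on the sparsity observation; the threshold, the sample size, and the fallback elimination step are my own illustrative choices and are not taken from the linked answers:

```python
import math
import random

def learn_conjunction_uniform(oracle, n, eps, delta):
    """Toy learner for monotone conjunctions under the uniform distribution.

    oracle(x) returns the unknown label f(x) in {0, 1} for x in {0,1}^n.
    Idea: if f labels only a tiny fraction of inputs with 1 (e.g. a wide
    conjunction), the constant-0 hypothesis is already eps-close, so we
    only need to work when positive examples show up frequently.
    """
    # crude illustrative sample size, not an optimized bound
    m = int(16 * (n + math.log(2.0 / delta)) / eps) + 1
    sample = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]
    positives = [x for x in sample if oracle(x) == 1]

    if len(positives) < (eps / 2.0) * m:
        # f looks sparse under the uniform distribution:
        # with high probability the all-zeros hypothesis is eps-close
        return lambda x: 0

    # otherwise fall back to the textbook elimination algorithm:
    # keep exactly the variables that equal 1 in every positive example
    relevant = [i for i in range(n) if all(x[i] == 1 for x in positives)]
    return lambda x: int(all(x[i] == 1 for i in relevant))
```

The point is only to illustrate why sparsity makes the uniform distribution forgiving; it says nothing about why the same class should be hard under an arbitrary distribution.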


Beyond just efficient (polynomial) PAC learning, there are other examples where the uniform distribution seems to help. For example, a classic hard problem for PAC learning is DNF, a class which is conjectured not to be efficiently PAC learnable. On the other hand, DNF over the uniform distribution is almost PAC learnable: we have a quasipolynomial-time algorithm for it. See these notes and these notes; a toy sketch of one standard approach appears below.
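
To give a feel for what that looks like, here is a toy Python sketch of the low-degree (Fourier) algorithm of Linial, Mansour, and Nisan, which is one standard route to the quasipolynomial bound. The degree cutoff and sample size are left as explicit parameters; this is my own illustration rather than code from the linked notes:

```python
import itertools
import random

def low_degree_learn(oracle, n, degree, m):
    """Toy low-degree algorithm for learning under the uniform distribution.

    Estimates every Fourier coefficient hat{f}(S) with |S| <= degree from m
    uniform samples and predicts with the sign of the truncated expansion.
    For DNF the Fourier weight is concentrated on low-degree coefficients,
    so a logarithmic degree cutoff suffices -- there are only n^{O(degree)}
    sets S to consider, which is where the quasipolynomial time comes from.
    """
    sample = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]
    labels = [1 - 2 * oracle(x) for x in sample]   # encode labels as +1 / -1

    def chi(S, x):
        # parity character chi_S(x) = (-1)^{sum_{i in S} x_i}
        return -1 if sum(x[i] for i in S) % 2 else 1

    # empirical estimate of each low-degree Fourier coefficient
    coeffs = {}
    for d in range(degree + 1):
        for S in itertools.combinations(range(n), d):
            coeffs[S] = sum(y * chi(S, x) for x, y in zip(sample, labels)) / m

    def hypothesis(x):
        val = sum(c * chi(S, x) for S, c in coeffs.items())
        return 0 if val >= 0 else 1               # map the sign back to {0, 1}
    return hypothesis
```

Note that nothing here uses the uniform distribution beyond drawing uniform samples; it is the analysis (Fourier concentration of DNF with respect to the uniform distribution) that makes the guarantee work, which is exactly why it does not transfer to arbitrary distributions.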

In summary, there appears to be a lot of work on learning under the uniform distribution (search "PAC learning under uniform distribution" for additional references). And this setting seems to be easier than distribution-free PAC learning, where we are able to prove much less. Quoting Ryan O'Donnell's notes above:

General PAC-learning insists that one algorithm works simultaneously for all distributions. In fact, the machine learning community interested in real-world applications finds the uniform setting questionable: "They’ll punch you in the nose if you try to tell them about algorithms in this framework." --Ryan. But, in a more general framework, no one can really prove anything.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange