Question

I am taking the course Cornell CS4780 "Machine Learning for Intelligent Systems". You can find the link here for the first lecture, which is the one I am going to refer to.

The professor explains that we have a sample

$D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)\} \sim P$, where $(X_i, y_i)$ is a feature-label pair. There is a joint distribution over the feature-label space, denoted by $P$.

We never have access to $P$; only God knows $P$. What we want to do in this supervised learning task is take data from this distribution and learn a mapping/function from $X$ to $y$.

I agree with and understand everything up to this point.

Then the professor goes on to make a statement in the lecture, at precisely 34 minutes 26 seconds:

"IF we had access to this distribution, everything would be easy". But he doesnt explain this statement.

Now my question is: what would have been easy if we knew the distribution? Does he mean that if we had access to the distribution, we would know the probability of each $(X_i, y_i)$ pair, and we could then learn a mapping/parameters that reduce the out-of-sample error?


Solution

The goal is to predict $y$.

If we knew the true distribution $P$ of $(x, y)$, there would be no need to build a machine learning model at all. Given $x$, we could directly consult $P$ to obtain the conditional probability $P(y \mid x)$. For the case of discrete $Y$, we have $P(y \mid x) = \frac{P(x, y)}{\sum_z P(x, z)}$.
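
To make this concrete, here is a minimal sketch in Python, assuming a toy discrete joint distribution with made-up probabilities (not from the lecture). It shows that once $P$ is known, prediction reduces to computing $P(y \mid x)$ by the formula above and picking the most probable label, with no learning step involved.

```python
# Toy joint distribution P(x, y) over a discrete feature x in {0, 1, 2}
# and label y in {0, 1}. The probabilities are made up for illustration.
joint = {
    (0, 0): 0.20, (0, 1): 0.05,
    (1, 0): 0.10, (1, 1): 0.25,
    (2, 0): 0.05, (2, 1): 0.35,
}

def conditional(joint, x):
    """Compute P(y | x) = P(x, y) / sum_z P(x, z) for every label y."""
    marginal_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / marginal_x for (xi, y), p in joint.items() if xi == x}

def predict(joint, x):
    """With the true P in hand, the best guess is simply argmax_y P(y | x)."""
    cond = conditional(joint, x)
    return max(cond, key=cond.get)

if __name__ == "__main__":
    for x in (0, 1, 2):
        print(f"x={x}: P(y|x)={conditional(joint, x)} -> predict {predict(joint, x)}")
```

In practice we only have the sample $D$, not the table `joint`, which is exactly why a model has to be learned from data instead of read off from $P$.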
