Problem

I read chapter 5.1.3 of Yoshua Bengio's Deep Learning book, which says: supervised learning involves observing examples of a random vector $\textbf{x}$ and an associated value or vector $\textbf{y}$, and learning to predict $\textbf{y}$ from $\textbf{x}$ by estimating $p(\textbf{y} \vert \textbf{x})$.

What does $p(\textbf{y} \vert \textbf{x})$ mean?

From basic statistics, I know that $p(\textbf{y} \vert \textbf{x})=\frac{p(\textbf{y},\textbf{x})}{p(\textbf{x})}$. How do we find $p(\textbf{y},\textbf{x})$ and $p(\textbf{x})$?


Solution

You are correct that \begin{equation*} p(y|\mathbf{x})=\dfrac{p(\mathbf{x},y)}{p(\mathbf{x})}. \end{equation*} Similarly, we can write the joint probability $p(\mathbf{x},y)$ as \begin{equation*} p(\mathbf{x},y)=p(\mathbf{x}|y)\cdot p(y). \end{equation*} Combining the two equations, we obtain \begin{equation*} p(y|\mathbf{x})=\dfrac{p(\mathbf{x}|y)\cdot p(y)}{p(\mathbf{x})}. \end{equation*}

In the context of supervised learning, $y$ denotes the class label and the vector $\mathbf{x}$ denotes the measurement (feature) vector. For the purpose of discussion, let us assume that the class label $y$ takes values in the set $\{1,2\}$, where $1$ denotes male and $2$ denotes female, and that $\mathbf{x}=(x_{1},x_{2})$ is a measurement vector on two variables, where $x_{1}$ stands for the height and $x_{2}$ for the weight of an individual.
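To make the rule above concrete, here is a tiny numeric sketch (the probability values are made up purely for illustration, they are not from the book):

```python
# A minimal numeric sketch of Bayes' rule: p(y|x) = p(x|y) p(y) / p(x).
# All numbers below are invented, illustrative values.

p_y = {1: 0.5, 2: 0.5}            # priors p(y): 1 = male, 2 = female
p_x_given_y = {1: 0.02, 2: 0.08}  # class-conditional densities evaluated at one observation x

# p(x) follows from the law of total probability: sum over y of p(x|y) p(y)
p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)

posterior = {y: p_x_given_y[y] * p_y[y] / p_x for y in p_y}
print(posterior)  # {1: 0.2, 2: 0.8} -> this observation is more likely female
```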

$p(y|\mathbf{x})$ denotes the posterior density for $y$ given the observation $\mathbf{x}$. For example, $p(1|\mathbf{x})$ is the probability that a sample belongs to the male class given the observation $\mathbf{x}$; $p(2|\mathbf{x})$ is interpreted analogously. $p(\mathbf{x}|y)$ stands for the class-conditional probability density: $p(\mathbf{x}|1)$ denotes the density of the measurements for males and $p(\mathbf{x}|2)$ the density for females. In supervised learning, these class-conditional densities are typically estimated from the training data. Finally, $p(y)$ denotes the prior probability of the class label $y$. For example, $p(1)$ denotes the probability that an individual selected at random from the population is male.
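In case it helps, here is a minimal sketch of how those quantities might be estimated from labelled data, assuming we fit one Gaussian per class (the height/weight records below are invented):

```python
import numpy as np

# Invented training data: rows are (height_cm, weight_kg), labels 1 = male, 2 = female.
X = np.array([[180, 80], [175, 78], [170, 72],   # class 1
              [160, 55], [165, 60], [158, 52]])  # class 2
y = np.array([1, 1, 1, 2, 2, 2])

params = {}
for c in (1, 2):
    Xc = X[y == c]
    params[c] = {
        "prior": len(Xc) / len(X),        # p(y=c) estimated by class frequency
        "mean": Xc.mean(axis=0),          # mean of the fitted Gaussian p(x|y=c)
        "cov": np.cov(Xc, rowvar=False),  # covariance of the fitted Gaussian
    }
print(params)
```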

Assuming that the class priors and the class-conditional densities are known, Bayes' rule says: assign an observation $\mathbf{x}_{0}$ whose class label is unknown to the class male if \begin{equation*} p(\mathbf{x}_{0}|1)\,p(1)>p(\mathbf{x}_{0}|2)\,p(2). \end{equation*} Note that $p(\mathbf{x})$ is not used in the classifier; classification rules only require the class-conditional densities and the priors.
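A minimal sketch of that decision rule, using scipy's `multivariate_normal` to evaluate Gaussian class-conditional densities (the Gaussian parameters below are again invented, purely for illustration):

```python
from scipy.stats import multivariate_normal

# Invented Gaussian class-conditional densities for (height_cm, weight_kg).
classes = {
    1: {"prior": 0.5, "mean": [175, 77], "cov": [[25, 10], [10, 30]]},  # male
    2: {"prior": 0.5, "mean": [161, 56], "cov": [[20,  8], [ 8, 25]]},  # female
}

def classify(x0):
    # Bayes rule: pick the class maximising p(x0|y) * p(y); p(x0) cancels out.
    scores = {c: multivariate_normal.pdf(x0, mean=p["mean"], cov=p["cov"]) * p["prior"]
              for c, p in classes.items()}
    return max(scores, key=scores.get)

print(classify([178, 80]))  # -> 1 (male)
print(classify([160, 54]))  # -> 2 (female)
```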

Other tips

$P(y|x)$ stands for the probability of $y$ occurring, given that $x$ has occurred.

Using Bayes' theorem, this can also be written as:

$P(y|x) = \dfrac{P(x|y)\,P(y)}{P(x)}$

It is these values that you generally know and can use.

Say $x$ is sex and $y$ is employment status. You can restate your $p(y|x)$ as:

$$ p(employed=1~|~sex=male) $$

Now it's a simple matter of counting the relevant examples or looking the value up in the appropriate contingency table.
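For example, a minimal sketch of that counting approach (the toy records below are invented):

```python
# Estimate p(employed=1 | sex=male) by counting, from invented toy records.
records = [
    {"sex": "male", "employed": 1},
    {"sex": "male", "employed": 1},
    {"sex": "male", "employed": 0},
    {"sex": "female", "employed": 1},
    {"sex": "female", "employed": 0},
]

males = [r for r in records if r["sex"] == "male"]
p_employed_given_male = sum(r["employed"] for r in males) / len(males)
print(p_employed_given_male)  # 2/3 ≈ 0.667
```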

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange