Question

When we talk about perceptrons, we say that they are limited to approximating functions that are linearly separable, while neural networks that use non-linear transformations are not.

I am having trouble understanding this idea of linear separability. Specifically, does it apply only to binary classification, or does it generalize to N classes? If so, what does a linear decision boundary for, let's say, 4 classes even look like?


Solution

1.) The perceptron is a non-linear transformation! (Its threshold activation is non-linear, even though the decision boundary it induces is linear.)

2.) Linear separability is only defined for Boolean functions; see Wikipedia. Therefore, yes, the statement is meant only for binary classification. (The perceptron sketch after this list shows the classic separable/non-separable pair, AND vs. XOR.)

3.) For general functions, see the universal approximation theorem; a tiny hand-built illustration follows below.
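
As a concrete illustration of point 2, here is a minimal perceptron-learning sketch in plain NumPy (the function names are my own, not from any library). It shows that the perceptron rule converges on the linearly separable AND function but can never reach full accuracy on XOR, the classic non-separable Boolean function:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron learning rule; labels y are in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            err = yi - pred          # 0 if correct, +/-1 if wrong
            w += lr * err * xi       # update only on mistakes
            b += lr * err
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])  # linearly separable
y_xor = np.array([0, 1, 1, 0])  # NOT linearly separable

for name, y in [("AND", y_and), ("XOR", y_xor)]:
    w, b = train_perceptron(X, y)
    acc = (predict(X, w, b) == y).mean()
    print(f"{name}: training accuracy = {acc:.2f}")
```

The AND run reaches 100% training accuracy, while the XOR run never can: no single line separates XOR's two classes, so the weights just keep cycling.

As a tiny hand-built instance of point 3 (not the theorem itself, just the smallest case of the idea behind it), a single hidden layer of non-linear units is already enough to compute XOR. The weights below are chosen by hand rather than trained, purely to show such a network exists:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0)

def xor_net(x):
    # Hidden layer: two ReLU units computing relu(x1+x2) and relu(x1+x2-1).
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([0.0, -1.0])
    # Output layer: a linear read-out, h1 - 2*h2.
    w2 = np.array([1.0, -2.0])
    return relu(x @ W1 + b1) @ w2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(xor_net(X))  # -> [0. 1. 1. 0.]
```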

Other tips

I will try to explain this in a classification scenario. Say we begin with two classes of images, dog and cat, and we are asked to decide whether a given test image shows a dog or a cat. Imagine the images we are given are $8 \times 8$, so each image is represented by an $8 \times 8 = 64$-dimensional vector, and each training image forms one data point in this 64-dimensional space. With lots of training data, say 1000 cat images and 1000 dog images, we get 2000 such points in the 64-dimensional space. If, in that space, the points can be separated by a hyperplane, then the problem is linearly separable in the original representation.

If not, we can use a kernel, a non-linear function of the given image that implicitly maps the 64-dimensional space into a higher-dimensional one (more than 64 dimensions), where we can find linear separability. In this sense, essentially any such problem can be made linearly separable in a high enough dimensional space; a short sketch of this idea follows below. For more information, you can go through Linear Support Vector Machines and Kernel Support Vector Machines.
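
To make the kernel idea concrete, here is a short sketch using scikit-learn, with a toy 2-D "circles" dataset standing in for the 64-dimensional image example. It compares a linear SVM with an RBF-kernel SVM on data that is not linearly separable in its original space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# A toy 2-D dataset that is NOT linearly separable:
# one class forms a ring around the other.
X, y = make_circles(n_samples=500, noise=0.05, factor=0.4, random_state=0)

# A linear SVM must draw a straight line in the original space;
# the RBF kernel implicitly maps the points into a higher-dimensional
# space where a separating hyperplane exists.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(f"{kernel} kernel training accuracy: {clf.score(X, y):.2f}")
```

The linear SVM stays near chance level, while the RBF-kernel SVM separates the two classes almost perfectly, which is exactly the "lift to a higher space where it becomes separable" trick described above.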

Licensed under: CC-BY-SA with attribution