In a neural network, it is common to compute a dot product of the form

$$\langle w, x \rangle = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$$

and use it as the argument to some activation function. This is done for several different weight vectors $w$ and activation functions (one per neuron). The results are used as input to the next layer, where we again compute several dot products and feed them into more activation functions.
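For concreteness, here is a minimal NumPy sketch of that standard two-layer computation (the layer sizes and the ReLU activation are arbitrary choices, just for illustration):

```python
import numpy as np

def dense_layer(x, W, b, activation):
    # Each row of W is one neuron's weight vector w; the layer computes
    # activation(<w, x> + b) for every neuron at once.
    return activation(W @ x + b)

relu = lambda z: np.maximum(z, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # input with n = 4 features
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # first layer: 8 neurons
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # second layer: 3 neurons

h = dense_layer(x, W1, b1, relu)                # layer 1 output feeds layer 2
y = dense_layer(h, W2, b2, relu)
```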

Using many layers leads to several of the theoretical difficulties of deep learning, and one has to make some effort to get everything to work; but in the end, it hopefully does.

I was thinking of a way of "cheating" the use of many layers. The idea is to use just one layer, but make the argument to the activation function complex enough that the model can still learn properly. Instead of a dot product, I was thinking of using a product of dot products:

$$\prod_{i=1}^{L}\langle w^{(i)}, x \rangle = (w_1^{(1)} x_1 + w_2^{(1)} x_2 + \ldots + w_n^{(1)} x_n) \cdot \ldots \cdot (w_1^{(L)} x_1 + w_2^{(L)} x_2 + \ldots + w_n^{(L)} x_n)$$

The result is then used as the argument to some activation function. Since this argument already involves several weight vectors, my hope is that several layers of activation functions would no longer be necessary. I even thought about using a sum of such products and tested it. I'm getting $\sim 80\%$ accuracy on a classification problem where I know it is possible to get more than $90\%$. Does my approach have a fundamental limitation I'm not aware of?
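To make the idea concrete, here is a minimal NumPy sketch of what I mean by a sum of products of dot products feeding a single activation (the sigmoid, the sizes $n=5$, $L=2$, $R=3$, and all variable names are just illustrative):

```python
import numpy as np

def product_of_dot_products(x, W):
    # W has shape (L, n): one weight vector per factor.
    # Returns <w^(1), x> * <w^(2), x> * ... * <w^(L), x>.
    return np.prod(W @ x)

def sum_of_products(x, Ws):
    # Ws has shape (R, L, n): R independent products that are summed.
    return sum(product_of_dot_products(x, W) for W in Ws)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=5)                  # n = 5 input features
Ws = rng.normal(size=(3, 2, 5))         # R = 3 terms, L = 2 factors each

output = sigmoid(sum_of_products(x, Ws))
```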

Thanks.

Extra: above I gave a simplified description of my idea (which I think is enough), but it may be helpful to put the whole thing here. My final model is given by the following formula:

$$ \textbf{f}\left( \sum_{r=1}^R \left( \prod_{i=1}^{L-1} \langle \textbf{w}^{(i,r)}, \textbf{x} \rangle \right) \left[ \begin{array}{c} w_1^{(L,r)} \\ \vdots\\ w_m^{(L,r)} \end{array} \right] \right) = \left[ \begin{array}{c} f\left( \sum_{r=1}^R \sum_{i_1, \ldots, i_{L-1}=1}^n w_{i_1}^{(1,r)} \cdots w_{i_{L-1}}^{(L-1,r)} \, x_{i_1} \cdots x_{i_{L-1}} \, w_1^{(L,r)} \right) \\ \vdots\\ f\left( \sum_{r=1}^R \sum_{i_1, \ldots, i_{L-1}=1}^n w_{i_1}^{(1,r)} \cdots w_{i_{L-1}}^{(L-1,r)} \, x_{i_1} \cdots x_{i_{L-1}} \, w_m^{(L,r)} \right) \end{array} \right]$$

Here $f:\mathbb{R} \to \mathbb{R}$ is the activation function and $\textbf{f} = (f, \ldots, f)$ is the "vectorized" version of this function. Note that I tried to come up with a very flexible expression before the activation function; the idea is that this avoids the need for more layers. I hope everything is clear at this point. Thanks again.
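For completeness, a minimal NumPy sketch of how I read my own formula (the shapes, the sigmoid, and the names `W_prod` and `W_out` are just illustrative; this is a sketch, not my training code):

```python
import numpy as np

def model(x, W_prod, W_out, f):
    # W_prod has shape (R, L-1, n): for each of the R terms, the L-1 weight
    # vectors whose dot products with x are multiplied into one scalar.
    # W_out has shape (R, m): the final weight vector w^(L,r) of each term.
    # Output: f( sum_r  prod_i <w^(i,r), x> * w^(L,r) ), an m-vector.
    scalars = np.prod(W_prod @ x, axis=1)   # shape (R,)
    return f(scalars @ W_out)               # shape (m,)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, m, R, L = 5, 3, 4, 3
x = rng.normal(size=n)
W_prod = rng.normal(size=(R, L - 1, n))
W_out = rng.normal(size=(R, m))

y = model(x, W_prod, W_out, sigmoid)        # m output values
```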
