Question

I have searched online, but am still not satisfied with answers like this and this.

My intuition is that fully connected layers are completely linear. That means no matter how many FC layers are stacked, the expressiveness is always limited to linear combinations of the previous layer's outputs. But mathematically, a single FC layer should already be able to learn weights that produce exactly the same behavior. So why do we need more than one? Did I miss something here?
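To make the intuition concrete, here is a minimal numpy sketch (the dimensions and variable names are arbitrary) showing that two FC layers with no activation in between collapse into a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(4, 8))                               # batch of 4 inputs, 8 features
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)    # first FC layer
W2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)     # second FC layer

# Two purely linear FC layers applied in sequence
h = x @ W1 + b1
y_two_layers = h @ W2 + b2

# The same mapping expressed as one FC layer:
# (x W1 + b1) W2 + b2 = x (W1 W2) + (b1 W2 + b2)
W = W1 @ W2
b = b1 @ W2 + b2
y_one_layer = x @ W + b

print(np.allclose(y_two_layers, y_one_layer))  # True: no extra expressiveness
```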

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange