There are still many points here I do not fully understand; I will organize them later.

1 Introduction

Invariance means that you can recognize an object as an object, even when its appearance varies in some way.

Translation means that each point/pixel in the image has been moved by the same amount in the same direction. Alternatively, you can think of the origin as having been shifted by an equal amount in the opposite direction.
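
A minimal NumPy sketch of a translation, assuming a 2-D grayscale image stored as an array and zero-filling the pixels that move in from outside the frame (the helpers `translate` and `_axis_slices` are hypothetical names for illustration):

```python
import numpy as np

def _axis_slices(n: int, d: int):
    """Source and destination slices along one axis of length n for a shift of d."""
    d = max(-n, min(n, d))  # shifting by n or more leaves nothing in frame
    src = slice(max(0, -d), n - max(0, d))
    dst = slice(max(0, d), n + min(0, d))
    return src, dst

def translate(image: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Move every pixel down by dy rows and right by dx columns (negative = up/left).

    Pixels pushed past the border are dropped; vacated pixels are set to zero.
    """
    src_r, dst_r = _axis_slices(image.shape[0], dy)
    src_c, dst_c = _axis_slices(image.shape[1], dx)
    out = np.zeros_like(image)
    out[dst_r, dst_c] = image[src_r, src_c]
    return out

image = np.zeros((4, 4))
image[1, 1] = 1.0
moved = translate(image, 1, 2)
print(np.argwhere(moved))  # [[2 3]]: one row down, two columns right
```

In the "shifted origin" view, the same result is obtained by reading the original image at coordinates (row - dy, col - dx).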

2 Translation Picture

3 Invariance vs. Equivariance

Translation invariance means that the system produces exactly the same response, regardless of how its input is shifted.

Equivariance means that the system works equally well across positions, but its response shifts with the position of the target.
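
A small 1-D sketch of the difference, assuming a "valid" cross-correlation as the convolution (the sliding dot product CNN layers compute) and a global max as one example of an invariant readout. The signal, kernel, and helper names are made up for illustration:

```python
import numpy as np

def conv1d(x, kernel):
    """'Valid' 1-D cross-correlation: slide the kernel and take dot products."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def shift(x, s):
    """Shift a 1-D signal right by s >= 0 samples, zero-filling on the left."""
    return np.concatenate([np.zeros(s), x[:len(x) - s]])

x = np.array([0, 0, 1, 3, 1, 0, 0, 0, 0, 0], dtype=float)
kernel = np.array([1.0, -1.0, 0.5])

y = conv1d(x, kernel)
y_moved = conv1d(shift(x, 2), kernel)

# Equivariance: shifting the input shifts the response by the same amount
print(np.allclose(y_moved, shift(y, 2)))  # True
# Invariance: the global maximum of the response does not change at all
print(y.max() == y_moved.max())           # True
```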

4 Why and how

Convolution + max pooling \(\approx\) translation invariance

  • Convolution: provides equivariance to translation.

  • Pooling: provides actual translation invariance, but only approximately (small shifts often leave the pooled output unchanged, but not always).
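
A minimal sketch of why max pooling is only approximately invariant, assuming non-overlapping windows of size 2 on a 1-D feature map (`max_pool1d` is a hypothetical helper; real networks use various window sizes and strides):

```python
import numpy as np

def max_pool1d(x, size=2):
    """Non-overlapping 1-D max pooling (window = stride = size); drops any remainder."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

feature = np.array([0, 0, 5, 0, 0, 0, 0, 0], dtype=float)  # peak at index 2
shift_1 = np.roll(feature, 1)  # peak moves to index 3: same window [2, 3]
shift_2 = np.roll(feature, 2)  # peak moves to index 4: next window [4, 5]

print(np.array_equal(max_pool1d(feature), max_pool1d(shift_1)))  # True
print(np.array_equal(max_pool1d(feature), max_pool1d(shift_2)))  # False
```

The one-step shift keeps the peak inside its window, so the pooled output is identical; the two-step shift moves it into the next window, so the pooled output shifts instead (equivariance again, at half the resolution).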

All this happens because of weight sharing in convolutional nets (visualize the kernels as weight matrices: certain submatrices of the weight matrix share the same weights), which inherently allows this invariance.
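
To make the weight-sharing picture concrete, here is a sketch (with a hypothetical helper `conv_as_matrix`) that writes a 1-D "valid" convolution as a matrix product: every row of the weight matrix holds the same kernel weights, shifted one column along.

```python
import numpy as np

def conv_as_matrix(kernel, n):
    """Return the (n - k + 1) x n matrix W with W @ x == 'valid' cross-correlation of x.

    Every row contains the same k kernel weights, shifted one column to the right:
    this repetition is the weight sharing.
    """
    k = len(kernel)
    W = np.zeros((n - k + 1, n))
    for row in range(n - k + 1):
        W[row, row:row + k] = kernel
    return W

kernel = np.array([1.0, -1.0, 0.5])
x = np.array([2.0, 0.0, 1.0, 3.0, 1.0])
W = conv_as_matrix(kernel, len(x))
print(W)

# Same numbers as sliding the kernel over x directly
sliding = np.array([x[i:i + len(kernel)] @ kernel for i in range(len(W))])
print(np.allclose(W @ x, sliding))  # True
```

A fully connected layer mapping 5 inputs to 3 outputs would have 15 independent weights; here there are only 3 distinct values, repeated along the diagonals.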

After some thought, I do not believe that the pooling operation is the main reason for the translation-invariant property of CNNs. I believe that invariance (at least to translation) comes from the convolution filters (not specifically the pooling), while the fully connected layers at the end are “position-dependent”.
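
A sketch of the "position-dependent" point, assuming a dense readout whose weights are simply `np.arange(8)` (hypothetical values chosen so the effect is easy to see) versus a global max over positions as a position-independent readout. `np.correlate(..., mode="valid")` computes the same sliding dot product as a CNN layer, without kernel flipping:

```python
import numpy as np

x = np.array([0, 0, 1, 3, 1, 0, 0, 0, 0, 0], dtype=float)
x_shifted = np.roll(x, 2)  # the trailing zeros make this a clean shift
kernel = np.array([1.0, -1.0, 0.5])

# Convolution layer (shared weights): the feature map shifts with the input
features = np.correlate(x, kernel, mode="valid")
features_shifted = np.correlate(x_shifted, kernel, mode="valid")

# Hypothetical dense layer: a separate weight for each position
fc_weights = np.arange(len(features), dtype=float)
print(fc_weights @ features, fc_weights @ features_shifted)  # 7.5 12.5 -> depends on position

# Position-independent readout: only asks whether the pattern occurred somewhere
print(features.max(), features_shifted.max())  # 2.0 2.0 -> unchanged
```

The global max here is itself a (global) pooling; it only stands in for any readout that ignores position, to contrast with the dense layer.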

5 References

  1. https://www.quora.com/How-is-a-convolutional-neural-network-able-to-learn-invariant-features
  2. http://cs231n.github.io/convolutional-networks/
  3. https://stats.stackexchange.com/questions/208936/what-is-translation-invariance-in-computer-vision-and-convolutional-neural-netwo
  4. https://www.quora.com/Why-and-how-are-convolutional-neural-networks-translation-invariant