Recent neural network models are getting bigger to increase the performance to the limit. This paper suggests MobileNet to reduce the size of neural network small enough to deploy on mobile devices.
Several techniques are used for MobileNet.
The most important component of MobileNet is depthwise separable convolution. Assume a feature map of
D_F\cdot D_F \cdot M. Standard convolution filters consists of N number of filters of size
D_K\cdot D_K \cdot M. Instead, depthwise separable convolution replace this with M number of depthwise convolution of size
D_f\cdot D_f\cdot 1, and N number of pointwise convolution of size
1\cdot 1 \cdot M.
New efficient architecture based on this method not only reduce the number of multi-add, but also concentrate the computation on pointwise convolution layer which is one of the most efficient operation by general matrix multiply(GEMM).
Mobilenet introduced two additional hyperparameters to reduce the computation. Width multiplier
\alpha is used to reduce the number of channels on each layer. Resolution multiplier
\rho is used to reduce the height and width on each layer. The number of computation is reduced from
D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F
D_k \cdot D_k \cdot \alpha M \cdot\rho D_F \cdot\rho D_F + \alpha M \cdot\alpha N \cdot\rho D_F \rho D_F
image MobileNet decreased the number of parameters and computations dramatically with slight decrease in performance on various tasks including classification and detection.
It is amazing that the convolution filters can be represented with depthwise convolution and pointwise convolution while preserving much of its representational power. Could there be similar method of RNN series?