WHY?

Recent neural network models keep growing larger in order to push performance to the limit. This paper proposes MobileNet, an architecture that shrinks the network enough to be deployed on mobile devices.

WHAT?

MobileNet combines several techniques to reduce computation.

The most important component of MobileNet is the depthwise separable convolution. Assume an input feature map of size $D_F\cdot D_F \cdot M$. A standard convolution layer consists of $N$ filters of size $D_K\cdot D_K \cdot M$. A depthwise separable convolution replaces these with $M$ depthwise filters of size $D_K\cdot D_K\cdot 1$, one per input channel, followed by $N$ pointwise filters of size $1\cdot 1 \cdot M$ that combine the channels.
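To make the saving concrete, the multiply-add counts of the two layer types can be compared directly. The sketch below assumes stride 1 and padding such that the output feature map is also $D_F \cdot D_F$; under that assumption a standard convolution costs $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$ multiply-adds. The function names are hypothetical and chosen only for illustration.

```python
def standard_conv_mult_adds(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution (stride 1, same-size output assumed)."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_mult_adds(d_k, m, n, d_f):
    """Multiply-adds of a depthwise separable convolution under the same assumptions."""
    depthwise = d_k * d_k * m * d_f * d_f  # one D_K x D_K filter per input channel
    pointwise = m * n * d_f * d_f          # N filters of size 1 x 1 x M
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
    d_k, m, n, d_f = 3, 512, 512, 14
    std = standard_conv_mult_adds(d_k, m, n, d_f)
    sep = separable_conv_mult_adds(d_k, m, n, d_f)
    print(std, sep, sep / std)
```

Dividing the two expressions gives a cost ratio of $1/N + 1/D_K^2$, so with $3 \times 3$ kernels the separable layer needs roughly 8 to 9 times fewer multiply-adds.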

The architecture built on this method not only reduces the number of multiply-adds but also concentrates most of the computation in the pointwise ($1 \times 1$) convolution layers, which can be implemented directly as a general matrix multiply (GEMM), one of the most heavily optimized numerical operations.
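The link between a pointwise convolution and GEMM can be illustrated with a small pure-Python sketch: flattening the spatial positions of an $H \cdot W \cdot M$ feature map gives an $(H \cdot W) \times M$ matrix, and multiplying it by an $M \times N$ weight matrix yields the pointwise output. The function name and the nested-list layout are assumptions made for illustration; a real implementation would call an optimized GEMM routine instead of Python loops.

```python
def pointwise_conv_as_gemm(feature_map, weights):
    """Apply a 1x1 convolution as a single matrix multiply.

    feature_map: nested lists of shape H x W x M.
    weights:     nested lists of shape M x N (one column per output channel).
    Returns nested lists of shape H x W x N.
    """
    h, w = len(feature_map), len(feature_map[0])
    m, n = len(weights), len(weights[0])

    # Flatten the spatial grid into an (H*W) x M matrix: one row per pixel.
    rows = [pixel for row in feature_map for pixel in row]

    # GEMM: (H*W x M) times (M x N) gives (H*W x N).
    out = [
        [sum(pixel[c] * weights[c][k] for c in range(m)) for k in range(n)]
        for pixel in rows
    ]

    # Restore the spatial layout: H x W x N.
    return [out[i * w:(i + 1) * w] for i in range(h)]


if __name__ == "__main__":
    fmap = [[[1, 2], [3, 4]]]          # H=1, W=2, M=2
    wts = [[1, 0, 1], [0, 1, 1]]       # M=2, N=3
    print(pointwise_conv_as_gemm(fmap, wts))
```

No im2col-style reordering of memory is needed in this case, because each output pixel depends only on the channels of the same input pixel.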

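MobileNet introduces two additional hyperparameters to reduce computation further. The width multiplier $\alpha$ thins the network by scaling the number of input and output channels of each layer. The resolution multiplier $\rho$ scales the height and width of the input, and therefore of every feature map. The computational cost of a depthwise separable layer is reduced from

$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$$

to

$$D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F$$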
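The effect of the two multipliers can be checked numerically. The sketch below assumes the per-layer cost model from the paper, in which $\alpha$ scales both channel counts and $\rho$ scales the feature map side, and it truncates the scaled sizes to integers. The function name and the example layer sizes are hypothetical.

```python
def mobilenet_layer_mult_adds(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """Multiply-adds of one depthwise separable layer with width and resolution multipliers."""
    m_scaled = int(alpha * m)      # thinned input channels
    n_scaled = int(alpha * n)      # thinned output channels
    d_f_scaled = int(rho * d_f)    # reduced feature map side
    depthwise = d_k * d_k * m_scaled * d_f_scaled * d_f_scaled
    pointwise = m_scaled * n_scaled * d_f_scaled * d_f_scaled
    return depthwise + pointwise


if __name__ == "__main__":
    base = mobilenet_layer_mult_adds(3, 512, 512, 14)
    small = mobilenet_layer_mult_adds(3, 512, 512, 14, alpha=0.5, rho=0.5)
    print(base, small, small / base)
```

Because the pointwise term dominates, the cost shrinks by roughly $\alpha^2 \rho^2$; in this example halving both multipliers leaves about 6% of the original multiply-adds.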

So?

MobileNet dramatically reduces the number of parameters and computations with only a slight drop in performance on various tasks, including classification and detection.

Critic

It is remarkable that standard convolution filters can be factorized into a depthwise convolution and a pointwise convolution while preserving much of their representational power. Could a similar factorization be applied to RNN-family models?

Howard, Andrew G., et al. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.” arXiv preprint arXiv:1704.04861 (2017).