WHY?

Former CNN models fully activate(filly distributed features) for a single input showing poor performance on invariant relational reasoning.

WHAT?

image

The reason former CNN models are poor at invariant reasoning is interference problem that learning each pattern interfere with each other while a model try to learn many patterns. To minimize this problem, this paper suggest mixture of experts architecture(MoE). MoE architecture consists of two parts: Individual expert networks() and Gater network(G) that weight the output of each expert network.

In this paper, each expert network consists of multiple residual blocks and the architecture is called Residual Mixture Network(ResMixNet).

So?

This paper suggested two tasks to test invariant relational reasoning. First is MNIST Parity dataset which is to predict the parity(both even or both odd) of a pair of numbers in a image. Second is colorized Pentomino dataset which is to learn whether all the Pentomino sprites in an image belong to the same class. ResMixNet achieved the best result on both of these tasks compared to traditional CNN models.

Critic

Modularity Seems quite naive. There should be further development.

Jo, Jason, Vikas Verma, and Yoshua Bengio. “Modularity Matters: Learning Invariant Relational Reasoning Tasks.” arXiv preprint arXiv:1806.06765 (2018).