• SSD: Single Shot MultiBox Detector

    WHY? The box-proposal stage of conventional object detection pipelines is complicated and slow. This paper proposes the Single Shot Detector (SSD) to detect objects with a single neural network. WHAT? SSD produces a fixed-size collection of bounding boxes and scores the presence of object classes in those boxes. The front of SSD is a standard...
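The fixed-size collection of boxes comes from tiling default boxes over feature-map cells. A rough sketch of that tiling (the scale and aspect-ratio values below are placeholders, not the paper's exact configuration):

```python
# Sketch of SSD-style default-box generation for one square feature map.
# Each cell gets one box per aspect ratio, centered on the cell.
def default_boxes(fmap_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) boxes, in relative [0, 1] coordinates."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * ar ** 0.5, scale / ar ** 0.5))
    return boxes

boxes = default_boxes(fmap_size=4, scale=0.5, aspect_ratios=[1.0, 2.0, 0.5])
print(len(boxes))  # 4 * 4 cells * 3 aspect ratios = 48 default boxes
```

The network then predicts, per default box, class scores plus offsets to that box, which is why the output collection has a fixed size.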


  • Progressive Growing of GANs for Improved Quality, Stability, and Variation

    WHY? Training GANs on high-resolution images is known to be difficult. WHAT? This paper suggests a new training method in which the GAN grows progressively from coarse to fine scale. A generator-discriminator pair is first trained on low-resolution real and fake images. As the input image size grows,...
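When a new, higher-resolution layer is added, it is faded in smoothly rather than switched on at once. A minimal sketch of that blending step (the array shapes and the name `fade_in` are illustrative):

```python
import numpy as np

def fade_in(low_res_up, high_res, alpha):
    """Linearly blend the upsampled output of the previous resolution with
    the new higher-resolution layer; alpha ramps from 0 to 1 over training."""
    return (1.0 - alpha) * low_res_up + alpha * high_res

low = np.zeros((4, 4))   # stand-in: upsampled output of the old layers
high = np.ones((4, 4))   # stand-in: output of the newly added layer
blended = fade_in(low, high, 0.3)  # 30% of the new layer contributes
```

At alpha = 0 the network behaves exactly as before the layer was added; at alpha = 1 the new layer has fully taken over.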


  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    WHY? Previous Transformer language models were unidirectional. WHAT? BERT is a multi-layer bidirectional Transformer encoder. The input representation can be either a single text sentence or a pair of text sentences packed into one sequence. The first token of every sequence is a classification token, and sentences within a sequence are separated by...
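The packing scheme can be sketched as follows, using BERT's [CLS] and [SEP] special tokens; the helper name and toy tokens are made up for illustration:

```python
def build_bert_input(tokens_a, tokens_b=None):
    """Pack one or two tokenized sentences into a BERT-style sequence:
    a leading [CLS] classification token, each sentence terminated by [SEP],
    and segment ids 0 for sentence A / 1 for sentence B."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segments = [0] * len(tokens)
    if tokens_b is not None:
        tokens += tokens_b + ["[SEP]"]
        segments += [1] * (len(tokens_b) + 1)
    return tokens, segments

tokens, segments = build_bert_input(["my", "dog"], ["it", "barks"])
print(tokens)    # ['[CLS]', 'my', 'dog', '[SEP]', 'it', 'barks', '[SEP]']
print(segments)  # [0, 0, 0, 0, 1, 1, 1]
```

The final hidden state of the [CLS] position is what classification heads consume.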


  • On the Dimensionality of Word Embedding

    WHY? The dimensionality of word embeddings is usually chosen heuristically. WHAT? This paper suggests a new metric for evaluating the quality of word embeddings and uses it to find the optimal dimensionality. Word embedding algorithms are shown to converge to an implicitly factorized PMI matrix. However, the L2 loss between the embedding and the factorized...
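To make the PMI connection concrete, here is a sketch of computing a (positive) PMI matrix from co-occurrence counts and factorizing it into a d-dimensional embedding via truncated SVD; the toy counts are invented:

```python
import numpy as np

def ppmi(cooc):
    """Positive pointwise mutual information of a co-occurrence count matrix:
    pmi(i, j) = log(p(i, j) / (p(i) * p(j))), clipped at zero."""
    p_ij = cooc / cooc.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):  # log(0) -> -inf, clipped below
        pmi = np.log(p_ij / (p_i * p_j))
    return np.maximum(pmi, 0.0)

cooc = np.array([[10.0, 2.0], [2.0, 6.0]])  # toy symmetric counts
m = ppmi(cooc)

# Truncated SVD of the PPMI matrix yields a d-dimensional embedding;
# choosing d is exactly the dimensionality question the paper studies.
u, s, vt = np.linalg.svd(m)
d = 1
embedding = u[:, :d] * np.sqrt(s[:d])
```

Varying `d` trades off reconstruction loss against noise, which is the trade-off the proposed metric quantifies.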


  • Inferring and Executing Programs for Visual Reasoning

    WHY? Neural module networks do not generalize well to new questions, since their performance relies on a syntactic parser. WHAT? Instead of parsing questions into a universal dependency representation, this paper uses an LSTM to generate a sequence of functions that forms a program. Function modules are generic: unary modules, binary modules, and...
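A predicted function sequence can be executed as a postfix program, popping each module's arguments off a stack. This is a toy illustration with made-up modules, not the paper's neural modules:

```python
def execute(program):
    """Evaluate a postfix sequence of (function, arity) modules on a stack;
    the value left on the stack is the program's answer."""
    stack = []
    for fn, arity in program:
        args = [stack.pop() for _ in range(arity)]
        stack.append(fn(*args))
    return stack.pop()

# Toy stand-ins for visual-reasoning modules (names are invented):
count_red = lambda: 2       # nullary: "count red objects in the scene"
count_blue = lambda: 3      # nullary: "count blue objects in the scene"
add = lambda a, b: a + b    # binary: combine two sub-answers

answer = execute([(count_red, 0), (count_blue, 0), (add, 2)])
print(answer)  # 5
```

In the paper, each function is instead a small neural module operating on image features, but the stack-based composition is the same idea.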