• Deep Compositional Question Answering with Neural Module Networks

    WHY? The visual question answering (VQA) task is compositional in nature. WHAT? This paper tackles VQA by composing modules into a network architecture tailored to the given question. It defines primitive modules that can be composed into a configuration for any question: attention, re-attention, combination, classification, and measurement. The key...


  • Tracking Emerges by Colorizing Videos

    WHY? Segmenting objects in videos is difficult without manual labels. WHAT? This paper proposes learning to segment objects in a video sequence with a model trained on a self-supervised colorization task. Given reference frames and a grayscale input frame, the model tries to learn the color of the grayscale input frame from...


  • Latent Alignment and Variational Attention

    WHY? Even though attention is widely used, it is hard to regard it as a probabilistic model because attention does not marginalize. WHAT? This paper formulates the source separation task as finding the mixture weight vector of multiple sources in the waveform. Time-domain Audio Separation Network (TasNet) tries to find which is...


  • FiLM: Visual Reasoning with a General Conditioning Layer

    WHY? Some architectures exist for relational reasoning, but general-purpose components for relational reasoning and visual question answering are lacking. WHAT? This paper proposes Feature-wise Linear Modulation (FiLM) to conditionally focus on the image. By linearly transforming the output of convolution filters, FiLM conditionally chooses certain filters. The FiLM generator takes questions as...
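
    The feature-wise linear transformation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the paper's implementation: the function name `film`, the tensor shapes, and the hard-coded `gamma` and `beta` values are assumptions made for the example. In the paper these per-channel coefficients are produced by the FiLM generator from the question rather than fixed by hand.

```python
import numpy as np

def film(features, gamma, beta):
    """Apply a per-channel affine transform to a feature map.

    features: array of shape (C, H, W), e.g. the output of a conv layer
    gamma, beta: arrays of shape (C,), one scale and one shift per channel
    """
    # Broadcast the per-channel coefficients over the spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]

# Hypothetical inputs: 4 feature maps of size 3x3.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 3, 3))

# Assumed coefficients, hard-coded here for illustration only.
gamma = np.array([1.0, 0.0, 2.0, -1.0])  # keep, suppress, amplify, negate
beta = np.array([0.0, 0.5, 0.0, 0.0])

modulated = film(features, gamma, beta)
```

    A scale of zero suppresses a feature map entirely (leaving only the shift), which is one way to read the statement that FiLM conditionally chooses certain filters.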


  • TasNet: Time-Domain Audio Separation Network for Real-Time Single-channel Speech Separation

    WHY? Separating multiple audio sources is a difficult task. Previous works mostly built a mask for each source in the time-frequency domain. WHAT? This paper formulates the source separation task as finding the mixture weight vector of multiple sources in the waveform. Time-domain Audio Separation Network (TasNet) tries to find which is relative contribution to...