Learning to Reason: End-to-End Module Networks for Visual Question Answering

12 Feb 2019 in Studies on Deep Learning, Visual Question Answering

WHY?

Earlier neural module networks for VQA depend on a naive semantic parser to determine the layout of the network. This paper proposes End-to-End Module Networks (N2NMN), which learn the layout directly from the data.

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

11 Feb 2019 in Studies on Deep Learning, Visual Question Answering

WHY?

This paper describes several tips and tricks for the VQA challenge, drawn from the first-place model in the 2017 VQA Challenge. It also conducts comprehensive ablation experiments for each trick.

Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation

08 Feb 2019 in Studies on Deep Learning, Visual Question Answering

WHY?

AQM solves visual dialogue tasks with an information-theoretic approach. However, the information gain of each candidate question has to be calculated explicitly, which limits scalability. This paper proposes AQM+ to handle large-scale problems.
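To make the scalability concern concrete, here is a minimal sketch of scoring candidate questions by information gain, assuming a small discrete setting. The function names `information_gain` and `select_question`, the array layout, and the toy setup are illustrative assumptions, not code from the AQM or AQM+ papers. The gain is computed as the mutual information between the target class and the answer, which requires a full pass over every class and every answer for each candidate question.

```python
import numpy as np

def information_gain(prior, answer_probs):
    """Mutual information between target class C and answer A for one question.

    prior:        shape (C,), current belief p(c) over target classes.
    answer_probs: shape (C, A), assumed answerer model p(a | c, q).
    """
    prior = np.asarray(prior, dtype=float)
    answer_probs = np.asarray(answer_probs, dtype=float)
    marginal = prior @ answer_probs            # p(a | q), shape (A,)
    gain = 0.0
    # Explicit double loop: this is the cost that grows with classes and answers.
    for c in range(answer_probs.shape[0]):
        for a in range(answer_probs.shape[1]):
            joint = prior[c] * answer_probs[c, a]
            if joint > 0:                      # skip zero terms (0 * log 0 := 0)
                gain += joint * np.log(answer_probs[c, a] / marginal[a])
    return gain

def select_question(prior, answer_probs_per_question):
    """Return the index of the candidate question with the largest gain."""
    gains = [information_gain(prior, p) for p in answer_probs_per_question]
    return int(np.argmax(gains))
```

With a uniform prior over two classes, a question whose answer separates the classes perfectly yields a gain of ln 2, while a question whose answer is independent of the class yields zero, so `select_question` prefers the former.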

Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog

31 Jan 2019 in Studies on Deep Learning, Visual Question Answering

WHY?

Goal-oriented dialogue tasks require two agents (a questioner and an answerer) to communicate in order to solve the task. Previous supervised learning and reinforcement learning approaches struggled to generate appropriate questions because of the complexity of forming a sentence. This paper proposes an information-theoretic approach to solve this task.

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

26 Jan 2019 in Studies on Deep Learning, Visual Question Answering

WHY?

Earlier methods used element-wise sum, product, or concatenation to represent the relation between two vectors. A bilinear model (the outer product of the two vectors) is a more expressive way of representing that relation, but its dimensionality usually becomes too large. This paper proposes multimodal compact bilinear pooling (MCB) to represent rich relations compactly.
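As a rough illustration of how compact bilinear pooling avoids materializing the outer product, the sketch below projects each vector with a count sketch and combines the two sketches by circular convolution, computed through the FFT. The function names, the seed handling, and the output size `d` are assumptions made for this example; it is a minimal NumPy sketch, not the authors' implementation.

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Count sketch of x into d dims: out[h[i]] += s[i] * x[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)   # unbuffered add handles repeated indices in h
    return out

def compact_bilinear(x, y, d, seed=0):
    """Approximate the outer product of x and y with a d-dimensional vector."""
    rng = np.random.default_rng(seed)
    # Random hash buckets and signs, fixed by the seed (one pair per input).
    hx = rng.integers(0, d, size=x.shape[0])
    sx = rng.choice([-1.0, 1.0], size=x.shape[0])
    hy = rng.integers(0, d, size=y.shape[0])
    sy = rng.choice([-1.0, 1.0], size=y.shape[0])
    px = count_sketch(x, hx, sx, d)
    py = count_sketch(y, hy, sy, d)
    # Circular convolution of the two sketches via element-wise product in
    # the frequency domain.
    return np.fft.irfft(np.fft.rfft(px) * np.fft.rfft(py), n=d)
```

For two 2048-dimensional inputs the full outer product has about 4.2 million entries, whereas the output here has only `d` entries, and the operation stays bilinear in `x` and `y`.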

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 Jan 2019 in Studies on Deep Learning, Visual Question Answering

WHY?

In image captioning and visual question answering, image features are typically extracted from the spatial output layer of a pretrained CNN model.

Compositional Attention Networks for Machine Reasoning

24 Jan 2019 in Studies on Deep Learning, Visual Question Answering

WHY?

Previous methods for visual reasoning lacked interpretability. This paper proposes the MAC network, a fully differentiable and interpretable attention-based visual reasoning model.

Hierarchical Question-Image Co-Attention for Visual Question Answering

23 Jan 2019 in Studies on Deep Learning, Visual Question Answering

WHY?

Previous work achieved strong results in VQA by modeling visual attention. This paper proposes a co-attention model for VQA that attends to both the image (where to look) and the question (which words to listen to).
