GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

WHY?

Previous VQA datasets had several problems. Strong language priors, non-compositional language, and variability in language were key obstacles to models learning proper concepts and logic from them. The synthetically generated CLEVR dataset addressed these problems to some extent, but it lacked realism because it stayed within a relatively simple domain.

Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog

WHY?

Goal-oriented dialogue tasks require two agents (a questioner and an answerer) to communicate in order to solve the task. Previous supervised learning and reinforcement learning approaches struggled to generate appropriate questions because of the complexity of forming a sentence. This paper proposes an information-theoretic approach to the task.

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

WHY?

Earlier methods used element-wise sum, element-wise product, or concatenation to represent the relation between two vectors. A bilinear model (the outer product of the two vectors) is a more expressive way to represent that relation, but its dimensionality usually becomes too large. This paper proposes multimodal compact bilinear pooling (MCB) to represent such relations compactly.
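
To make the dimensionality point concrete, here is a minimal NumPy sketch of compact bilinear pooling via Count Sketch and FFT. It is an illustration under stated assumptions, not the authors' implementation: the function names (make_hashes, count_sketch, compact_bilinear), the seed, and the vector sizes are all hypothetical choices for this example.

```python
import numpy as np

def make_hashes(n, d, rng):
    """Draw a random bucket index in [0, d) and a random sign for each of n inputs."""
    h = rng.integers(0, d, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    return h, s

def count_sketch(v, h, s, d):
    """Project v to d dimensions: out[h[i]] += s[i] * v[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * v)  # unbuffered add, so colliding buckets accumulate
    return out

def compact_bilinear(x, y, hx, sx, hy, sy, d):
    """Sketch of the outer product of x and y, computed without forming it.

    The circular convolution of the two count sketches (done here as an
    element-wise product in the FFT domain) equals the count sketch of the
    outer product under the combined hash (hx[i] + hy[j]) mod d.
    """
    fx = np.fft.rfft(count_sketch(x, hx, sx, d))
    fy = np.fft.rfft(count_sketch(y, hy, sy, d))
    return np.fft.irfft(fx * fy, n=d)

# Illustrative sizes only (assumed for this example).
rng = np.random.default_rng(0)
n_x, n_y, d = 64, 48, 32
x = rng.standard_normal(n_x)
y = rng.standard_normal(n_y)
hx, sx = make_hashes(n_x, d, rng)
hy, sy = make_hashes(n_y, d, rng)

pooled = compact_bilinear(x, y, hx, sx, hy, sy, d)
full_size = np.outer(x, y).size  # n_x * n_y entries in the full bilinear feature
```

With these sizes the full outer product has 64 * 48 = 3072 entries, while the pooled vector has only 32. The gap grows quadratically with the input size, which is why the full bilinear feature becomes impractical for large vectors.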

© 2017. by isme2n