A commonsense knowledge graph can be a useful source of explicit knowledge for generating text that makes sense. However, using a KG is hard because it holds far more information than is needed; retrieving the subgraph relevant to the generation is the key.
Though a knowledge graph can capture the essence of a corpus, generating sentences from the graph is a difficult task. This paper tries to generate text (paper abstracts) from a KG in the scientific (AI) domain.
A knowledge graph is a graph representation of knowledge: entities are represented as nodes and relations between entities as edges. A commonsense knowledge graph stores commonsense knowledge in graph form. Two common datasets for commonsense knowledge graphs are ATOMIC and ConceptNet.
There were some problems with previous VQA datasets. Strong language priors, non-compositional language, and variability in language were key obstacles preventing models from learning proper concepts and logic from VQA datasets. The synthetically generated CLEVR dataset solved these problems to some extent but lacked realism, remaining in a relatively simple domain.
Information retrieval with a search engine becomes difficult when the query is incomplete or too complex. This paper suggests a query reformulation system that rewrites the query to maximize the probability of retrieving relevant documents.
Earlier neural module networks for VQA depend on a naive semantic parser to unroll the layout of the network. This paper suggests End-to-End Module Networks (N2NMN), which learn the layout directly from the data.
This paper describes several tips and tricks for the VQA challenge with the first-place model of the 2017 VQA challenge. It also conducts comprehensive ablation experiments for each trick.
AQM solves visual dialogue tasks with an information-theoretic approach. However, the information gain of each candidate question must be calculated explicitly, which limits scalability. This paper suggests AQM+ to handle large-scale problems.
Goal-oriented dialogue tasks require two agents (a questioner and an answerer) to communicate to solve the task. Previous supervised and reinforcement learning approaches struggled to generate appropriate questions due to the complexity of forming sentences. This paper suggests an information-theoretic approach to this task.
Former methods used the element-wise sum, product, or concatenation to represent the relation between two vectors. A bilinear model (outer product) of two vectors is a more expressive representation, but its dimensionality usually becomes too large. This paper suggests multimodal compact bilinear pooling (MCB) to represent rich relations compactly.
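A minimal sketch of the count-sketch + FFT trick that compact bilinear pooling builds on; all dimensions and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Project x into d dims: y[h[i]] += s[i] * x[i]."""
    y = np.zeros(d)
    np.add.at(y, h, s * x)
    return y

rng = np.random.default_rng(0)
n1, n2, d = 2048, 300, 1024          # input dims, sketch dim
x, q = rng.normal(size=n1), rng.normal(size=n2)
h1, s1 = rng.integers(0, d, n1), rng.choice([-1, 1], n1)
h2, s2 = rng.integers(0, d, n2), rng.choice([-1, 1], n2)

# The sketch of the outer product equals the circular convolution
# of the two sketches, computed cheaply in the Fourier domain.
mcb = np.fft.irfft(np.fft.rfft(count_sketch(x, h1, s1, d)) *
                   np.fft.rfft(count_sketch(q, h2, s2, d)), n=d)
```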
Previous methods for visual reasoning lacked interpretability. This paper suggests the MAC network, a fully differentiable and interpretable attention-based visual reasoning model.
Previous works achieved successful results in VQA by modeling visual attention. This paper suggests a co-attention model for VQA that attends to both images (where to look) and words (which words to listen to).
While the bilinear model is effective for capturing the relationship between two spaces, its number of parameters is often intractable. This paper suggests reducing the number of parameters by controlling the rank of the interaction tensor with a Tucker decomposition.
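A sketch of a Tucker-decomposed bilinear interaction consistent with the idea above; the full dq x dv x do tensor is replaced by three factor matrices and a small core (sizes and names are illustrative):

```python
import torch

dq, dv, do, r = 310, 2048, 1000, 160   # input/output dims, core rank
Wq = torch.nn.Linear(dq, r)            # factor matrices of the
Wv = torch.nn.Linear(dv, r)            # Tucker decomposition
Wo = torch.nn.Linear(r, do)
core = torch.randn(r, r, r) * 0.01     # small core tensor

q, v = torch.randn(8, dq), torch.randn(8, dv)
# Contract the rank-r projections with the core, then project out.
z = torch.einsum('bi,bj,ijk->bk', Wq(q), Wv(v), core)
y = Wo(z)                              # (8, do) fused representation
```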
The spatial sampling of a convolutional neural network is geometrically fixed. This paper suggests two modules that let CNNs capture geometric structure more flexibly.
Recent neural network models keep getting bigger to push performance to the limit. This paper suggests MobileNet, which shrinks a neural network enough to deploy on mobile devices.
Image segmentation requires a lot of annotated images. This paper suggests efficient training of image segmentation using data augmentation and a new architecture.
High-quality, disentangled image generation has been a goal of all generative models. This paper suggests a style-based generator architecture for GANs, with techniques borrowed from the field of style transfer.
Generating high-resolution images with a GAN is difficult despite recent advances. This paper suggests BigGAN, which adds a few tricks to previous models to generate large-scale images without progressively growing the network.
Representing the bilinear relationship between two inputs is expensive. MLB efficiently reduced the number of parameters by replacing the bilinear operation with a Hadamard product. This paper extends that idea to capture bilinear attention between two multi-channel inputs.
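A sketch of the kind of low-rank bilinear attention map such an extension computes, assuming Hadamard-product projections as in MLB and softmax over all region-token pairs (one common choice); shapes and names are illustrative:

```python
import torch

dx, dy, r = 2048, 768, 512               # channel dims, joint rank
U = torch.nn.Linear(dx, r)
V = torch.nn.Linear(dy, r)
p = torch.randn(r)                        # rank-1 pooling vector

X = torch.randn(8, 36, dx)                # e.g. 36 image regions
Y = torch.randn(8, 14, dy)                # e.g. 14 question tokens
# logits[b, i, j] = p^T (U x_i * V y_j): Hadamard low-rank bilinear
logits = torch.einsum('bik,bjk,k->bij', U(X), V(Y), p)
att = torch.softmax(logits.flatten(1), dim=1).view_as(logits)
```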
Using external memory as in a modern computer gives a neural net extensible memory. This paper suggests the Differentiable Neural Computer (DNC), an advanced version of the Neural Turing Machine.
The object box proposal process makes object detection complicated and slow. This paper proposes the Single Shot Detector (SSD), which detects objects with a single neural network.
Constructing a 3D shape from a single image is challenging. Training end-to-end to predict 3D shapes from 2D images often ends up overfitting and generalizing poorly to other shapes.
The Relational Network showed great performance in relational reasoning, but its computation and memory consumption grow quadratically with the number of objects due to the fully connected pairing process.
Skip-Gram with Negative Sampling (SGNS) showed impressive performance compared to traditional word embedding methods. However, it was not clear what SGNS converges to.
Word embedding with neural networks (skip-gram) seems to outperform traditional count-based distributional models. However, this paper points out that the current superiority of word2vec comes not from the algorithm itself but from system design choices and hyperparameter optimization.
The bilinear model can capture rich relations between two vectors. However, its computational complexity is huge due to its high dimensionality. To make bilinear models more practical, this paper suggests low-rank bilinear pooling using the Hadamard product.
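A minimal sketch of low-rank bilinear pooling via the Hadamard product, matching the description above (sizes and nonlinearity are illustrative):

```python
import torch

dx, dy, r, do = 2048, 1024, 512, 1000   # illustrative sizes
U = torch.nn.Linear(dx, r)
V = torch.nn.Linear(dy, r)
P = torch.nn.Linear(r, do)

x, y = torch.randn(8, dx), torch.randn(8, dy)
# The Hadamard product of two rank-r projections replaces the
# full dx * dy * do bilinear tensor.
f = P(torch.tanh(U(x)) * torch.tanh(V(y)))
```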
The visual question answering task is to answer natural language questions about images. To solve questions that require multi-step reasoning, Stacked Attention Networks (SANs) stack several layers of attention over image regions conditioned on the query.
The visual question answering task is to answer natural language questions about images, which requires extracting information from both images and text. Stacked Attention Networks (SAN) stacked several layers of attention to answer complicated questions that require reasoning. The Multimodal Residual Network (MRN) points out that the weighted averaging of attention layers in SAN acts as a bottleneck restricting information about the interaction between questions and images.
While conventional deep learning models have performed well on inference over individual entities, few models focus on inference over the relations among entities. The Relational Network aimed to capture relational information from images. In this paper, the relational recurrent neural network improves former memory-augmented neural network models to capture relations among memories.
For the audio source separation task, traditional approaches utilized only the magnitude and ignored the phase. Previously, deep complex networks provided complex arithmetic via convolution.
Autoregressive models have been the dominant models for density estimation. Meanwhile, various non-linear transformation techniques have made it possible to track densities through a change of variables. Transformation Autoregressive Networks (TAN) combine non-linear transformations with autoregressive models to capture more complicated data densities.
Two approximation methods, variational inference and MCMC, have different advantages: usually, variational inference is fast while MCMC is more accurate.
The Gaussian process has several advantages: grounded in robust statistical assumptions, a GP does not require an expensive training phase and can represent uncertainty in unobserved regions. However, GPs are computationally expensive. The Neural Process tries to combine the best of Gaussian processes and neural networks.
Generative models of discrete data with particular structure (a grammar) often produce invalid outputs. The Grammar Variational Autoencoder (GVAE) forces the decoder of the VAE to produce only valid outputs.
Gradient descent methods depend on the first-order gradient of a loss function with respect to the parameters; the second-order information (the Hessian) is often neglected.
Recent variational training requires sampling from the variational posterior to estimate gradients. The NVIL estimator suggests a method to estimate the gradient of the loss function with respect to the parameters. Since the score-function estimator is known to have high variance, a baseline is used as a variance reduction technique. However, this technique is insufficient to reduce variance in the multi-sample setting, as in IWAE.
Efficient exploration by agents in reinforcement learning is an important issue. Conventional exploration heuristics include \epsilon-greedy for DQN and an entropy reward for A3C.
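For concreteness, a minimal \epsilon-greedy sketch (the function and values here are illustrative):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action w.p. epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.5, 0.2]), epsilon=0.1, rng=rng)
```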
Batch normalization is known to stabilize the optimization of neural networks by reducing internal covariate shift. However, batch normalization inherently depends on the minibatch, which impedes its use in recurrent models.
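A minimal numpy sketch of why the normalization is minibatch-bound (shapes are illustrative):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature with statistics of the current minibatch.
    The output for one example depends on the other examples in the
    batch -- the dependence that is problematic for recurrent models."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.randn(32, 64)              # (batch, features)
y = batch_norm(x, gamma=np.ones(64), beta=np.zeros(64))
```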
The largest drawback of training Generative Adversarial Networks (GAN) is instability. In particular, the power of the discriminator greatly affects the performance of the GAN. This paper suggests weakening the discriminator by restricting its function space to stabilize training.
Estimating the distribution of data can help solve various predictive tasks. Roughly three approaches are available: directed graphical models, undirected graphical models, and density estimation using autoregressive models with feed-forward neural networks (NADE).
Previous neural machine translators are all based on word-level translation. Word-level translators have the critical problem of out-of-vocabulary errors.
Modeling data with a known probability distribution has many advantages: we can compute the exact log-likelihood of the data and easily sample new data from the distribution. However, finding a tractable transformation of data into a probability distribution, or vice versa, is difficult. For instance, a neural encoder is a common way to transform data, but its log-likelihood is intractable and a separately trained decoder is required to sample data.
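One standard way to make such a transformation tractable is an invertible map z = f(x) with the change-of-variables formula \log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|; if f is invertible and its Jacobian determinant is cheap, a single model gives both exact likelihoods and samples via x = f^{-1}(z).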
The reparameterization trick is a useful technique for estimating the gradient of a loss function with stochastic variables. While score-function estimators suffer from high variance, the reparameterization trick lets the gradient be estimated with pathwise derivatives. Although the trick applies to various kinds of random variables, enabling backpropagation, it has not been applicable to discrete random variables.
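Assuming this refers to the Gumbel-softmax / Concrete line of work, a minimal sketch of the relaxation for discrete variables (temperature value is illustrative):

```python
import torch

def gumbel_softmax_sample(logits, tau=1.0):
    """Differentiable sample from a relaxed categorical distribution."""
    u = torch.rand_like(logits).clamp_min(1e-9)
    g = -torch.log(-torch.log(u))            # Gumbel(0, 1) noise
    return torch.softmax((logits + g) / tau, dim=-1)

logits = torch.randn(4, 10, requires_grad=True)
y = gumbel_softmax_sample(logits, tau=0.5)   # gradients flow to logits
y.sum().backward()
```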
While the effect of batch normalization has been widely demonstrated empirically, its exact mechanism is not yet understood. The commonly accepted explanation has been internal covariate shift (ICS), the change in the distribution of layer inputs caused by updates to the preceding layers.
Instead of instantly responding to incoming stimuli, having a model of the environment to make some level of prediction should help performance in reinforcement learning.
The Gorila framework parallelized the learning process by separating several actors and learners with a centralized parameter server. This framework required one GPU per learner.
The hierarchical recurrent encoder-decoder model (HRED), which aims to capture the hierarchical structure of sequential data, tends to fail because the model is encouraged to capture only local structure and the LSTM often suffers from vanishing gradients.
In many RL environments, rewards tend to be delayed relative to the actions taken. This paper proves that delayed rewards exponentially increase the convergence time of TD and exponentially increase the variance of MC estimates.
Most deep directed latent variable models, including the VAE, try to maximize the marginal likelihood by maximizing the Evidence Lower Bound (ELBO). However, the marginal likelihood is not sufficient to represent the performance of a model.
The traditional explanation for the generalization of a machine learning model was primarily concerned with the tradeoff between model capacity and overfitting: if the capacity of a model is too large, you must constrain it to prevent overfitting. Choosing an appropriate level of model capacity has been seen as the key to generalization.
Feature learning in convolutional neural networks requires a lot of hand-labeled data, so another form of supervision would be useful. In the natural world, organisms acquire much essential visual information by moving themselves (egomotion).
In the VAE framework, the quality of generation relies on the expressiveness of the inference model. Restricting hidden variables to a Gaussian distribution with a KL divergence limits the expressiveness of the model.
Conventional Turing machines, or computers with the Von Neumann architecture, have three fundamental mechanisms: elementary operations, logical flow control, and external memory. Existing machine learning techniques have focused on the first of these three. Recent RNN architectures are Turing complete, so they can be used to build a Neural Turing Machine. Unlike a conventional Turing machine, the NTM can be trained with gradient descent.
There have been many attempts to reconstruct sequences using the latent variable of the VAE architecture. However, "posterior collapse", where the decoder is so powerful that it ignores the latent variable, occurs so often that it has been hard to build a latent variable that carries the information of a long sequence.
NN models learn hierarchical features when recognizing and classifying images, but generative models do not generate hierarchically. A model with a stacked hierarchy such as the HVAE (Hierarchical VAE) has a hierarchical structure, yet its layers fail to learn hierarchical features: the last (bottom) layer holds enough information that the image can be reconstructed from the bottom layer alone. But using only the bottom layer is unimodal, so multimodal structure is not captured and the features are not disentangled.
If consciousness is taken as awareness at a particular moment, it can be thought of as a combination of several low-dimensional concepts. These low-dimensional concepts (thought vectors) can consist of facts about reality or statements useful for making decisions, and through such consciousness one can predict the future and make decisions. Conventionally, decision-making was assumed to operate directly on sensory data, but the consciousness prior lets an agent predict and decide in a high-level abstract space. The consciousness prior also matches the process of natural language utterance, which expresses complex consciousness through simplified language. Consciousness can further be viewed as attentive awareness, in that it is formed by attending to a few abstract concepts among many in order to build a representation useful for the present.
Information bottleneck theory is a technique that, when compressing data X into a representation relevant to Y, uses information quantities to find the best tradeoff between relevance to Y (accuracy) and compression of X (compression): R_{IB}(\theta) = I(Z, Y; \theta) - \beta I(Z, X; \theta).
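A sketch of a variational IB-style loss for the objective above, assuming a Gaussian encoder q(z|x) and a standard normal prior (a common variational approximation; the function name and beta value are illustrative):

```python
import torch
import torch.nn.functional as F

def vib_loss(logits, labels, mu, logvar, beta=1e-3):
    """Cross-entropy acts as a variational surrogate for -I(Z, Y);
    KL(q(z|x) || N(0, I)) upper-bounds I(Z, X)."""
    ce = F.cross_entropy(logits, labels)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(1).mean()
    return ce + beta * kl
```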
Disentangling is essentially the task of finding independent factors within x and assigning each to a different z. To induce this independence, prior work either approximates the prior of z as an independent Gaussian N(0, I) (Beta-VAE) or applies adversarial training after permuting the z values within a batch (FVAE). But Beta-VAE forces the distribution of every observation toward N(0, I), making the model less sensitive to differences between observations and hurting reconstruction performance.
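For reference, the Beta-VAE objective referred to above is \mathbb{E}_{q(z|x)}[\log p(x|z)] - \beta \, KL(q(z|x) \| p(z)); setting \beta > 1 strengthens the pull toward the factorized prior, which is exactly the pressure that trades reconstruction quality for independence of the z factors.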
Mapping an image from a source domain to a target domain in an unsupervised manner is called image-to-image translation. Because existing methods assumed that the mapping to the other domain is deterministic, they could not generate diverse images.
Existing implicit generative models (VAE) can learn the statistics of the data through hierarchical latent codes, but only sampling through the decoder is possible and the likelihood function is not tractable. Conversely, autoregressive neural networks (NADE, MADE, PixelCNN), which can learn the likelihood function, cannot exploit latent codes.
Existing neural autoregressive models (RNN, MADE, PixelRNN/CNN) have been considered unsuitable as VAE decoders, because their expressive power is so strong that they often ignore the latent variables when generating images.
Autoregressive neural density estimators such as MADE and PixelCNN/RNN have shown good results. With normalizing flows, fast density evaluation is possible for certain transformations such as planar/radial flows or the Inverse Autoregressive Flow, which made them useful for variational inference. However, since they cannot compute densities efficiently for new data points, they were unsuitable for density estimation.
The original DQN stores experiences in a replay buffer and samples them uniformly at random to remove correlations between training samples. But not all experiences are equally valuable: in environments with sparse rewards, particular experiences can carry far more important value.
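A minimal sketch of priority-proportional sampling with importance weights, in the spirit of the idea above (the alpha/beta hyperparameters and names are illustrative):

```python
import numpy as np

def sample_prioritized(priorities, batch, alpha=0.6, beta=0.4, rng=None):
    """Sample indices with P(i) ~ p_i^alpha and return IS weights."""
    rng = rng or np.random.default_rng()
    probs = priorities ** alpha
    probs /= probs.sum()
    idx = rng.choice(len(priorities), size=batch, p=probs)
    # Importance-sampling weights correct the non-uniform sampling.
    weights = (len(priorities) * probs[idx]) ** -beta
    return idx, weights / weights.max()   # normalize for stability
```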
The original DQN has the drawback that, to approximate the action-value function at a point, it must evaluate values for every state and action. In most cases, however, the value of the state is what matters, and cases where the choice of action changes the value dramatically are rare. Moreover, since the action-value function is approximated in order to pick an action anyway, what matters is not the exact value for every state and action but the value relative to the other actions.
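A sketch of a dueling-style head that implements this value/advantage split (class and layer names are illustrative):

```python
import torch

class DuelingHead(torch.nn.Module):
    """Split Q into a state value V(s) and advantages A(s, a)."""
    def __init__(self, hidden, n_actions):
        super().__init__()
        self.value = torch.nn.Linear(hidden, 1)
        self.advantage = torch.nn.Linear(hidden, n_actions)

    def forward(self, h):
        v, a = self.value(h), self.advantage(h)
        # Subtract the mean advantage so V and A are identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```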
The original DQN approximates the action-value function at a point as the immediate reward obtained by taking the action in that state plus the discounted value of the next state. To make this approximation more efficient, a target network is used; since the value of the next state is then judged as the outcome of the best action, the estimates become overoptimistic. These optimistic estimates can gradually drive convergence toward a suboptimal policy.
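A minimal sketch of the double-estimator target that addresses this optimism: select the next action with the online network, evaluate it with the target network (function and argument names are illustrative):

```python
import torch

def double_dqn_target(reward, next_state, done, online, target, gamma=0.99):
    """Decouple action selection from action evaluation to remove
    the max-operator optimism of the standard DQN target."""
    with torch.no_grad():
        best = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```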
In the original VAE, the encoder and decoder are trained by maximizing the marginal log-likelihood E_{P_X}[\log P_G(X)] through the variational lower bound, with a regularization term that minimizes the KL divergence between Q(z|x) and P(z|x).
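For reference, the bound in question decomposes as \log P_G(x) = \mathbb{E}_{Q(z|x)}[\log P_G(x|z)] - KL(Q(z|x) \| P(z)) + KL(Q(z|x) \| P(z|x)); maximizing the first two terms (the ELBO) maximizes a lower bound on the marginal log-likelihood while shrinking the posterior gap.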
The representative generative models are the VAE and the GAN. Instead of computing the target distribution directly, the VAE posits another tractable distribution and minimizes its distance (KL divergence) to the target distribution, thereby approximating it indirectly. However, the KL divergence has the limitation that the two distributions must have support over the same region. The GAN, by contrast, generates samples through the discriminator loss without computing the target distribution directly, but training is difficult because the balance of training between discriminator and generator is delicate and critical.
The attention in previous machine comprehension models attended to a small part of the context, summarized the context into a fixed-length vector, and was applied unidirectionally and temporally. These attention methods lose information during summarization, and because sequential attention steps depend on one another, the role of the attention and the role of the model become entangled.
The problem with conventional CNNs is that max-pooling layers only check for the rough presence of a feature and discard its precise spatial information. This yields invariance, detecting a feature wherever it appears, but the network cannot tell when a feature is entirely out of arrangement with the other features. What we want is equivariance, which considers not only the presence of features but also their overall configuration.
When extracting features from an image, if one feature value corresponds to a property of the image that we can recognize, we could deliberately generate images by adjusting that value. Extracting image features in such an intended way is called disentangling, and the resulting disentangled factors represent properties and abstract concepts of the image.