Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
WHY?
This paper first proves that the expressiveness of a language model is restricted by the softmax output layer and then suggests a way to overcome this limit.
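For concreteness, the paper's core claim is that a single softmax limits the rank of the log-probability matrix to roughly the hidden size, and its proposed remedy is a Mixture of Softmaxes (MoS). Below is a minimal, hedged sketch of an MoS output layer in PyTorch; the module structure and hyperparameters are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    """Output layer mixing several softmaxes to lift the rank limit of a
    single softmax (illustrative sketch, not the paper's exact code)."""
    def __init__(self, hidden_dim, vocab_size, n_components=4):
        super().__init__()
        self.n_components = n_components
        # Mixture weights (prior) and per-component context projections.
        self.prior = nn.Linear(hidden_dim, n_components)
        self.latent = nn.Linear(hidden_dim, n_components * hidden_dim)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, h):                       # h: (batch, hidden_dim)
        pi = F.softmax(self.prior(h), dim=-1)   # (batch, K)
        hk = torch.tanh(self.latent(h))         # (batch, K * hidden_dim)
        hk = hk.view(-1, self.n_components, h.size(-1))
        probs = F.softmax(self.decoder(hk), dim=-1)   # (batch, K, vocab)
        return (pi.unsqueeze(-1) * probs).sum(dim=1)  # (batch, vocab)

probs = MixtureOfSoftmaxes(16, 100)(torch.randn(2, 16))
print(probs.shape, probs.sum(dim=-1))  # each row sums to 1
```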
Gradient descent methods depend on the first-order gradient of a loss function w.r.t. the parameters, while second-order information (the Hessian) is often neglected.
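As a hedged illustration of how second-order information can be used without forming the full Hessian (not necessarily this paper's method), the snippet below computes a Hessian-vector product by differentiating the gradient a second time.

```python
import torch

# Toy loss with a non-trivial Hessian: loss = sum(w^4), so H = diag(12 w^2).
w = torch.randn(3, requires_grad=True)
loss = (w ** 4).sum()

# Differentiate once with create_graph=True so the gradient itself is differentiable.
grad = torch.autograd.grad(loss, w, create_graph=True)[0]
v = torch.randn(3)
hvp = torch.autograd.grad(grad @ v, w)[0]  # H @ v without materializing H
print(hvp)
```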
Recent variational training requires sampling from the variational posterior to estimate gradients. The NVIL estimator suggests a method to estimate the gradient of the loss function w.r.t. the parameters. Since the score function estimator is known to have high variance, a baseline is used as a variance-reduction technique. However, this technique is insufficient to reduce variance in the multi-sample setting of IWAE.
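A minimal sketch of the score-function (REINFORCE) estimator with a simple baseline, on a toy Bernoulli posterior; this illustrates the high-variance estimator and the baseline trick referred to above, not NVIL's learned baseline or IWAE's multi-sample objective.

```python
import torch

# Toy score-function (REINFORCE) estimator with a mean baseline.
logits = torch.zeros(1, requires_grad=True)      # variational parameter
dist = torch.distributions.Bernoulli(logits=logits)

z = dist.sample((256,))                          # samples from the posterior
reward = -(z - 0.8) ** 2                         # stand-in for -loss(z)
baseline = reward.mean()                         # simple variance-reduction baseline

# Gradient estimate: E[(reward - baseline) * d/dtheta log q(z)]
surrogate = ((reward - baseline).detach() * dist.log_prob(z)).mean()
surrogate.backward()
print(logits.grad)
```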
Efficient exploration of the agent in reinforcement learning is an important issue. Conventional exploration heuristics include \epsilon-greedy for DQN and an entropy bonus for A3C.
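For reference, a minimal sketch of the \epsilon-greedy heuristic mentioned above (the Q-values are hypothetical and not tied to any particular DQN implementation).

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

print(epsilon_greedy([0.2, 1.3, -0.5]))
```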
There has been little study of representation learning that focuses on clustering.
Batch normalization is known as a good method to stabilize the optimization of neural networks by reducing internal covariate shift. However, batch normalization inherently depends on the minibatch, which impedes its use in recurrent models.
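A small illustration of the minibatch dependence mentioned above: in training mode, batch normalization computes statistics across the batch axis, so a sample's normalized output changes when its batchmates change (toy example, not the paper's experiment).

```python
import torch

bn = torch.nn.BatchNorm1d(4, affine=False)
x = torch.randn(8, 4)

out_full = bn(x)[0]      # first sample normalized with the full batch
out_pair = bn(x[:2])[0]  # same sample normalized with a different batch
print(torch.allclose(out_full, out_pair))  # False: output depends on batchmates
```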
The largest drawback of training Generative Adversarial Networks (GANs) is their instability. In particular, the power of the discriminator greatly affects the performance of a GAN. This paper suggests weakening the discriminator by restricting its function space in order to stabilize training.
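One generic way to restrict a discriminator's function space is to bound the spectral norm of its weight matrices; the sketch below uses PyTorch's built-in spectral_norm utility purely as an illustration and is not necessarily the exact restriction this paper proposes.

```python
import torch
import torch.nn as nn

# Discriminator whose layers are constrained to have unit spectral norm,
# shrinking the family of functions it can represent (illustrative only).
discriminator = nn.Sequential(
    nn.utils.spectral_norm(nn.Linear(784, 256)),
    nn.LeakyReLU(0.2),
    nn.utils.spectral_norm(nn.Linear(256, 1)),
)
print(discriminator(torch.randn(2, 784)).shape)  # (2, 1)
```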
This paper aims to disentangle label-related and label-unrelated information in the data. Its model is simpler and more effective than that of "Disentangling Factors of Variation in Deep Representations Using Adversarial Training".