Deep Learning Travels
Don't panic

Categorical Reparameterization with Gumbel-Softmax
WHY? The same motivation as the Concrete distribution. WHAT? The Gumbel-Softmax (GS) distribution is the same as the Concrete distribution. The GS distribution approaches one-hot as the temperature goes to 0. However, GS samples are not exactly the same as categorical samples, resulting in bias. The GS estimator becomes close to unbiased, but the variance of the gradient increases...
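The sampling step can be sketched in a few lines: perturb the log-probabilities with Gumbel noise, then apply a temperature-scaled softmax. This is a minimal sketch with made-up class probabilities, not the paper's code; sharing the same noise across temperatures makes the sharpening effect visible.

```python
import numpy as np

def gumbel_softmax(logits, g, tau):
    # relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)
    y = (logits + g) / tau
    y = np.exp(y - y.max())  # numerically stable softmax
    return y / y.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.3, 0.6]))  # hypothetical categorical probs
g = -np.log(-np.log(rng.uniform(size=3)))   # Gumbel(0, 1) noise

soft = gumbel_softmax(logits, g, tau=1.0)   # smooth relaxation (biased)
hard = gumbel_softmax(logits, g, tau=0.05)  # nearly one-hot as tau -> 0
```

With the same noise, lowering the temperature only sharpens the sample toward the same vertex of the simplex, which is why annealing tau trades bias for gradient variance.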

Deterministic Policy Gradient Algorithms
WHY? Policy gradient usually requires an integral over all possible actions. WHAT? The purpose of reinforcement learning is to learn a policy that maximizes the objective function. Policy gradient methods directly train the policy network to maximize the objective function. Stochastic Policy Gradient: since this assumes a stochastic policy, it is called...
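The deterministic version avoids the integral over actions by following the chain rule through the critic: the update direction is E[∂Q/∂a · ∂μ_θ/∂θ] evaluated at a = μ_θ(s). A toy sketch, assuming a hypothetical quadratic critic Q(s, a) = -(a - s)² and a linear policy μ_θ(s) = θ·s (so the optimum is θ = 1):

```python
import numpy as np

def dpg_update(theta, states, lr):
    # deterministic policy: mu_theta(s) = theta * s
    a = theta * states
    dq_da = -2.0 * (a - states)   # dQ/da for the toy critic Q(s,a) = -(a - s)^2
    dmu_dtheta = states           # d mu_theta(s) / d theta
    grad = np.mean(dq_da * dmu_dtheta)  # chain rule: E[dQ/da * dmu/dtheta]
    return theta + lr * grad            # gradient ascent on the objective

rng = np.random.default_rng(1)
states = rng.normal(size=256)  # sampled states (stand-in for a state distribution)
theta = 0.0
for _ in range(200):
    theta = dpg_update(theta, states, lr=0.1)
# theta converges toward 1, the maximizer of E[Q(s, mu_theta(s))]
```

No expectation over actions is ever computed; only the single deterministic action per state is differentiated.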

The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
WHY? The reparameterization trick (RT) is a useful technique for estimating the gradient of a loss function with stochastic variables. While score-function estimators suffer from high variance, RT enables the gradient to be estimated with pathwise derivatives. Even though the reparameterization trick can be applied to various kinds of random variables, enabling backpropagation, it...
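For the Gaussian case the pathwise form is z = μ + σ·ε with ε ~ N(0, 1), so the gradient flows through the sample itself. A minimal sketch with made-up values, estimating d/dμ E[z²] (whose true value is 2μ):

```python
import numpy as np

def reparam_sample(mu, log_sigma, eps):
    # pathwise form: z = mu + sigma * eps, with eps ~ N(0, 1)
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(0)
eps = rng.normal(size=100_000)
mu, log_sigma = 0.5, np.log(1.0)

z = reparam_sample(mu, log_sigma, eps)
# dz/dmu = 1, so the pathwise estimate of d E[z^2] / d mu is E[2z * 1]
grad_est = np.mean(2.0 * z)
```

Because the randomness ε is fixed independently of μ, this single Monte Carlo average already tracks the true gradient 2μ closely, which is the low-variance property the score-function estimator lacks.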

How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift)
WHY? While the effect of batch normalization has been widely demonstrated empirically, its exact mechanism is not yet understood. The commonly accepted explanation was internal covariate shift (ICS), meaning the change in the distribution of layer inputs caused by updates to the preceding layers. WHAT? Critic So? Ha, David,...
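For reference, the operation under debate is simple: normalize each feature over the batch, then rescale. A minimal sketch (hypothetical input distribution, numpy only):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # normalize each feature over the batch dimension, then rescale/shift
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(128, 4))  # shifted, scaled layer inputs
y = batch_norm(x, gamma=1.0, beta=0.0)
# per-feature mean ~ 0 and std ~ 1, whatever the input distribution was
```

The ICS story attributes batch norm's benefit to this stabilization of input distributions; the paper's critique is that the benefit persists even when that stabilization is broken.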

World Models
WHY? Instead of instantly responding to incoming stimuli, having a model of the environment to make some level of prediction would help performance in reinforcement learning. WHAT? The agent model of this paper consists of three parts: Vision (V), Memory (M), and Controller (C). Since simulating every pixel of the environment is inefficient, a VAE model...
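The data flow between the three parts can be sketched as below. All shapes and weights here are hypothetical stand-ins (the paper uses a VAE for V, an MDN-RNN for M, and a small linear controller for C); the point is only the V → M → C wiring.

```python
import numpy as np

def vision_encode(frame, W_v):
    # V: compress a raw observation into a small latent code z
    return np.tanh(W_v @ frame.ravel())

def memory_step(z, h, W_m):
    # M: recurrent state summarizing the history of latents (MDN-RNN stand-in)
    return np.tanh(W_m @ np.concatenate([z, h]))

def controller_act(z, h, W_c):
    # C: tiny linear policy acting on the concatenated [z, h]
    return W_c @ np.concatenate([z, h])

rng = np.random.default_rng(0)
frame = rng.normal(size=(8, 8))          # toy observation
W_v = rng.normal(size=(4, 64)) * 0.1     # hypothetical encoder weights
W_m = rng.normal(size=(6, 10)) * 0.1     # hypothetical recurrent weights
W_c = rng.normal(size=(2, 10)) * 0.1     # hypothetical controller weights

h = np.zeros(6)
z = vision_encode(frame, W_v)
h = memory_step(z, h, W_m)
action = controller_act(z, h, W_c)
```

Keeping C tiny is deliberate: most parameters live in V and M, which can be trained without reward signal, leaving only a small policy to optimize.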