  • Density Estimation Using Real NVP

    WHY? The motivation is almost the same as that of NICE. This paper suggests a more elaborate transformation to represent complex data. WHAT? NICE suggested coupling layers with a tractable Jacobian matrix. This paper suggests a more flexible bijective function while keeping the properties of coupling layers. Affine coupling layers scale and translate the...


  • NICE: Non-linear Independent Components Estimation

    WHY? Modeling data with a known probability distribution has many advantages. We can calculate the exact log-likelihood of the data and easily sample new data from the distribution. However, finding a tractable transformation from data to a probability distribution, or vice versa, is difficult. For instance, a neural encoder is a...


  • Categorical Reparameterization with Gumbel-Softmax

    WHY? The motivation is the same as that of Concrete. WHAT? The Gumbel-Softmax (GS) distribution is the same as the Concrete distribution. The GS distribution approaches a one-hot distribution as the temperature goes to 0. However, GS samples are not exactly the same as categorical samples, which results in bias. This GS estimator becomes close to unbiased but the variance of the gradient increase...


  • Deterministic Policy Gradient Algorithms

    WHY? Policy gradient usually requires an integral over all possible actions. WHAT? The purpose of reinforcement learning is to learn a policy that maximizes the objective function. Policy gradient directly trains the policy network to maximize the objective function. Stochastic Policy Gradient: Since this assumes a stochastic policy, this is called...


  • The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

    WHY? The reparameterization trick (RT) is a useful technique for estimating gradients of loss functions with stochastic variables. While score function estimators suffer from high variance, RT enables the gradient to be estimated with pathwise derivatives. Even though the reparameterization trick can be applied to various kinds of random variables, enabling backpropagation, it...
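
The relaxation described in the Gumbel-Softmax and Concrete entries above can be illustrated with a minimal sketch. It assumes the standard construction (add Gumbel(0, 1) noise to the logits, divide by a temperature, apply softmax) and uses only the Python standard library. The function names `sample_gumbel` and `gumbel_softmax_sample` are hypothetical and not taken from either paper's code.

```python
import math
import random


def sample_gumbel(rng):
    """Draw Gumbel(0, 1) noise by inverse transform: -log(-log(U))."""
    u = rng.random()
    # Clamp away from 0 and 1 so neither logarithm receives 0.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed (continuous) sample on the probability simplex.

    The noise is sampled independently of the logits, so the output is a
    deterministic, differentiable function of the logits given the noise.
    """
    noisy = [(l + sample_gumbel(rng)) / temperature for l in logits]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Assumed example: a categorical distribution with probabilities 0.2, 0.3, 0.5.
logits = [math.log(p) for p in (0.2, 0.3, 0.5)]
for tau in (5.0, 1.0, 0.1):
    sample = gumbel_softmax_sample(logits, tau, rng)
    print(tau, [round(s, 3) for s in sample])
```

At a high temperature the samples tend to be spread across the categories, while at a low temperature they tend to concentrate on a single category, which is the near-one-hot behavior the Gumbel-Softmax entry mentions. Taking the argmax of a sample recovers an exact categorical draw (the Gumbel-max trick), but the argmax itself is not differentiable, which is why the softmax relaxation is used instead.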