Deterministic Policy Gradient Algorithms

16 Jul 2018 in Studies on Deep Learning, Reinforcement Learning

WHY?

Policy gradient usually requires integral over all the possible actions.

The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

16 Jul 2018 in Studies on Deep Learning, Deep Learning

WHY?

Reparameterization trick is a useful technique for estimating gradient for loss function with stochastic variables. While score function extimators suffer from great variance, RT enable the gradient to be estimated with pathwise derivatives. Even though reparameterization trick can be applied to various kinds of random variables enabling backpropagation, it has not been applicable to discrete random variables.

How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift

11 Jul 2018 in Studies on Deep Learning, Deep Learning

WHY?

While the effect of batch normalization was widely proven empirically, the exact mechanism of it is yet been understood. Commonly known explanation for this was internal covariance shift(ICS) meaning the change in the distribution of layer inputs caused by updates to the preceeding layers.

World Models

10 Jul 2018 in Studies on Deep Learning, Reinforcement Learning

WHY?

Instead of instantly responding to incoming stimulus, having a model of environment to make some level of prediction would help perform in reinforcement learning.

Gradient Estimation Using Stochastic Computation Graphs

09 Jul 2018 in Studies on Statistics, Probabilistic Graphical Modeling

WHY?

Many machine learning problems involves loss function that contains random variables. To perform backpropagation, estimating gradient of the loss function is required.

Amortized Inference in Probabilistic Reasoning

09 Jul 2018 in Studies on Statistics, Probabilistic Graphical Modeling

WHY?

Former studies on probabilistic reasoning assume that reasoning is memoryless, which means all the inference occur independently without the reuse of previous computation.

Vanilla Sky (2001)

08 Jul 2018 in Writings on Reviews

평점: 4

Distributed Prioritized Experience Replay

07 Jul 2018 in Studies on Deep Learning, Reinforcement Learning

WHY?

Gorila framework separated several actors and learners with a centralized parameter server to parrallelize the learning process. This framework required one GPU per learner.

WHY?

WHY?

WHY?

WHY?

WHY?

WHY?

WHY?

Pagination