Deep Learning Travels
Don't panic

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
WHY? Gradient descent methods depend on the first-order gradient of a loss function with respect to the parameters. However, second-order information (the Hessian) is often neglected. WHAT? This paper explores the exact Hessian product of a neural network (after convergence) and finds that the eigenvalues of the Hessian separate into two groups: 0s and...
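To make the idea of a singular Hessian concrete, here is a minimal sketch on a toy problem, not the paper's experiment. The loss `(a * b - 1) ** 2`, the point `(2, 0.5)`, and all function names are assumptions chosen for illustration. The model is over-parameterized (two parameters, one constraint), so the minima form a curve and the Hessian at any minimum should have one flat direction.

```python
import math


def loss(a, b):
    """Toy over-parameterized loss: every (a, b) with a * b == 1 is a minimum."""
    return (a * b - 1.0) ** 2


def hessian_2x2(f, a, b, h=1e-3):
    """Central finite-difference Hessian of f at (a, b)."""
    f0 = f(a, b)
    h_aa = (f(a + h, b) - 2.0 * f0 + f(a - h, b)) / h**2
    h_bb = (f(a, b + h) - 2.0 * f0 + f(a, b - h)) / h**2
    h_ab = (f(a + h, b + h) - f(a + h, b - h)
            - f(a - h, b + h) + f(a - h, b - h)) / (4.0 * h**2)
    return h_aa, h_ab, h_bb


def eigenvalues_sym_2x2(h_aa, h_ab, h_bb):
    """Closed-form eigenvalues of the symmetric matrix [[h_aa, h_ab], [h_ab, h_bb]]."""
    mean = 0.5 * (h_aa + h_bb)
    radius = math.sqrt((0.5 * (h_aa - h_bb)) ** 2 + h_ab ** 2)
    return mean - radius, mean + radius


# (a, b) = (2, 0.5) lies on the curve of minima a * b = 1.
small, large = eigenvalues_sym_2x2(*hessian_2x2(loss, 2.0, 0.5))
print(small, large)
```

Analytically the Hessian at this point is `[[0.5, 2], [2, 8]]`, with eigenvalues 0 and 8.5: one flat direction along the curve of minima and one curved direction across it. A real network would need automatic differentiation rather than finite differences.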

Variational Inference for Monte Carlo Objectives
WHY? Recent variational training requires sampling from the variational posterior to estimate gradients. The NVIL estimator suggests a method to estimate the gradient of the loss function with respect to the parameters. Since the score function estimator is known to have high variance, a baseline is used as a variance reduction technique. However, this technique is insufficient...
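The variance problem and the effect of a baseline can be seen on a one-parameter example. The sketch below is a plain score function estimator with a constant baseline, not NVIL's learned baseline and not the estimator proposed in the paper. The Bernoulli model, the objective `f`, and the choice of baseline are assumptions made for illustration.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def f(x):
    """Hypothetical objective with a large constant offset."""
    return 10.0 + x


def score_function_samples(theta, baseline, n, rng):
    """Single-sample estimates of d/dtheta E[f(x)], x ~ Bernoulli(sigmoid(theta))."""
    p = sigmoid(theta)
    estimates = []
    for _ in range(n):
        x = 1.0 if rng.random() < p else 0.0
        score = x - p  # d/dtheta log p(x | theta) for a Bernoulli with logit theta
        estimates.append((f(x) - baseline) * score)
    return estimates


def mean_and_variance(values):
    m = sum(values) / len(values)
    v = sum((u - m) ** 2 for u in values) / len(values)
    return m, v


rng = random.Random(0)
theta = 0.3
p = sigmoid(theta)
true_grad = p * (1.0 - p)  # derivative of E[f(x)] = 10 + p with respect to theta

plain_mean, plain_var = mean_and_variance(
    score_function_samples(theta, 0.0, 20000, rng))
base_mean, base_var = mean_and_variance(
    score_function_samples(theta, 10.0 + p, 20000, rng))  # baseline = E[f(x)]
print(true_grad, plain_mean, plain_var, base_mean, base_var)
```

Both estimators are unbiased because the score has zero mean, so subtracting a constant from `f(x)` does not change the expectation. Only the variance changes, and here it drops by several orders of magnitude.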

[Pytorch] MADE
PyTorch implementation of MADE: Masked Autoencoder for Distribution Estimation. https://github.com/Lyusungwon/generative_models_pytorch
Reference: https://github.com/karpathy/pytorchmade
Note: Autoregressive sampling was tricky.
Results
Config: model 180817182411_made_1000_200_0.001_28_28_1000_2_1_False, epochs 1000, batch size 200, lr 1e-3, hidden size 1000, layer size 2, mask num 1, start sample 394, random order False
[Figures: test loss; samples; original; reconstruction; inpainting input; inpainting output]
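The core of MADE is the mask construction, which is also what makes sampling autoregressive. The following is a minimal sketch for a single hidden layer with the natural input ordering. It is written in plain Python rather than PyTorch and is not taken from either repository; `made_masks` and `connectivity` are hypothetical helper names.

```python
import random


def made_masks(n_inputs, n_hidden, rng):
    """Build input->hidden and hidden->output masks for a one-hidden-layer MADE."""
    in_degree = list(range(1, n_inputs + 1))  # natural ordering 1..D
    hid_degree = [rng.randint(1, n_inputs - 1) for _ in range(n_hidden)]
    # Hidden unit k may see input d only if m(k) >= d.
    mask_in = [[1 if hid_degree[k] >= in_degree[d] else 0
                for d in range(n_inputs)] for k in range(n_hidden)]
    # Output d may see hidden unit k only if d > m(k).
    mask_out = [[1 if in_degree[d] > hid_degree[k] else 0
                 for k in range(n_hidden)] for d in range(n_inputs)]
    return mask_in, mask_out


def connectivity(mask_in, mask_out):
    """Count the masked paths from each input (column) to each output (row)."""
    n_inputs = len(mask_out)
    n_hidden = len(mask_in)
    return [[sum(mask_out[o][k] * mask_in[k][i] for k in range(n_hidden))
             for i in range(n_inputs)] for o in range(n_inputs)]


mask_in, mask_out = made_masks(n_inputs=5, n_hidden=8, rng=random.Random(0))
paths = connectivity(mask_in, mask_out)
for row in paths:
    print(row)
```

The path-count matrix is strictly lower triangular: output d has no path from input d or any later input. This is why sampling has to proceed one dimension at a time, feeding each sampled value back in before computing the next output.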

Noisy Network for Exploration
WHY? Efficient exploration by an agent in reinforcement learning is an important issue. Conventional exploration heuristics include ε-greedy for DQN and an entropy reward for A3C. WHAT? NoisyNet is a neural network whose parameters are replaced with a parametric function of noise. There are two options for the noise: independent Gaussian noise...
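A noisy linear layer with independent Gaussian noise can be sketched in a few lines. This is a forward pass only, in plain Python, assuming each weight and bias is `mu + sigma * eps` with a fresh `eps ~ N(0, 1)` per parameter. In NoisyNet `mu` and `sigma` would be learned by gradient descent, which is omitted here, and the function name and example values are hypothetical.

```python
import random


def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b), eps ~ N(0, 1)."""
    y = []
    for j in range(len(mu_b)):
        # Fresh noise sample for the bias of output unit j.
        total = mu_b[j] + sigma_b[j] * rng.gauss(0.0, 1.0)
        for i in range(len(x)):
            # Fresh, independent noise sample for every weight.
            w = mu_w[j][i] + sigma_w[j][i] * rng.gauss(0.0, 1.0)
            total += w * x[i]
        y.append(total)
    return y


x = [1.0, 2.0]
mu_w = [[0.5, -0.5], [1.0, 0.0]]
mu_b = [0.0, 1.0]
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
small_w = [[0.1, 0.1], [0.1, 0.1]]
small_b = [0.1, 0.1]

rng = random.Random(0)
deterministic = noisy_linear(x, mu_w, zeros_w, mu_b, zeros_b, rng)
noisy_a = noisy_linear(x, mu_w, small_w, mu_b, small_b, rng)
noisy_b = noisy_linear(x, mu_w, small_w, mu_b, small_b, rng)
print(deterministic, noisy_a, noisy_b)
```

With all `sigma` set to zero the layer reduces to an ordinary linear layer. With nonzero `sigma`, repeated calls on the same input give different outputs, which is the source of exploration.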

Unsupervised Deep Embedding for Clustering Analysis
WHY? There had been little study of representation learning that focuses on clustering. WHAT? Deep Embedding Clustering (DEC) consists of two phases: (1) parameter initialization with a deep autoencoder and (2) parameter optimization. This paper first describes the second phase. Assuming the encoder and initial cluster centroids are given, two steps are alternated...
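The excerpt is cut off before naming the two steps. In the published DEC paper they are (a) computing a soft assignment between embedded points and centroids with a Student's t kernel and (b) minimizing the KL divergence to an auxiliary target distribution built from those assignments. The sketch below computes both distributions and the KL loss in plain Python. It skips the encoder and the gradient update, and the embedded points and centroids are made-up values.

```python
import math


def soft_assign(z, centroids, alpha=1.0):
    """q[i][j]: Student's t similarity between point i and centroid j, row-normalized."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        s = sum(row)
        q.append([r / s for r in row])
    return q


def target_distribution(q):
    """p[i][j] proportional to q[i][j]**2 / f[j], where f[j] = sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        w = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row))


# Hypothetical embedded points (two tight groups) and initial centroids.
z = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]]
centroids = [[0.0, 0.0], [4.0, 4.0]]

q = soft_assign(z, centroids)
p = target_distribution(q)
print(q[0], p[0], kl_divergence(p, q))
```

Squaring `q` and renormalizing pushes confident assignments closer to 1, so the target `p` is sharper than `q`. Training the encoder and centroids to match `p` then tightens the clusters.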