Deep Learning Travels
Don't panic

Deep Contextualized Word Representations
WHY? Earlier word representations such as Word2Vec or GloVe didn't contain linguistic context. WHAT? This paper proposes Embeddings from Language Models (ELMo) to include contextual information for each word. Assume that x is a context-independent representation (a token embedding or a CNN over characters). A bidirectional l-layer LSTM is used to predict the previous or...
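The paper's key step is collapsing the biLM's per-layer hidden states into one task-specific vector per token via a learned softmax-weighted sum, ELMo_k = γ · Σ_j s_j · h_{k,j}. A minimal numpy sketch of that weighting (the function name, shapes, and toy sizes are my own, not from the paper):

```python
import numpy as np

def elmo_embedding(layer_reps, s_logits, gamma):
    """Collapse per-layer biLM states into one vector per token:
    ELMo_k = gamma * sum_j softmax(s)_j * h_{k,j}.
    layer_reps: (L+1, seq_len, dim); layer 0 is the
    context-independent token representation x."""
    s = np.exp(s_logits - s_logits.max())
    s = s / s.sum()                      # softmax over layers
    # weighted sum over the layer axis, scaled by gamma
    return gamma * np.tensordot(s, layer_reps, axes=1)

# toy example: x plus 2 LSTM layers, 4 tokens, dim 5
reps = np.random.randn(3, 4, 5)
emb = elmo_embedding(reps, np.zeros(3), gamma=1.0)
print(emb.shape)  # (4, 5)
```

With zero logits the layer weights are uniform, so the result is just the mean over layers; a downstream task learns s and γ to favor the layers it needs.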

Neural Variational Inference and Learning in Belief Networks
WHY? Directed latent variable models are known to be difficult to train at large scale because the posterior distribution is intractable. WHAT? This paper suggests a way to estimate an inference model with a feedforward network. Since the exact posterior is intractable, we use the inference network to approximate it. Since h is sampled from the posterior, it is impossible...
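Because h is a discrete sample, the reparameterization trick does not apply and NVIL falls back on a score-function (REINFORCE-style) estimator, scaling ∇ log q(h|x) by a learning signal with a subtracted baseline to cut variance. A tiny sketch of that signal (the function name and toy numbers are mine, not from the paper):

```python
import numpy as np

def nvil_learning_signal(log_p_xh, log_q_h, baseline):
    """Per-sample learning signal for the score-function estimator:
    the inference-network gradient is
    (log p(x,h) - log q(h|x) - b) * grad log q(h|x).
    Subtracting the baseline b reduces variance without adding bias,
    since E[b * grad log q] = 0."""
    return log_p_xh - log_q_h - baseline

# toy check on two samples with a constant baseline b = 0.1
signals = nvil_learning_signal(np.array([-1.0, -2.0]),
                               np.array([-0.5, -1.5]),
                               0.1)
print(signals)  # [-0.6 -0.6]
```

In the paper the baseline itself is input-dependent and learned; a constant is the simplest variance-reduction choice.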

Neural Autoregressive Distribution Estimation
WHY? Estimating the distribution of data can help solve various predictive tasks. Roughly three approaches are available: directed graphical models, undirected graphical models, and density estimation with autoregressive models and feedforward neural networks (NADE). WHAT? An autoregressive generative model of data consists of D conditionals. NADE uses a feedforward neural network to...
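For binary data the D conditionals share weights, and the hidden pre-activation can be updated incrementally so evaluating log p(x) costs O(DH) instead of O(D²H). A minimal numpy sketch of that evaluation (function name and toy sizes are my own):

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_log_prob(x, W, V, b, c):
    """log p(x) for binary NADE. The d-th conditional reuses shared
    weights: h_d = sigmoid(c + W[:, :d] @ x[:d]),
    p(x_d = 1 | x_{<d}) = sigmoid(b[d] + V[d] @ h_d)."""
    a = c.astype(float).copy()   # running pre-activation c + W[:, :d] @ x[:d]
    log_p = 0.0
    for d in range(len(x)):
        h = sigmoid(a)
        p = sigmoid(b[d] + V[d] @ h)
        log_p += np.log(p if x[d] == 1 else 1.0 - p)
        a += W[:, d] * x[d]      # incremental update, no recomputation
    return log_p

rng = np.random.default_rng(0)
D, H = 4, 3
W, V = rng.normal(size=(H, D)), rng.normal(size=(D, H))
b, c = rng.normal(size=D), rng.normal(size=H)
# sanity check: probabilities over all 2^D binary vectors sum to 1
total = sum(np.exp(nade_log_prob(np.array(bits), W, V, b, c))
            for bits in itertools.product([0, 1], repeat=D))
print(round(total, 6))  # 1.0
```

The normalization check holds by construction: a product of valid Bernoulli conditionals is automatically a normalized distribution, which is the appeal of the autoregressive approach.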

Glow: Generative Flow with Invertible 1x1 Convolutions
WHY? The motivation is almost the same as that of NICE and RealNVP. WHAT? The architecture of the generative flow (Glow) is almost the same as the multi-scale architecture of RealNVP. A step of flow in Glow uses actnorm instead of batchnorm and an invertible 1x1 convolution instead of channel reordering. Actnorm performs...
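Actnorm is a per-channel affine transform y = s · (x + t) whose parameters are data-dependently initialized so the first batch comes out with zero mean and unit variance per channel, after which s and t are trained like ordinary parameters. A minimal sketch on 2-D activations (flat channels instead of conv feature maps is my simplification; class and variable names are mine):

```python
import numpy as np

class ActNorm:
    """Per-channel affine y = s * (x + t), with data-dependent
    initialization from the first batch (a batch-size-independent
    stand-in for batchnorm)."""
    def __init__(self):
        self.initialized = False

    def forward(self, x):                          # x: (batch, channels)
        if not self.initialized:
            self.t = -x.mean(axis=0)               # shift to zero mean
            self.s = 1.0 / (x.std(axis=0) + 1e-6)  # scale to unit variance
            self.initialized = True
        y = self.s * (x + self.t)
        # log|det Jacobian| of the elementwise affine map,
        # needed for the flow's log-likelihood objective
        logdet = np.sum(np.log(np.abs(self.s)))
        return y, logdet

x = np.random.default_rng(0).normal(3.0, 2.0, size=(64, 5))
y, logdet = ActNorm().forward(x)
print(np.allclose(y.mean(axis=0), 0),
      np.allclose(y.std(axis=0), 1, atol=1e-4))  # True True
```

The transform is trivially invertible (x = y / s - t), so it slots into the flow with an exact log-determinant, which batchnorm's batch statistics would complicate.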

[Pytorch] MUNIT
Pytorch implementation of Multimodal Unsupervised Image-to-Image Translation. https://github.com/Lyusungwon/munit_pytorch Reference: https://github.com/NVlabs/MUNIT Note: My impression of this paper differed a lot from the first time I read it. The 8+ models took all of my memory, so I had to train with batch size < 4. 8 latent variables vs 256...