• Deep Contextualized Word Representations

    WHY? Earlier word representations such as Word2Vec or GloVe did not contain linguistic context. WHAT? This paper proposes Embeddings from Language Models (ELMo) to include the context of each word in its representation. Assume that x is a context-independent representation (a token embedding or a CNN over characters). An L-layer bidirectional LSTM is used to predict the previous or...

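    As a rough illustration, here is a minimal PyTorch sketch of the ELMo idea: an L-layer bidirectional LSTM run over context-independent token embeddings, with a learned, softmax-normalized weighted sum over all L+1 layer representations. The hyperparameters and class name are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ELMoSketch(nn.Module):
    def __init__(self, vocab_size=10000, dim=256, num_layers=2):
        super().__init__()
        # x: context-independent token representation (the paper also allows a CNN over characters)
        self.embed = nn.Embedding(vocab_size, dim)
        # L stacked bidirectional LSTM layers; every layer's hidden states are kept
        self.layers = nn.ModuleList([
            nn.LSTM(2 * dim, dim, batch_first=True, bidirectional=True)
            for _ in range(num_layers)
        ])
        # softmax-normalized task weights over the L+1 layers, plus a global scale gamma
        self.s = nn.Parameter(torch.zeros(num_layers + 1))
        self.gamma = nn.Parameter(torch.ones(()))

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        x = self.embed(tokens)
        h = torch.cat([x, x], dim=-1)                # layer 0: token rep duplicated to width 2*dim
        reps = [h]
        for lstm in self.layers:
            h, _ = lstm(h)                           # (batch, seq_len, 2*dim)
            reps.append(h)
        w = torch.softmax(self.s, dim=0)
        # ELMo vector: scaled, weighted sum of all layer representations
        return self.gamma * sum(w_j * r for w_j, r in zip(w, reps))

tokens = torch.randint(0, 10000, (2, 7))
print(ELMoSketch()(tokens).shape)                    # torch.Size([2, 7, 512])
```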

  • Neural Variational Inference and Learning in Belief Networks

    WHY? Directed latent variable models are known to be difficult to train at large scale because the posterior distribution is intractable. WHAT? This paper suggests a way to estimate an inference model with a feed-forward network. Since the exact posterior is intractable, an approximate posterior q(h|x) is used. Since h is sampled from the posterior, it is impossible...

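    Below is a hedged sketch of this training signal for a single layer of binary latents, assuming PyTorch: a feed-forward inference network proposes h ~ q(h|x), and because sampling h is non-differentiable, the inference network is trained with a score-function (REINFORCE) gradient centred by an input-dependent baseline for variance reduction. The layer sizes and the simple linear networks are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.distributions import Bernoulli

x_dim, h_dim = 784, 200
q_net = nn.Linear(x_dim, h_dim)                      # inference network: x -> logits of q(h|x)
p_net = nn.Linear(h_dim, x_dim)                      # generative network: h -> logits of p(x|h)
prior_logits = nn.Parameter(torch.zeros(h_dim))      # factorized Bernoulli prior p(h)
baseline = nn.Linear(x_dim, 1)                       # input-dependent baseline for variance reduction

def nvil_loss(x):
    q = Bernoulli(logits=q_net(x))
    h = q.sample()                                   # h ~ q(h|x): sampling is non-differentiable
    log_q = q.log_prob(h).sum(-1)
    log_p_xh = Bernoulli(logits=p_net(h)).log_prob(x).sum(-1) \
             + Bernoulli(logits=prior_logits).log_prob(h).sum(-1)
    l = log_p_xh - log_q                             # single-sample learning signal (ELBO estimate)
    centred = (l - baseline(x).squeeze(-1)).detach() # centre the signal with the baseline
    gen_loss = -log_p_xh.mean()                      # update the generative model p(x, h)
    inf_loss = -(centred * log_q).mean()             # REINFORCE-style update for q(h|x)
    base_loss = (l.detach() - baseline(x).squeeze(-1)).pow(2).mean()  # fit the baseline
    return gen_loss + inf_loss + base_loss

x = torch.bernoulli(torch.rand(16, x_dim))
print(nvil_loss(x).item())
```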

  • Neural Autoregressive Distribution Estimation

    WHY? Estimating the distribution of data can help solve various predictive tasks. Roughly three approaches are available: directed graphical models, undirected graphical models, and density estimation using autoregressive models with feed-forward neural networks (NADE). WHAT? An autoregressive generative model of the data consists of D conditionals. NADE uses a feed-forward neural network to...

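    A minimal sketch of this for binary data, assuming PyTorch: the joint is a product of D conditionals, each computed by a feed-forward network with tied weights, and the shared hidden pre-activations are accumulated dimension by dimension so one pass covers all conditionals. The dimensions and initialization here are illustrative.

```python
import torch
import torch.nn as nn

class NADESketch(nn.Module):
    def __init__(self, D=784, H=500):
        super().__init__()
        self.W = nn.Parameter(torch.randn(H, D) * 0.01)     # shared encoder weights
        self.c = nn.Parameter(torch.zeros(H))                # hidden bias
        self.V = nn.Parameter(torch.randn(D, H) * 0.01)      # per-dimension output weights
        self.b = nn.Parameter(torch.zeros(D))                # output biases

    def log_prob(self, x):                                   # x: (batch, D), binary
        B, D = x.shape
        a = self.c.expand(B, -1)                             # running pre-activation for h_d
        log_p = torch.zeros(B, device=x.device)
        for d in range(D):
            h = torch.sigmoid(a)                             # h_d depends only on x_{<d}
            p_d = torch.sigmoid(h @ self.V[d] + self.b[d])   # p(x_d = 1 | x_{<d})
            log_p = log_p + x[:, d] * torch.log(p_d + 1e-8) \
                          + (1 - x[:, d]) * torch.log(1 - p_d + 1e-8)
            a = a + self.W[:, d].unsqueeze(0) * x[:, d:d+1]  # accumulate for the next conditional
        return log_p

x = torch.bernoulli(torch.rand(4, 784))
print(NADESketch().log_prob(x).shape)                        # torch.Size([4])
```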

  • Glow: Generative Flow with Invertible 1x1 Convolutions

    WHY? The motivation is almost the same as that of NICE and RealNVP. WHAT? The architecture of the generative flow (Glow) is almost the same as the multi-scale architecture of RealNVP. A step of flow in Glow uses actnorm instead of batch norm and an invertible 1x1 convolution instead of reversing the channel ordering. Actnorm performs...

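    For illustration, a sketch of these two building blocks in PyTorch: actnorm as a per-channel affine transform and the invertible 1x1 convolution, each returning its log-determinant contribution to the change-of-variables objective. Glow's data-dependent actnorm initialization is omitted, and the shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActNorm(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):                                    # x: (B, C, H, W)
        B, C, H, W = x.shape
        y = (x + self.bias) * torch.exp(self.log_scale)      # per-channel scale and bias
        logdet = H * W * self.log_scale.sum()                # per-sample log|det|
        return y, logdet

class Invertible1x1Conv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # initialize with a random rotation so the weight matrix is invertible
        w, _ = torch.linalg.qr(torch.randn(channels, channels))
        self.weight = nn.Parameter(w)

    def forward(self, x):
        B, C, H, W = x.shape
        y = F.conv2d(x, self.weight.view(C, C, 1, 1))        # generalizes channel permutation
        logdet = H * W * torch.slogdet(self.weight)[1]       # H * W * log|det W|
        return y, logdet

    def inverse(self, y):
        B, C, H, W = y.shape
        return F.conv2d(y, torch.inverse(self.weight).view(C, C, 1, 1))

x = torch.randn(2, 8, 16, 16)
y, ld1 = ActNorm(8)(x)
z, ld2 = Invertible1x1Conv(8)(y)
print(z.shape, ld1.item(), ld2.item())
```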

  • [Pytorch] MUNIT

    PyTorch implementation of Multimodal Unsupervised Image-to-Image Translation. https://github.com/Lyusungwon/munit_pytorch
    Reference: https://github.com/NVlabs/MUNIT
    Note: My impression of this paper differed a lot from the first time I read it. The 8+ models took up all of my memory, so I had to train with a batch size < 4. 8 latent variables vs. 256...