• Phase-aware Speech Enhancement with Deep Complex U-net

    WHY? For the audio source separation task, traditional approaches used only the magnitude and ignored the phase. Previously, deep complex networks (DCN) provided complex arithmetic via convolution. WHAT? Deep Complex U-net (DCU) modified a simple DCN to better preserve audio information. First, DCU used strided complex-valued convolutional layers instead of max pooling operations. Second, complex...


  • Transformation Autoregressive Networks

    WHY? Autoregressive models have been the dominant approach to density estimation. On the other hand, various non-linear transformation techniques have made it possible to track a density through a change of variables. Transformation Autoregressive Networks (TAN) combined non-linear transformations with autoregressive models to capture more complicated data densities. WHAT? TAN is composed of two modules: autoregressive...


  • Markov Chain Monte Carlo and Variational Inference: Bridging the Gap

    WHY? Two approximation methods, variational inference and MCMC, have different advantages: usually, variational inference is fast while MCMC is more accurate. Note that Markov Chain Monte Carlo (MCMC) is an approximation method for estimating a variable. MCMC first samples a random draw and then draws a chain of variables from a stochastic...


  • Neural Process

    WHY? A Gaussian process (GP) has several advantages. Based on robust statistical assumptions, a GP does not require an expensive training phase and can represent the uncertainty of unobserved areas. However, a GP is computationally expensive. Neural process tried to combine the best of Gaussian processes and neural networks. WHAT? Neural process satisfies two conditions of...


  • Grammar Variational Autoencoder

    WHY? Generative models of discrete data with a particular structure (grammar) often produce invalid outputs. Grammar Variational Autoencoder (GVAE) forces the VAE decoder to produce only valid outputs. WHAT? The particular structure of the data can be formulated with a context-free grammar (CFG). Data with a defined CFG can be represented as a parse tree. Encoder...