Directed latent variable models are known to be difficult to train at large scale because the posterior distribution is intractable.


This paper proposes training the inference model as a feed-forward network. Since the exact posterior is intractable, the distribution produced by the inference network is used to approximate it, and training maximizes a variational lower bound on the log-likelihood. Because h is sampled from this approximate posterior, the exact gradient of the lower bound with respect to the inference network's parameters cannot be computed. The gradient is therefore estimated by Monte Carlo sampling with the score function estimator (REINFORCE), which can handle the stochastic variable h inside the bound. The variance of this estimator is usually high, so variance reduction techniques are applied. A global baseline c, learned during training, is subtracted from the learning signal, and an input-dependent baseline is used on top of it. The input-dependent baseline is also trained, by minimizing the mean squared error (MSE). To keep training stable, the centered learning signal is divided by a running estimate of its standard deviation whenever that estimate is greater than 1. If the inference network is structured, a local learning signal can be estimated for each factorized conditional, in which case a layer-dependent baseline needs to be learned.
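The effect of the baseline can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes a factorized Bernoulli q(h) with logits `theta`, replaces the real learning signal with a toy function `signal_fn`, and tracks c and the variance with a moving average whose decay `alpha` is chosen arbitrarily. All function and variable names are hypothetical, and the input-dependent and layer-dependent baselines are left out.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_h(theta, rng, n):
    """Draw n samples h ~ q(h), a factorized Bernoulli with logits theta."""
    return (rng.random((n, theta.size)) < sigmoid(theta)).astype(float)

def score_function_grads(theta, h, signal, baseline=0.0, scale=1.0):
    """Per-sample score-function estimates of d/dtheta E_q[signal(h)].

    Uses d log q(h) / d theta = h - sigmoid(theta) for Bernoulli logits.
    """
    score = h - sigmoid(theta)
    return ((signal - baseline) / scale)[:, None] * score

def update_running(c, v, signal, alpha=0.8):
    """Moving averages of the signal mean (baseline c) and variance v."""
    c = alpha * c + (1.0 - alpha) * signal.mean()
    v = alpha * v + (1.0 - alpha) * signal.var()
    return c, v

rng = np.random.default_rng(0)
theta = np.array([0.5, -1.0, 2.0])   # toy variational parameters (assumed)
w = np.array([1.0, -2.0, 0.5])       # toy weights, purely illustrative

def signal_fn(h):
    # Toy learning signal with a large constant offset; a stand-in for
    # the real signal, which would come from the model and inference net.
    return 10.0 + h @ w

# Learn the global baseline c (and the variance v) from minibatches.
c, v = 0.0, 1.0
for _ in range(50):
    c, v = update_running(c, v, signal_fn(sample_h(theta, rng, 100)))

# Compare the estimators on one fresh batch of samples.
h = sample_h(theta, rng, 20000)
signal = signal_fn(h)
grads_plain = score_function_grads(theta, h, signal)
grads_base = score_function_grads(theta, h, signal, baseline=c)
# Normalization: divide by the running std only when it exceeds 1.
grads_norm = score_function_grads(theta, h, signal, baseline=c,
                                  scale=max(1.0, np.sqrt(v)))

p = sigmoid(theta)
exact = w * p * (1.0 - p)            # analytic gradient for this toy signal

print("exact gradient      :", exact)
print("mean with baseline  :", grads_base.mean(axis=0))
print("total variance plain:", grads_plain.var(axis=0).sum())
print("total variance base :", grads_base.var(axis=0).sum())
```

Subtracting a baseline that does not depend on the sampled h leaves the expected gradient unchanged, because the score has zero mean under q. That is why the baseline can remove the constant offset from the signal without biasing the estimate. The normalization step, by contrast, rescales the gradient, so `grads_norm` matches the exact gradient only up to that scale factor.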


A sigmoid belief network (SBN) trained with NVIL outperformed an SBN trained with the wake-sleep algorithm, as well as other models including DARN, NADE, RBM, and MoB, in negative log-likelihood (NLL) on MNIST. An SBN trained with NVIL also performed better than LDA in document modeling.


This seems like a smart move, but the reparameterization trick of the VAE proved too strong a competitor. NVIL can still be used in cases where the distribution is impossible to reparameterize.

Mnih, Andriy, and Karol Gregor. “Neural variational inference and learning in belief networks.” arXiv preprint arXiv:1402.0030 (2014).