Generative models of discrete data with particular structure (grammar) often result invalid outputs. Grammar Variational Autoencoder(GVAE) forces the decoder of VAE to result only valid outputs.


Particular structure of data can be formulated with context-free grammars(CFG). Data with defined CFG can be represented as a parse tree.

image Encoder uses the knowledge of rules explicitly. 1) Explicit rules of data are defined. 2) Data can be represented as parse tree and rules in data can be extracted. 3) Rules can be converted into 1-hot vectors. 4) Data represented with 1-hot vectors of rules can be map to latent space with convolution neural network.

image Decoder is a recurrent neural network to output sequences of discrete tokens. 1) Decoder sequentially output logits of grammar. 2) A stack(last-in first-out) is used to keep track of parsing of data (starts with start token). 3) Following the order of stack, put masks on the logit with unmasked part strictly follow the grammar. Masking process ensures decoder to only result gramatically valid output. 4) Reconstruct data from parse tree. Detailed algorithm is as follows.

image image Training phase is the same as that of variational autoencoder with described encoder and decoder.


The experiment of GVAE was performed on data of arithmetic expressions and that of molecules. Not only GVAE results valid outputs, it captured more meaningful latent space than character variational autoencoder(CVAE).


Seems naive but inspirational. Some data have strict rules to follow, and there’s no reason not to use them. Rule2vec?

Kusner, Matt J., Brooks Paige, and José Miguel Hernández-Lobato. “Grammar variational autoencoder.” arXiv preprint arXiv:1703.01925 (2017).