## WHY?

The hierarchical recurrent encoder-decoder (HRED) model, which aims to capture the hierarchical structure of sequential data, tends to fail because the model is encouraged to capture only local structure and the LSTM often suffers from vanishing gradients.

## WHAT?

The Latent Variable Hierarchical Recurrent Encoder-Decoder (VHRED) tries to improve HRED by forcing the model to learn a latent variable z with variational inference. The generative process (the prior) outputs z from the previous utterances w, while the inference process (the approximate posterior) infers z with access to the next w.
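To make the prior/posterior split concrete, here is a minimal sketch in plain Python. It assumes both distributions over z are diagonal Gaussians; the means and variances below are made-up numbers standing in for the outputs of hypothetical prior and posterior networks, and the function names are illustrative rather than taken from the paper's code.

```python
import math
import random


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) between two diagonal Gaussians, summed over dimensions."""
    return sum(
        0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def sample_z(mu, var, rng):
    """Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]


# Hypothetical prior parameters (assumed to depend only on previous w).
prior_mu, prior_var = [0.0, 0.0], [1.0, 1.0]
# Hypothetical posterior parameters (assumed to also see the next w).
post_mu, post_var = [0.5, -0.3], [0.8, 1.2]

rng = random.Random(0)
z_train = sample_z(post_mu, post_var, rng)    # training: z drawn from the posterior
z_test = sample_z(prior_mu, prior_var, rng)   # generation: z drawn from the prior

# KL penalty that pulls the posterior toward the prior in the lower bound.
kl_term = kl_diag_gaussians(post_mu, post_var, prior_mu, prior_var)
```

In a variational setup of this kind, `kl_term` would be subtracted from the expected log-likelihood of the next w given z to form the training objective. At generation time only the prior is available, since the next w is not yet known.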

## So?

VHRED tends to perform better than LSTM and HRED on long contexts.

## Critique

Applying variational inference to sequential data is a good idea, but I’m not sure it significantly improves performance. The model could be used to produce good paragraph vectors.

Serban, Iulian Vlad, et al. “A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues.” AAAI. 2017.