Previous neural machine translation systems were all based on word-level translation. Word-level translators have a critical problem: out-of-vocabulary errors.
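To make the out-of-vocabulary problem concrete, here is a minimal sketch. The toy sentences and the helper `build_vocab` are made up for illustration and do not come from the paper. A word seen only at test time has no entry in a word-level vocabulary, while every one of its characters is already covered by a character-level vocabulary.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer id (hypothetical helper)."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

# Toy data, assumed for illustration only.
train_text = "the cat sat on the mat"
test_text = "the cats sat"

# Word-level vocabulary: one entry per whitespace-separated word.
word_vocab = build_vocab(train_text.split())
# Character-level vocabulary: one entry per character, including the space.
char_vocab = build_vocab(train_text)

# Tokens in the test text that the vocabulary cannot represent.
oov_words = [w for w in test_text.split() if w not in word_vocab]
oov_chars = [c for c in test_text if c not in char_vocab]

print("OOV words:", oov_words)
print("OOV chars:", oov_chars)
```

The unseen word "cats" is out of vocabulary at the word level, but no character in it is unknown.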


This paper proposes a bi-scale recurrent neural network with attention to build a character-level translator. The input is encoded with BPE. The slower layer carries word-level information and the faster layer carries character-level information. Architecturally, the slower layer can update only after the faster layer finishes a word, so the slower layer changes less often. h^1 is the faster layer and h^2 is the slower layer.
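The two-timescale idea can be sketched roughly as follows. This is a simplified illustration, not the paper's exact equations. The function names, the parameter dictionary, the plain tanh fast layer, and the `gate_bias` knob are all assumptions made for this example. The fast state is rewritten at every character, while the slow state moves only as far as a gate computed from the fast state allows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_dim, hidden_dim, gate_bias=0.0, seed=0):
    """Random weights for the sketch (hypothetical layout, not from the paper)."""
    rng = np.random.default_rng(seed)
    concat_dim = input_dim + 2 * hidden_dim
    return {
        "W_fast": rng.normal(scale=0.1, size=(hidden_dim, concat_dim)),
        "W_slow": rng.normal(scale=0.1, size=(hidden_dim, concat_dim)),
        "W_gate": rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
        "b_gate": np.full(hidden_dim, gate_bias),
    }

def bi_scale_step(x, h_fast, h_slow, p):
    """One character step of a simplified two-timescale recurrent update."""
    z = np.concatenate([x, h_fast, h_slow])
    # Fast layer: overwritten at every character step.
    new_fast = np.tanh(p["W_fast"] @ z)
    # Gate derived from the fast layer; near 1 is read as "a chunk just ended".
    g = sigmoid(p["W_gate"] @ new_fast + p["b_gate"])
    # Slow layer: interpolates toward its candidate only where the gate is open.
    slow_candidate = np.tanh(p["W_slow"] @ z)
    new_slow = (1.0 - g) * h_slow + g * slow_candidate
    return new_fast, new_slow, g

# Usage: a strongly negative gate bias keeps the gate shut, so the slow state
# stays put while the fast state still changes.
params = init_params(input_dim=4, hidden_dim=3, gate_bias=-50.0)
x = np.ones(4)
h_fast, h_slow = np.zeros(3), np.full(3, 0.5)
new_fast, new_slow, gate = bi_scale_step(x, h_fast, h_slow, params)
print(np.allclose(new_slow, h_slow), np.allclose(new_fast, h_fast))
```

In a trained model the gate would be learned rather than forced by a bias. The bias is used here only to show the closed-gate and open-gate behaviour in isolation.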


The character-level NMT model showed better results than a traditional non-neural translator, and the attention mechanism appeared to function properly.


This is one attempt to make an RNN capture hierarchical structure. The results seem quite disappointing.

Chung, Junyoung, Kyunghyun Cho, and Yoshua Bengio. “A character-level decoder without explicit segmentation for neural machine translation.” arXiv preprint arXiv:1603.06147 (2016).