Hybrid computing using a neural network with dynamic external memory
WHY?
Equipping a neural network with external memory, as in a modern computer, lets it use extensible memory separated from computation. This paper proposes the Differentiable Neural Computer (DNC), an advanced version of the Neural Turing Machine.
WHAT?
Reading and writing in the DNC are implemented with differentiable attention mechanisms, so the whole system can be trained end-to-end with gradient descent.
The controller of the DNC is a variant of the LSTM architecture. At each timestep it takes as input the concatenation of an input vector x_t and the read vectors from the previous timestep, r_{t-1}^1, ..., r_{t-1}^R. This concatenated input is concatenated again with the hidden vector from the previous timestep (h_{t-1}^l) and the one from the previous layer (h_t^{l-1}), and fed to the LSTM to produce the next hidden vector h_t^l. The hidden vectors from all layers at a timestep are concatenated to emit an output vector \upsilon_t and an interface vector \xi_t. The final output vector y_t is the sum of \upsilon_t and a linear transform of the current timestep's read vectors:
\upsilon_t = W_y[h_t^1;...;h_t^L]\\
\xi_t = W_{\xi}[h_t^1;...;h_t^L]\\
y_t = \upsilon_t + W_r[r_t^1;...;r_t^R]
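As a rough illustration, here is a minimal NumPy sketch of these three equations; the dimensions and the weight matrices W_y, W_xi, W_r are illustrative placeholders, not values from the paper.

```python
import numpy as np

L, H = 2, 64          # LSTM layers and hidden size (illustrative)
R, W, Y = 4, 16, 10   # read heads, memory word size, output size (illustrative)
XI = R * W + 3 * W + 5 * R + 3  # length of the interface vector xi_t

rng = np.random.default_rng(0)
W_y = 0.1 * rng.normal(size=(Y, L * H))    # emits the output vector upsilon_t
W_xi = 0.1 * rng.normal(size=(XI, L * H))  # emits the interface vector xi_t
W_r = 0.1 * rng.normal(size=(Y, R * W))    # maps read vectors into the output

h = [rng.normal(size=H) for _ in range(L)]  # hidden vectors h_t^1 ... h_t^L
r = [rng.normal(size=W) for _ in range(R)]  # current read vectors r_t^1 ... r_t^R

h_cat = np.concatenate(h)                   # [h_t^1; ...; h_t^L]
v_t = W_y @ h_cat                           # output vector upsilon_t
xi_t = W_xi @ h_cat                         # interface vector xi_t
y_t = v_t + W_r @ np.concatenate(r)         # y_t adds the read contribution
```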
The interface vector consists of many components that interact with memory: R read keys (\mathbf{k}_t^{r,i}\in R^W), R read strengths (\beta_t^{r,i}), a write key (\mathbf{k}_t^w\in R^W), a write strength (\beta_t^w), an erase vector (\mathbf{e}_t\in R^W), a write vector (\mathbf{v}_t\in R^W), R free gates (f_t^i), an allocation gate (g_t^a), a write gate (g_t^w), and R read modes (\mathbf{\pi}_t^i).
\mathbf{\xi}_t = [\mathbf{k}_t^{r,1};...;\mathbf{k}_t^{r,R};\beta_t^{r,1};...;\beta_t^{r,R};\mathbf{k}_t^w;\beta_t^w;\mathbf{e}_t;\mathbf{v}_t;f_t^1;...;f_t^R;g_t^a;g_t^w;\mathbf{\pi}_t^1;...;\mathbf{\pi}_t^R]
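To make the partitioning concrete, here is a minimal NumPy sketch that slices a flat xi_t into its named components. The activation functions (oneplus for strengths, sigmoid for gates and the erase vector, softmax over each head's read modes) follow the DNC paper; the helper names and dict layout are my own.

```python
import numpy as np

def oneplus(x):
    return 1.0 + np.log1p(np.exp(x))       # strengths lie in [1, inf)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # gates and erase lie in (0, 1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def split_interface(xi, R, W):
    """Slice xi_t (length R*W + 3W + 5R + 3) in the order of the equation above."""
    o = 0
    def take(n):
        nonlocal o
        chunk = xi[o:o + n]
        o += n
        return chunk
    return {
        "read_keys": take(R * W).reshape(R, W),
        "read_strengths": oneplus(take(R)),
        "write_key": take(W),
        "write_strength": oneplus(take(1)),
        "erase": sigmoid(take(W)),
        "write_vector": take(W),
        "free_gates": sigmoid(take(R)),
        "allocation_gate": sigmoid(take(1)),
        "write_gate": sigmoid(take(1)),
        "read_modes": softmax(take(3 * R).reshape(R, 3), axis=1),
    }
```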
Read vectors are computed by applying read weightings to the memory. The memory matrix is updated using the write weighting, the write vector, and the erase vector:
\mathbf{r}_t^i = M_t^T\mathbf{w}_t^{r,i}\\
M_t = M_{t-1}\odot(E-\mathbf{w}^w_t\mathbf{e}_t^T)+\mathbf{w}^w_t\mathbf{v}_t^T
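These two update rules translate almost directly into code. A minimal NumPy sketch, assuming memory M has N rows of width W and E (the all-ones matrix) is realized implicitly:

```python
import numpy as np

def read_memory(M, w_read):
    """r_t^i = M_t^T w_t^{r,i}: a weighting over rows yields one read vector."""
    return M.T @ w_read

def write_memory(M, w_write, erase, write_vec):
    """M_t = M_{t-1} o (E - w_t^w e_t^T) + w_t^w v_t^T, with o elementwise."""
    return M * (1.0 - np.outer(w_write, erase)) + np.outer(w_write, write_vec)
```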
Memory is addressed with content-based addressing and dynamic memory allocation. Content-based addressing is essentially an attention mechanism: a key is compared to each memory row by cosine similarity, sharpened by a strength, and normalized with a softmax. Dynamic memory allocation is designed to free and reuse memory, analogous to the free-list memory allocation scheme in conventional computers: the free gates determine how usage decays, and the least-used locations are allocated first, as sketched below.
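Here is a minimal NumPy sketch of both mechanisms under the definitions in the DNC paper: content-based addressing as cosine-similarity attention, and an allocation weighting computed from a usage vector u_t in [0, 1]^N. The variable names are illustrative.

```python
import numpy as np

def content_addressing(M, key, beta):
    """Softmax over cosine similarity between the key and each memory row,
    sharpened by the strength beta -- essentially an attention mechanism."""
    M_norm = M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-8)
    k_norm = key / (np.linalg.norm(key) + 1e-8)
    scores = beta * (M_norm @ k_norm)
    e = np.exp(scores - scores.max())
    return e / e.sum()

def allocation_weighting(usage):
    """Weight free (low-usage) slots highest, analogous to a free list:
    a[phi[j]] = (1 - u[phi[j]]) * prod_{i<j} u[phi[i]]."""
    phi = np.argsort(usage)                 # slots ordered by ascending usage
    a = np.zeros_like(usage, dtype=float)
    prod = 1.0
    for slot in phi:
        a[slot] = (1.0 - usage[slot]) * prod
        prod *= usage[slot]
    return a
```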
So?
The DNC showed good results on the bAbI question-answering tasks and on graph tasks such as traversal and shortest-path finding.