  • Scalable Distributed DNN Training Using Commodity GPU Cloud Computing

    WHY? Synchronization is a central issue in distributed SGD: too little synchronization among nodes makes training unstable, while too frequent synchronization incurs a high communication cost. WHAT? This paper tries to drastically reduce the communication cost of distributed SGD through gradient compression (a hedged sketch of one such scheme follows below). The paper suggests two points. The first is that many techniques...
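
    To make the compression idea concrete, here is a minimal sketch of threshold-based gradient compression with residual accumulation. The threshold tau, the sign-only encoding, and the residual mechanics are assumptions for illustration, not necessarily the paper's exact scheme.

      import numpy as np

      def compress(grad, residual, tau=0.01):
          """Send only +/-tau where the accumulated gradient exceeds the threshold."""
          acc = grad + residual                          # fold in what was not sent earlier
          sent = np.sign(acc) * tau * (np.abs(acc) >= tau)
          return sent, acc - sent                        # residual carries the remainder

      rng = np.random.default_rng(0)
      grad = rng.normal(scale=0.01, size=8)
      sent, residual = compress(grad, np.zeros(8))
      print(sent, residual)

    Only the few indices where `sent` is nonzero need to be transmitted, which is where the communication savings come from.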


  • Neural Word Embedding as Implicit Matrix Factorization

    WHY? Skip-Gram with Negative Sampling (SGNS) showed impressive performance compared to traditional word-embedding methods, but it was not clear what SGNS converges to. WHAT? This paper proves that minimizing the SGNS loss is equivalent to factorizing a word-context matrix whose association measure is shifted PMI. The loss function of...
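
    The central identity the paper proves can be stated as follows, where k is the number of negative samples, #(.) denotes corpus counts, and |D| is the total number of observed word-context pairs:

      \[
        \vec{w}\cdot\vec{c} \;=\; \mathrm{PMI}(w,c) - \log k,
        \qquad
        \mathrm{PMI}(w,c) = \log\frac{\#(w,c)\cdot|D|}{\#(w)\cdot\#(c)}
      \]

    In other words, at the optimum each cell of the implicitly factorized matrix holds the PMI of the word-context pair, shifted down by log k.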


  • Dependency-Based Word Embeddings

    WHY? Traditional continuous word embeddings are based on linear contexts; that is, they consider only the words surrounding a target word as its context. WHAT? This paper introduces dependency-based word embeddings to capture more meaningful contexts: the output of a dependency parse of each sentence is used as context instead of the surrounding words (see the sketch below). So? While...
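
    To make "dependency-based context" concrete, here is a minimal sketch that extracts (word, context) pairs from dependency arcs. spaCy is an assumption here for illustration; the paper used its own parsing pipeline and additionally collapsed preposition arcs.

      import spacy

      nlp = spacy.load("en_core_web_sm")

      def dependency_contexts(sentence):
          pairs = []
          for token in nlp(sentence):
              for child in token.children:
                  # Each arc yields two directed contexts:
                  # the head sees "child/label", the child sees "head/label-1".
                  pairs.append((token.text, f"{child.text}/{child.dep_}"))
                  pairs.append((child.text, f"{token.text}/{child.dep_}-1"))
          return pairs

      print(dependency_contexts("Australian scientist discovers star with telescope"))

    With contexts like these, "discovers" is paired with "scientist/nsubj" and "star/dobj" regardless of how far apart the words sit in the sentence.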


  • Large Scale Distributed Deep Networks

    WHY? Models with a huge number of parameters, or trained on huge amounts of data, do not fit in the GPU memory of a single machine. WHAT? This paper introduces DistBelief, a software framework that enables parallel training in cluster settings. There are two kinds of parallelism in training deep-learning models: model parallelism...
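
    Below is a toy, single-process sketch of the parameter-server pattern behind the paper's data-parallel method (Downpour SGD). Real DistBelief shards the parameters across many server machines and lets workers run asynchronously, so the serial loop here is purely illustrative.

      import numpy as np

      class ParameterServer:
          def __init__(self, dim, lr=0.1):
              self.w = np.zeros(dim)
              self.lr = lr

          def pull(self):
              return self.w.copy()          # worker fetches current parameters

          def push(self, grad):
              self.w -= self.lr * grad      # apply a possibly stale gradient

      def worker_step(server, X, y):
          w = server.pull()                 # may be stale w.r.t. other workers
          grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient on a data shard
          server.push(grad)

      rng = np.random.default_rng(0)
      X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
      server = ParameterServer(dim=3)
      for shard in np.array_split(np.arange(100), 4):  # four "workers", run serially here
          worker_step(server, X[shard], y[shard])
      print(server.w)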


  • Improving Distributional Similarity with Lessons Learned from Word Embeddings

    WHY? Word embedding with neural networks (skip-gram) seems to outperform traditional count-based distributional models. However, this paper argues that word2vec's current superiority comes not from the algorithm itself but from system design choices and hyperparameter optimizations. Note: the traditional method of word representation is a count-based representation (bag-of-contexts). This...
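
    One such design choice the paper transfers from SGNS to count-based models is context-distribution smoothing: raising context counts to the power alpha = 0.75 (the same exponent SGNS uses for its negative-sampling distribution) before computing PPMI. Here is a minimal sketch on a toy count matrix; the variable names are illustrative.

      import numpy as np

      def smoothed_ppmi(counts, alpha=0.75):
          counts = np.asarray(counts, dtype=float)
          p_wc = counts / counts.sum()
          p_w = counts.sum(axis=1) / counts.sum()
          ctx = counts.sum(axis=0) ** alpha      # smooth the context distribution
          p_c = ctx / ctx.sum()
          with np.errstate(divide="ignore"):
              pmi = np.log(p_wc / (p_w[:, None] * p_c[None, :]))
          return np.maximum(pmi, 0.0)            # keep only positive PMI

      counts = np.array([[10, 0, 2], [3, 5, 0], [0, 1, 8]])  # toy word-context counts
      print(smoothed_ppmi(counts))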