Stack Long Short-Term Memories
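Dyer et al. (2015)1 added a stack pointer \(TOP\) to a conventional LSTM. The trick is to use the cell that \(TOP\) points to as the source of \(h_{t-1}\) and \(c_{t-1}\). Pushing computes a new cell from the \(TOP\) state and moves \(TOP\) to it; popping simply moves \(TOP\) back to the previous cell.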
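The push/pop mechanics can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the class names `LSTMCell` and `StackLSTM`, the random untrained weights, and the sentinel node standing for the empty stack are hypothetical choices made for brevity. Every pushed cell keeps a back-pointer to its predecessor, so a pop only moves `TOP` and never recomputes anything.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """A tiny pure-Python LSTM cell with random, untrained weights (illustrative only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # One weight matrix and bias vector per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _affine(self, g, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[g], self.b[g])]

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # concatenate the input and the previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("c", z)]
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c


class StackLSTM:
    def __init__(self, cell):
        self.cell = cell
        zeros = [0.0] * cell.hidden_size
        self.states = [(zeros, zeros)]  # (h, c) of every node ever pushed; node 0 = empty stack
        self.parent = [None]            # back-pointer of each node
        self.top = 0                    # the TOP pointer

    def push(self, x):
        h_prev, c_prev = self.states[self.top]  # TOP supplies h_{t-1} and c_{t-1}
        self.states.append(self.cell.step(x, h_prev, c_prev))
        self.parent.append(self.top)
        self.top = len(self.states) - 1

    def pop(self):
        if self.top == 0:
            raise IndexError("pop from empty stack LSTM")
        self.top = self.parent[self.top]  # move TOP back; the popped node is kept, not recomputed

    def summary(self):
        return self.states[self.top][0]  # hidden state at TOP summarizes the stack


if __name__ == "__main__":
    stack = StackLSTM(LSTMCell(input_size=2, hidden_size=3))
    stack.push([1.0, 0.0])
    before = stack.summary()
    stack.push([0.0, 1.0])
    stack.pop()
    print(stack.summary() == before)  # True: pop restores the earlier summary
```

Because popped cells stay in memory, the summary after a pop is exactly the one the stack had before the matching push.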
Not only are the two standard structures of transition-based dependency parsing (the stack and the buffer) implemented as stack LSTMs, but a third stack storing the history of actions is also introduced and implemented in the same way. The authors seem to favor the stack structure and hope it can encode parser configurations more thoroughly.
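In the paper, the three stack LSTMs are combined by concatenating the vectors at their TOP positions (\(s_t\) for the stack, \(b_t\) for the buffer, \(a_t\) for the action history) and passing them through a rectified linear layer, \(p_t = \max\{0, W[s_t; b_t; a_t] + d\}\); \(p_t\) then feeds a softmax over parser actions. The sketch below computes only that combination step; the function name `parser_state` and the toy weights are hypothetical.

```python
from typing import List

Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def parser_state(s_t: Vector, b_t: Vector, a_t: Vector,
                 W: List[Vector], d: Vector) -> Vector:
    """p_t = max{0, W [s_t; b_t; a_t] + d}."""
    z = s_t + b_t + a_t  # list concatenation plays the role of [s_t; b_t; a_t]
    if any(len(row) != len(z) for row in W):
        raise ValueError("each row of W must have len(s_t) + len(b_t) + len(a_t) entries")
    affine = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, d)]
    return relu(affine)


if __name__ == "__main__":
    s_t, b_t, a_t = [1.0], [2.0], [-1.0]  # toy 1-dimensional TOP summaries
    W = [[1.0, 1.0, 1.0],
         [1.0, -1.0, 0.0]]
    d = [0.0, 0.0]
    print(parser_state(s_t, b_t, a_t, W, d))  # [2.0, 0.0]
```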
1. C. Dyer, M. Ballesteros, W. Ling, A. Matthews, and N. A. Smith, “Transition-Based Dependency Parsing with Stack Long Short-Term Memory,” arXiv:1505.08075, 2015.
Recurrent Neural Network Grammars
Dyer et al. (2016)1 used RNNs for both parsing and language modeling.
Sequential RNNs are arguably inappropriate models of natural language, since relations between words are organized in compositional, nested structures rather than in sequential surface order2.
The authors introduced a new generative probabilistic model of sentences that lets RNNs model the nested, hierarchical structure of natural language. Parsing operates in a bottom-up fashion, while generation makes use of top-down grammar information.
RNNG defines a joint probability distribution over string terminals (the words of a language) and phrase-structure nonterminals. It is motivated by the conventional transition system, an abstract state machine. The first big difference, however, is that RNNG is a generative model (although it can be modified for discriminative parsing). Formally, an RNNG is defined by a triple \(\langle N,\Sigma,\Theta \rangle\), where \(N\) is the set of nonterminal symbols (NP, VP, etc.), \(\Sigma\) is the set of terminal symbols (\(N \cap \Sigma = \emptyset\)), and \(\Theta\) denotes the model parameters.
Regarding implementation, an RNNG consists of a stack storing partially completed constituents, a buffer storing already-generated terminals, and a list of past actions. It generates a sentence \(x\) and its phrase-structure tree \(y\) simultaneously. The action sequence \(\boldsymbol{a} = \langle a_1,\ldots,a_n \rangle\) that generates \((x, y)\) is called the oracle.
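The generative transition system can be sketched without any neural scoring. This is a minimal sketch, assuming trees are encoded as nested Python tuples: the hypothetical helper `oracle` emits the RNNG actions NT(X), GEN(x) and REDUCE in depth-first, left-to-right order, and the hypothetical `replay` runs them on a stack and an output buffer to recover \((x, y)\). In a real RNNG each REDUCE builds a composed vector for the finished constituent and each action is scored by the model, with \(p(x, y)\) being the product of the per-action probabilities; here the constituent is just a tuple.

```python
from typing import List, Tuple, Union

# A tree is either a word (terminal) or a tuple (label, child1, child2, ...).
Tree = Union[str, Tuple]


class Open:
    """Stack marker for a nonterminal opened by NT(X) and not yet reduced."""

    def __init__(self, label: str):
        self.label = label


def oracle(tree: Tree) -> List[str]:
    """Top-down, left-to-right (pre-order) action sequence that generates `tree`."""
    if isinstance(tree, str):
        return [f"GEN({tree})"]
    label, *children = tree
    actions = [f"NT({label})"]
    for child in children:
        actions.extend(oracle(child))
    actions.append("REDUCE")
    return actions


def replay(actions: List[str]) -> Tuple[List[str], Tree]:
    """Execute actions on a stack plus an output buffer; return (x, y)."""
    stack: list = []           # Open markers and completed subtrees
    terminals: List[str] = []  # output buffer: words generated so far
    for a in actions:
        if a.startswith("NT(") and a.endswith(")"):
            stack.append(Open(a[3:-1]))
        elif a.startswith("GEN(") and a.endswith(")"):
            word = a[4:-1]
            terminals.append(word)
            stack.append(word)
        elif a == "REDUCE":
            children = []
            # Pop completed children until the most recent open nonterminal.
            while not isinstance(stack[-1], Open):
                children.append(stack.pop())
            label = stack.pop().label
            stack.append((label, *reversed(children)))
        else:
            raise ValueError(f"unknown action: {a}")
    if len(stack) != 1 or isinstance(stack[0], Open):
        raise ValueError("actions do not describe exactly one complete tree")
    return terminals, stack[0]


if __name__ == "__main__":
    tree = ("S", ("NP", "The", "hungry", "cat"), ("VP", "meows"), ".")
    actions = oracle(tree)
    print(actions)
    x, y = replay(actions)
    print(x)          # ['The', 'hungry', 'cat', 'meows', '.']
    print(y == tree)  # True
```

Since the oracle is a deterministic encoding of the tree, generating the action sequence and generating \((x, y)\) are the same thing, which is why the joint distribution can be defined over actions.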
Copyright © Hankcs