Recurrent Neural Network Grammars

2017/11/4 posted in NLP

Dyer et al. (2016)1 introduced Recurrent Neural Network Grammars (RNNGs), which use RNNs for both parsing and language modeling.

Sequential RNNs have been deemed inappropriate models of natural language, since relations between words follow compositional, nested structures rather than sequential surface order2.

The authors introduced a new generative probabilistic model of sentences that lets RNNs capture the nested, hierarchical structure of natural language. Parsing operates in a bottom-up fashion, while generation makes use of top-down grammar information.

The RNNG defines a joint probability distribution over string terminals (words of the language) and phrase-structure nonterminals. It is motivated by conventional transition-based parsing, i.e., an abstract state machine, but with one major difference: the RNNG is a generative model (although it can be modified for discriminative parsing). Formally, an RNNG is defined by a triple \(\langle N,\Sigma,\Theta \rangle\), where \(N\) denotes the nonterminal symbols (NP, VP, etc.), \(\Sigma\) denotes the terminal symbols (with \(N \cap \Sigma = \emptyset\)), and \(\Theta\) denotes the model parameters.

In terms of implementation, the RNNG consists of a stack storing partially completed constituents, a buffer storing already-generated terminals, and a list of past actions. It generates a sentence \(x\) and its phrase-structure tree \(y\) simultaneously. The action sequence \(\boldsymbol{a} = \langle a_1,\ldots,a_n \rangle\) that generates \((x, y)\) is called the oracle.
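
Since the oracle fully determines \((x, y)\), the joint probability factorizes over its actions:

\[ p(x, y) = \prod_{t=1}^{n} p(a_t \mid a_1, \ldots, a_{t-1}), \]

where each conditional is modeled with RNN encodings of the stack, the terminals generated so far, and the action history.

The transition system itself can be simulated with a plain stack. The sketch below is illustrative only — the tuple-based action encoding and the helper `apply_action` are my own, not the authors' code — and it simply replays an oracle to recover both the sentence and the tree, using the paper's running example "the hungry cat meows".

```python
# Minimal sketch of the RNNG transition system (generative variant),
# assuming the three action types from the paper: NT(X), GEN(w), REDUCE.

def apply_action(stack, buffer, action):
    """Apply one oracle action to the (stack, buffer) configuration."""
    kind, arg = action
    if kind == "NT":        # push an open nonterminal, e.g. "(NP"
        stack.append(("open", arg))
    elif kind == "GEN":     # generate a terminal onto the stack and the buffer
        stack.append(("term", arg))
        buffer.append(arg)
    elif kind == "REDUCE":  # pop children back to the open nonterminal, compose
        children = []
        while stack[-1][0] != "open":
            children.append(stack.pop())
        label = stack.pop()[1]
        subtree = "(" + label + " " + " ".join(c[1] for c in reversed(children)) + ")"
        stack.append(("closed", subtree))
    return stack, buffer

# Oracle for "(S (NP the hungry cat) (VP meows))".
oracle = [("NT", "S"), ("NT", "NP"), ("GEN", "the"), ("GEN", "hungry"),
          ("GEN", "cat"), ("REDUCE", None), ("NT", "VP"), ("GEN", "meows"),
          ("REDUCE", None), ("REDUCE", None)]

stack, buffer = [], []
for a in oracle:
    stack, buffer = apply_action(stack, buffer, a)

print(" ".join(buffer))  # the hungry cat meows
print(stack[0][1])       # (S (NP the hungry cat) (VP meows))
```

Replaying the oracle prints the terminal sequence and the bracketed tree, i.e., exactly the \((x, y)\) pair that the action sequence encodes.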


  1. C. Dyer, A. Kuncoro, M. Ballesteros, and N. A. Smith, “Recurrent Neural Network Grammars,” arXiv preprint arXiv:1602.07776, 2016.

  2. N. Chomsky, Syntactic Structures. The Hague: Mouton and Company, 1957.