Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

This paper reminds me of a funny idea from several years ago: using a CRF tagger to do dependency parsing. Does it work? Well, the author over-claimed a high score that I have never been able to replicate. It doesn't sound reasonable to cast a dependency tree into a sequence of tags.

Cast

In this paper, they also did a cast: joint extraction of entity-relation triples is turned into sequence tagging, where each tag encodes a word's position in an entity (B/I/E/S), the relation type, and the word's role in the relation (argument 1 or 2), with O for everything else.

Straightforward, but it can't handle nested cases.
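
To make the cast concrete, here is a toy sketch (my own sentence and helper, not the paper's code) that turns a single triple into the kind of tag sequence the scheme produces, assuming tags of the form `{B,I,E,S}-RelationType-Role`:

```python
# Toy illustration of the tagging cast: each word inside an entity gets
# {B,I,E,S} for its position, plus the relation type and its role (1 or 2);
# all other words get "O". The sentence and triple below are made up.

def tag_triple(tokens, triple):
    """Turn one (entity1, relation, entity2) triple into a tag sequence."""
    e1, rel, e2 = triple
    tags = ["O"] * len(tokens)

    def mark(entity, role):
        start = tokens.index(entity[0])        # naive first-match lookup
        for i in range(len(entity)):
            if len(entity) == 1:
                boundary = "S"                 # single-word entity
            elif i == 0:
                boundary = "B"                 # begin
            elif i == len(entity) - 1:
                boundary = "E"                 # end
            else:
                boundary = "I"                 # inside
            tags[start + i] = f"{boundary}-{rel}-{role}"

    mark(e1, 1)
    mark(e2, 2)
    return tags

tokens = "Trump is the president of the United States".split()
triple = (["United", "States"], "Country-President", ["Trump"])
print(list(zip(tokens, tag_triple(tokens, triple))))
# Trump -> S-Country-President-2, United -> B-Country-President-1,
# States -> E-Country-President-1, every other word -> O
```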

Bias Objective Function

The only interesting point is the loss function:

\[
L=\max\sum_{j=1}^{\vert \mathcal{D}\vert}\sum_{t=1}^{L_j}\left(\log(p_t^{(j)}=y_t^{(j)}\vert x_j,\Theta) \cdot I(O) + \alpha\cdot\log(p_t^{(j)}=y_t^{(j)}\vert x_j,\Theta) \cdot \left(1-I(O)\right)\right)
\]

where \(\vert \mathcal{D}\vert\) is the size of the training set, \(L_j\) is the length of sentence \(x_j\), \(p_t^{(j)}\) and \(y_t^{(j)}\) are the predicted and gold tags, \(I(O)\) is an indicator function that outputs \(1\) only if \(y_t^{(j)}=O\), and \(\alpha\) is the bias weight that controls how important the non-O tags are. In their experiments, they set \(\alpha=10\).
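
A minimal PyTorch sketch of how I read this objective (tensor names and shapes are mine, not the paper's code): plain token-level log-likelihood, with the non-O terms scaled by \(\alpha\):

```python
# Biased objective sketch: token-level cross-entropy where the gold
# log-likelihood of non-"O" tags is multiplied by alpha (alpha = 10 in
# the paper). Function and variable names here are my own.
import torch
import torch.nn.functional as F

def biased_loss(logits, gold, o_tag_id, alpha=10.0):
    """logits: (seq_len, num_tags) scores; gold: (seq_len,) gold tag ids."""
    log_probs = F.log_softmax(logits, dim=-1)                     # log p(tag | x, Theta)
    token_ll = log_probs.gather(1, gold.unsqueeze(1)).squeeze(1)  # log p of the gold tag
    weights = torch.where(gold == o_tag_id,
                          torch.ones_like(token_ll),              # weight 1 for "O"
                          alpha * torch.ones_like(token_ll))      # weight alpha otherwise
    return -(weights * token_ll).sum()                            # maximize L = minimize -L
```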

2018/2/15 posted in  NLP

Stack Long Short-Term Memories

Dyer et al. (2015)1 added a stack pointer \(TOP\) to a conventional LSTM. The trick is to use the cell that \(TOP\) points to as \(h_{t-1}\) and \(c_{t-1}\): a push computes a new state on top of the current \(TOP\), while a pop simply moves \(TOP\) back to the previous element.

Not only are the two standard structures (stack and buffer) in transition-based dependency parsing implemented as stack LSTMs, but a third stack storing the history of actions is introduced and implemented in the same way. The authors clearly favor the stack structure and hope it encodes the parser configuration more thoroughly.
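
A rough sketch of the push/pop mechanics as I understand them (not the authors' implementation; a zero state stands in for their empty-stack representation):

```python
# Stack LSTM sketch: states form a tree of (h, c) cells, and TOP is just an
# index into that tree. push runs one LSTM step from the state TOP points to;
# pop only moves TOP back, so earlier states are never thrown away.
import torch
import torch.nn as nn

class StackLSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        h0 = torch.zeros(1, hidden_size)        # stand-in for the empty-stack state
        self.nodes = [(h0, h0.clone(), None)]   # (h, c, parent index)
        self.top = 0                            # stack pointer TOP

    def push(self, x):
        """x: (1, input_size) embedding of the element being pushed."""
        h_prev, c_prev, _ = self.nodes[self.top]
        h, c = self.cell(x, (h_prev, c_prev))   # one LSTM step from the TOP state
        self.nodes.append((h, c, self.top))     # the new cell hangs off the old TOP
        self.top = len(self.nodes) - 1
        return h

    def pop(self):
        h, _, parent = self.nodes[self.top]
        self.top = parent                       # TOP moves back; nothing is deleted
        return h

    def summary(self):
        return self.nodes[self.top][0]          # h at TOP summarizes the whole stack
```

Because pop only moves the pointer, earlier states stay available and a later push just branches off from wherever \(TOP\) currently sits.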


  1. C. Dyer, M. Ballesteros, W. Ling, A. Matthews, and N. A. Smith, “Transition-Based Dependency Parsing with Stack Long Short-Term Memory,” arXiv:1505.08075, 2015. 

2017/11/22 posted in  NLP

Recurrent Neural Network Grammars

Dyer et al. (2016)1 used RNNs for both parsing and language modeling.

RNNs are often deemed inappropriate models of natural language, since the relations between words lie in compositional, nested structures rather than in sequential surface order2.

The authors introduced a new generative probabilistic model of sentences that lets RNNs model the nested, hierarchical structures of natural language. Parsing operates in a bottom-up fashion, while generation makes use of top-down grammar information.

RNNG defines a joint probability distribution over string terminals (words of a language) and phrase-structure nonterminals. It is motivated by the conventional transition system, an abstract state machine, but the first big difference is that RNNG is a generative model (although it can be modified for discriminative parsing). Formally, an RNNG is defined by a triple \(\langle N,\Sigma,\Theta \rangle\), where \(N\) denotes the nonterminal symbols (NP, VP, etc.), \(\Sigma\) denotes the terminal symbols (\(N \cap \Sigma = \emptyset\)), and \(\Theta\) denotes the model parameters. In terms of implementation, an RNNG consists of a stack storing partially completed constituents, a buffer storing already-generated terminals, and a list of past actions. It generates a sentence \(x\) and its phrase-structure tree \(y\) simultaneously; the action sequence \(\boldsymbol{a} = \langle a_1,\ldots,a_n \rangle\) that generates \((x, y)\) is called the oracle.
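
To make the oracle concrete, here is a toy replay (my own example, not the paper's code) of the three generative action types, NT(X), GEN(w), and REDUCE, producing a sentence \(x\) together with its tree \(y\):

```python
# Replays a generative RNNG oracle: NT(X) opens a nonterminal, GEN(w)
# generates a terminal word, and REDUCE closes the most recent open
# nonterminal by composing it with its children.

def run_oracle(actions):
    stack, terminals = [], []                 # partial tree / generated words
    for act in actions:
        if act[0] == "NT":                    # open a nonterminal, e.g. (NP
            stack.append(("OPEN", act[1]))
        elif act[0] == "GEN":                 # generate a terminal word
            stack.append(act[1])
            terminals.append(act[1])
        elif act[0] == "REDUCE":              # close the most recent open NT
            children = []
            while not (isinstance(stack[-1], tuple) and stack[-1][0] == "OPEN"):
                children.append(stack.pop())
            label = stack.pop()[1]
            stack.append("(" + label + " " + " ".join(reversed(children)) + ")")
    return " ".join(terminals), stack[0]

oracle = [("NT", "S"), ("NT", "NP"), ("GEN", "the"), ("GEN", "hungry"),
          ("GEN", "cat"), ("REDUCE",), ("NT", "VP"), ("GEN", "meows"),
          ("REDUCE",), ("REDUCE",)]
x, y = run_oracle(oracle)
print(x)   # the hungry cat meows
print(y)   # (S (NP the hungry cat) (VP meows))
```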


  1. C. Dyer, A. Kuncoro, M. Ballesteros, and N. A. Smith, “Recurrent Neural Network Grammars,” arXiv:1602.07776, 2016. 

  2. N. Chomsky, Syntactic Structures. The Hague: Mouton and Company, 1957. 

2017/11/4 posted in  NLP