Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

2018/2/15 posted in  NLP comments

This paper reminds me of a funny idea several years ago: use CRF tagger to do dependency parsing. Does it work? Well, the author over-claimed a high score I've never replicated. It doesn't sound reasonable to cast a dependency tree into sequence of tags.


In this paper, they also did a cast:

Strait-forward, but can't handle nested cases.

Bias Objective Function

The only interesting point is the loss function:

L=\max\sum_{j=1}^{\vert \mathcal{D}\vert}\sum_{t=1}^{L_j}\left(\log(p_t^{(j)}=y_t^{(j)}\vert x_j,\Theta) \cdot I(O) + \alpha\cdot\log(p_t^{(j)}=y_t^{(j)}\vert x_j,\Theta) \cdot \left(1-I(O)\right)\right)

where \(\vert \mathcal{D}\vert\) is the training set size, \(L_j\) is length of sentence \(x_j\), \(p\) and \(y\) are the prediction and gold labels, \(I(O)\) is a indicator function which outputs \(1\) only if the \(y=O\), \(\alpha\) is the bias weight, controls how important the non-O tag is. In their experiment, they set \(\alpha=10\).