# Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

2018/2/15 posted in  NLP

This paper reminds me of a funny idea from several years ago: using a CRF tagger to do dependency parsing. Does it work? Well, the author claimed a high score that I have never been able to replicate. Casting a dependency tree into a sequence of tags doesn't sound reasonable to me.

## Cast

In this paper, they also do a cast: joint entity and relation extraction becomes a sequence tagging problem.

Straightforward, but it can't handle nested cases, since each token gets exactly one tag.
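To make the cast concrete, here is a minimal sketch, not the authors' code. It assumes the tag layout as I read it from the paper: a BIES position, a relation label, and a role (1 or 2), joined with dashes, plus `O` for everything else. The function name `spans_to_tags`, the example sentence, and the relation label `CP` are only illustrative.

```python
def spans_to_tags(tokens, triples):
    """Cast relation triples into one tag per token.

    Each triple is (span1, relation, span2); a span is a (start, end)
    pair of token indices with end exclusive. Assumed tag layout:
    <B|I|E|S>-<relation>-<role>, or "O" for tokens outside any triple.
    """
    tags = ["O"] * len(tokens)
    for span1, relation, span2 in triples:
        for role, (start, end) in ((1, span1), (2, span2)):
            for i in range(start, end):
                if tags[i] != "O":
                    # One tag per token: a second triple cannot reuse it.
                    raise ValueError(f"token {i} is already tagged {tags[i]}")
                if end - start == 1:
                    position = "S"   # single-token entity
                elif i == start:
                    position = "B"   # begin
                elif i == end - 1:
                    position = "E"   # end
                else:
                    position = "I"   # inside
                tags[i] = f"{position}-{relation}-{role}"
    return tags


tokens = "The United States President Trump will visit Apple Inc".split()
# "United States" (tokens 1..2) is entity 1, "Trump" (token 4) is entity 2.
tags = spans_to_tags(tokens, [((1, 3), "CP", (4, 5))])
print(list(zip(tokens, tags)))
```

Passing a second triple that reuses "Trump" raises the `ValueError`, which is exactly the limitation: a token that belongs to two triples has nowhere to put its second tag.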

## Bias Objective Function

The only interesting point is the loss function:

$L=\max\sum_{j=1}^{\vert \mathcal{D}\vert}\sum_{t=1}^{L_j}\left(\log(p_t^{(j)}=y_t^{(j)}\vert x_j,\Theta) \cdot I(O) + \alpha\cdot\log(p_t^{(j)}=y_t^{(j)}\vert x_j,\Theta) \cdot \left(1-I(O)\right)\right)$

where $$\vert \mathcal{D}\vert$$ is the training set size, $$L_j$$ is the length of sentence $$x_j$$, $$p_t^{(j)}$$ and $$y_t^{(j)}$$ are the predicted and gold tags at position $$t$$, $$I(O)$$ is an indicator function that outputs $$1$$ only if the gold tag is O, and $$\alpha$$ is the bias weight, which controls how important the non-O tags are. In their experiments, they set $$\alpha=10$$.
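The objective is just a token-level log-likelihood in which non-O tokens count $$\alpha$$ times. A minimal pure-Python sketch, assuming each token's prediction is given as a dict from tag to probability (the function name `biased_log_likelihood`, the data layout, and the tag `S-CP-2` are made up for illustration):

```python
import math


def biased_log_likelihood(probs, gold, alpha=10.0):
    """Sum of gold-tag log-probabilities, with non-O tokens weighted by alpha.

    probs[j][t] is a dict mapping each tag to its predicted probability
    for token t of sentence j; gold[j][t] is the gold tag of that token.
    The returned value is the quantity to maximize.
    """
    total = 0.0
    for sent_probs, sent_gold in zip(probs, gold):
        for dist, y in zip(sent_probs, sent_gold):
            log_p = math.log(dist[y])
            # I(O) = 1 for gold tag "O", so the plain term applies;
            # otherwise the alpha-weighted term applies.
            total += log_p if y == "O" else alpha * log_p
    return total


# One sentence with two tokens, each predicted 50/50.
probs = [[{"O": 0.5, "S-CP-2": 0.5}, {"O": 0.5, "S-CP-2": 0.5}]]
gold = [["O", "S-CP-2"]]

biased = biased_log_likelihood(probs, gold, alpha=10.0)
unbiased = biased_log_likelihood(probs, gold, alpha=1.0)
print(biased, unbiased)
```

With $$\alpha=1$$ this reduces to the ordinary log-likelihood. With $$\alpha=10$$ the single non-O token contributes ten times as much as the O token, so errors on relation tags dominate the gradient.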