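Dyer et al. (2015) [1] added a stack pointer \(TOP\) to a conventional LSTM. The trick is to use the cell that \(TOP\) points to as \(h_{t-1}\) and \(c_{t-1}\) when computing a new state: a push computes a new cell from it and moves \(TOP\) forward, while a pop only moves \(TOP\) back to the previous cell.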
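To make the pointer mechanics concrete, here is a minimal pure-Python sketch of the idea. It is a toy under stated assumptions, not the authors' implementation: the class name `StackLSTM`, its methods `push`, `pop`, and `summary`, the random weight initialization, and the zero vector used for the empty-stack state are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class StackLSTM:
    """Toy stack LSTM: the previous state is read from wherever TOP points."""

    GATES = ("i", "f", "o", "g")  # input, forget, output, candidate

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        cols = input_size + hidden_size
        # One weight matrix and one bias vector per gate (illustrative init).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for k in self.GATES}
        self.b = {k: [0.0] * hidden_size for k in self.GATES}
        zeros = [0.0] * hidden_size
        self.states = [(zeros, zeros)]  # slot 0 is the empty-stack (h, c)
        self.parent = [None]            # back-pointer to the previous TOP
        self.top = 0                    # the stack pointer TOP

    def _cell(self, x, h_prev, c_prev):
        z = x + h_prev  # concatenate input and previous hidden state

        def affine(k):
            return [sum(w * v for w, v in zip(row, z)) + b
                    for row, b in zip(self.W[k], self.b[k])]

        i = [sigmoid(a) for a in affine("i")]
        f = [sigmoid(a) for a in affine("f")]
        o = [sigmoid(a) for a in affine("o")]
        g = [math.tanh(a) for a in affine("g")]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def push(self, x):
        # Use the cell at TOP as h_{t-1} and c_{t-1}, then advance TOP.
        h_prev, c_prev = self.states[self.top]
        self.states.append(self._cell(x, h_prev, c_prev))
        self.parent.append(self.top)
        self.top = len(self.states) - 1

    def pop(self):
        # Nothing is recomputed or erased; TOP just moves back one element.
        if self.parent[self.top] is None:
            raise IndexError("pop from empty stack LSTM")
        self.top = self.parent[self.top]

    def summary(self):
        # The hidden state at TOP summarizes the current stack contents.
        return self.states[self.top][0]
```

In this sketch, pushing `a`, pushing `b`, popping, and then pushing `c` yields the same summary as pushing `a` and then `c` on a fresh instance with the same weights, which is the behavior the \(TOP\) pointer is meant to provide.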
Not only are the two standard structures in transition-based dependency parsing (the stack and the buffer) implemented as stack LSTMs, but a third stack that stores the history of actions is also introduced and implemented in the same way. The authors seem to favor the stack structure, hoping that it encodes parser configurations more thoroughly.
-
[1] C. Dyer, M. Ballesteros, W. Ling, A. Matthews, and N. A. Smith, “Transition-Based Dependency Parsing with Stack Long Short-Term Memory,” arXiv:1505.08075, 2015.