Abstract

Inferencing the entailment relations between natural language sentence pairs is fundamental to artificial intelligence. Recently, there is a rising interest in modeling the task with neural attentive models. However, those existing models have a major limitation to keep track of the attention history because usually only one single vector is utilized to memorize the past attention information. We argue its importance based on our observation that the potential alignment clues are not always centralized. Instead, they may diverge substantially, which could cause the problem of long-range dependency. In this paper, we propose to facilitate the conventional attentive reading operations with two sophisticated writing operations - forget and update. Instead of utilizing a single vector that accommodates the attention history, we write the past attention information directly into the sentence representations. Therefore, higher memory capacity of attention history could be achieved. Experiments on Stanford Natural Language Inference corpus (SNLI) demonstrate the superior efficacy of our proposed architecture.