Some work I liked at ACL 2017

I was fortunate to attend ACL 2017 last week in Vancouver. There was a lot of great work on a variety of topics, and I'll quickly mention some papers/talks/posters/tutorials that I liked below (in no particular order—there's far more on my reading list).

Character-level neural machine translation is attractive because it is completely open-vocabulary, so out-of-vocabulary tokens are never an issue. However, it's very expensive to train (Luong and Manning 2016 reported a training time of 3 months!) because attention is \( O(n^2) \) in the sequence length, and representing sentences as sequences of characters naturally blows up the sequence length. In addition, the increased sequence length makes it harder for the LSTM to handle long-term dependencies.
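To get a rough feel for the blow-up, here is a back-of-the-envelope sketch in Python. It assumes the target sentence is about as long as the source and simply counts how many (target step, source position) pairs attention has to score; the example sentence and the `attention_scores` helper are my own illustration, not anything from the paper.

```python
def attention_scores(source_len: int, target_len: int) -> int:
    """Number of (target step, source position) pairs an attention layer scores."""
    return source_len * target_len


sentence = "the quick brown fox jumps over the lazy dog"
n_words = len(sentence.split())  # whitespace-separated word tokens
n_chars = len(sentence)          # characters, spaces included

# Simplifying assumption: the target has the same length as the source.
word_cost = attention_scores(n_words, n_words)
char_cost = attention_scores(n_chars, n_chars)

print(f"{n_words} words -> {word_cost} attention scores")
print(f"{n_chars} chars -> {char_cost} attention scores")
print(f"character-level is roughly {char_cost / word_cost:.0f}x more expensive")
```

Even for this short sentence, going from words to characters multiplies the sequence length by almost five and the attention cost by the square of that.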

To overcome this limitation, the authors compress the input sequence with a stack of convolutional neural networks, pooling layers, and highway networks before finally encoding it with an RNN. They show that their character-to-character model outperforms a competitive byte-pair-encoding (BPE) to BPE model.
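The length reduction comes from the pooling step, which can be sketched in a few lines of plain Python. This is only a toy with a single feature channel: the fixed smoothing kernel, the `ord`-based "embedding", and the stride of 5 are placeholder assumptions of mine, and the highway layers and the RNN are left out entirely.

```python
def conv1d_same(values, kernel):
    """1D convolution with zero padding, so the output is as long as the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def max_pool(values, stride):
    """Non-overlapping max pooling: one output per `stride` consecutive inputs."""
    return [max(values[i:i + stride]) for i in range(0, len(values), stride)]


text = "character-level translation"
# Toy stand-in for learned character embeddings (one scalar per character).
features = [float(ord(c) % 7) for c in text]

conv_out = conv1d_same(features, kernel=[0.25, 0.5, 0.25])
pooled = max_pool(conv_out, stride=5)

print(f"{len(text)} characters -> {len(pooled)} segments fed to the RNN")
```

The convolution keeps the sequence length unchanged; it's the strided pooling that shrinks what the RNN (and the attention over its states) has to process by roughly the stride factor.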

In addition, they run some multilingual MT experiments in which a single character-to-character model is trained to translate German, Czech, Finnish, and Russian to English. This model handles code-switching (multiple languages within a single sentence) seamlessly, and they show that models trained in this multilingual fashion overfit less than bilingual models on low-resource language pairs.

This was a paper/talk without any performance numbers, which was quite refreshing. The paper is a good survey of what's been happening in the field of semantic representation: it compares the various semantic representation schemes that have cropped up recently (e.g. UCCA and AMR) and posits future directions for research in the area.

As someone who has zero experience with semantic representations but is generally quite interested in the problem of computationally encoding meaning in language, I thought that this paper was an interesting read. In particular, I liked the discussion of the role of syntax in semantics and how it should figure into semantic representations.

An illustration of the cache-augmented hierarchical language model (Kawakami et al. 2017). The sentence is also a good illustration of the "bursty" nature with which proper nouns can occur in a corpus.

I liked the core idea behind this paper, which is that once a rare word occurs in a text, it's far more likely to occur again in the near future. To exploit this, they augment a hierarchical LSTM language model with a memory that stores recently created words. At each timestep, the model decides whether to copy a word from the cache or generate it as a sequence of characters.
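The copy-or-generate idea can be sketched as a simple mixture of two distributions. To be clear, this is my own toy and not the authors' model: the `WordCache` class, the frequency-based cache probability, and the fixed `copy_weight` are all assumptions standing in for whatever learned components the real model uses.

```python
from collections import deque


class WordCache:
    """Fixed-size memory of recently seen words (oldest entries fall out)."""

    def __init__(self, capacity):
        self.words = deque(maxlen=capacity)

    def add(self, word):
        self.words.append(word)

    def prob(self, word):
        """Relative frequency of `word` in the cache; 0.0 if the cache is empty."""
        if not self.words:
            return 0.0
        return self.words.count(word) / len(self.words)


def mixture_prob(word, cache, char_model_prob, copy_weight):
    """p(word) = copy_weight * p_cache(word) + (1 - copy_weight) * p_chars(word)."""
    return copy_weight * cache.prob(word) + (1.0 - copy_weight) * char_model_prob


cache = WordCache(capacity=4)
for w in ["Vancouver", "is", "rainy", "Vancouver"]:
    cache.add(w)

# Hypothetical probability a character-level generator assigns to a rare name.
rare_name_prob = 0.001

p_seen = mixture_prob("Vancouver", cache, rare_name_prob, copy_weight=0.5)
p_new = mixture_prob("Seattle", cache, rare_name_prob, copy_weight=0.5)
print(p_seen, p_new)
```

A name that was just seen gets most of its probability mass from the cache, while an unseen name has to rely on the character-level generator alone, which is the "bursty" behavior the cache is meant to capture.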

They show that the model does learn to use the cache for generating proper names, which are typically rare words that occur in sentence- or paragraph-level clusters (instead of being spread evenly across the corpus). In addition, they build a language modeling dataset covering seven typologically diverse languages and show that their model outperforms a standard LSTM and a cache-less hierarchical LSTM on all of them.

Distributional semantics is a very active area of research, especially due to the rise of deep learning in NLP. Although I didn't go to this tutorial, I've always been interested in methods for composition of semantic vectors. Since language is inherently compositional, it seems desirable that semantic vectors should be as well. These slides look fantastic for getting an overview of what's happening.

General Conference Experience

Perhaps because of the huge number of UW researchers attending ACL, I felt much less socially awkward than usual. It was pretty easy to find a conversation to join, since I fortunately knew a lot of people at the conference. I got to meet and have good conversations with new people (e.g. Jesse Thomason, Joonsuk (John) Park, Alan Ritter, Sasha Rush), and was able to catch up with some old acquaintances (Vlad Niculae, Jon Gauthier, and others).

If you have any recommendations for nice papers to read, I'd love to hear about them!