Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Semantic parsing

Semantic parsing is the task of translating natural language into a formal meaning
representation on which a machine can act. Representations may be an executable language
such as SQL or more abstract representations such as Abstract Meaning Representation (AMR).

AMR parsing

Each AMR is a single rooted, directed graph. AMRs include PropBank semantic roles, within-sentence coreference, named entities and types, modality, negation, questions, quantities, and so on.
In the following tables, systems marked with ♥ are pipeline systems that require POS tags as input,
♠ marks those that require NER,
♦ marks those that require syntactic parsing,
and ♣ marks those that require SRL.

LDC2014T12:

13,051 sentences

Models are evaluated on the newswire section and on the full dataset using Smatch.
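To make the metric concrete, the following is a minimal sketch of a Smatch-style score, not the official scorer. It assumes each AMR is given as a set of `(relation, source, target)` triples, with `("instance", variable, concept)` triples naming the variables; this representation and the function name `smatch_f1` are illustrative choices. It searches all variable mappings by brute force, which only works for tiny graphs, whereas the official Smatch tool uses hill-climbing with random restarts and also scores a top-node triple.

```python
from itertools import permutations


def smatch_f1(pred, gold):
    """Brute-force Smatch-style F1 between two small AMR triple sets.

    Each triple is (relation, source, target). Variables are the sources
    of "instance" triples. Exhaustive search: only suitable for tiny graphs.
    """
    pred_set, gold_set = set(pred), set(gold)
    pred_vars = sorted({src for rel, src, _ in pred_set if rel == "instance"})
    gold_vars = sorted({src for rel, src, _ in gold_set if rel == "instance"})
    # Pad with dummy names so every predicted variable can be mapped somewhere.
    extra = max(0, len(pred_vars) - len(gold_vars))
    candidates = gold_vars + [f"_unmatched{i}" for i in range(extra)]

    best = 0
    for image in permutations(candidates, len(pred_vars)):
        mapping = dict(zip(pred_vars, image))
        renamed = set()
        for rel, src, tgt in pred_set:
            # Concepts (targets of instance triples) are never renamed.
            new_tgt = tgt if rel == "instance" else mapping.get(tgt, tgt)
            renamed.add((rel, mapping.get(src, src), new_tgt))
        best = max(best, len(renamed & gold_set))

    if best == 0:
        return 0.0
    precision = best / len(pred_set)
    recall = best / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# "The boy wants to go": (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
gold = [
    ("instance", "w", "want-01"), ("instance", "b", "boy"),
    ("instance", "g", "go-01"),
    ("ARG0", "w", "b"), ("ARG1", "w", "g"), ("ARG0", "g", "b"),
]
# A prediction with different variable names that misses the re-entrant :ARG0 edge.
pred = [
    ("instance", "x", "want-01"), ("instance", "y", "boy"),
    ("instance", "z", "go-01"),
    ("ARG0", "x", "y"), ("ARG1", "x", "z"),
]
score = smatch_f1(pred, gold)
print(f"{score:.3f}")
```

Here the best mapping matches all five predicted triples against six gold triples, so precision is 1 and recall is 5/6.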

SELECT CITYalias0.CITY_NAME FROM CITY AS CITYalias0 WHERE CITYalias0.POPULATION = ( SELECT MAX( CITYalias1.POPULATION ) FROM CITY AS CITYalias1 WHERE CITYalias1.STATE_NAME = "arizona" ) AND CITYalias0.STATE_NAME = "arizona"

Spider

Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL
dataset. It consists of 10,181 questions and 5,693 unique complex SQL queries on
200 databases with multiple tables covering 138 different domains. In Spider 1.0,
different complex SQL queries and databases appear in train and test sets.
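The cross-database setup described above can be sketched as a split that assigns whole databases, rather than individual questions, to train or test. This is only an illustration under stated assumptions: the `db_id` key, the `split_by_database` helper, and the 20% test fraction are hypothetical choices, and Spider ships with its own predefined splits.

```python
import random
from collections import defaultdict


def split_by_database(examples, test_fraction=0.2, seed=0):
    """Split examples so that no database appears in both train and test.

    `examples` is a list of dicts with a "db_id" key (an assumed field name).
    """
    by_db = defaultdict(list)
    for example in examples:
        by_db[example["db_id"]].append(example)

    # Shuffle database ids deterministically, then hold out a fraction of them.
    db_ids = sorted(by_db)
    random.Random(seed).shuffle(db_ids)
    n_test = max(1, round(len(db_ids) * test_fraction))
    test_dbs = set(db_ids[:n_test])

    train = [ex for db in db_ids if db not in test_dbs for ex in by_db[db]]
    test = [ex for db in db_ids if db in test_dbs for ex in by_db[db]]
    return train, test


# Toy data: 20 questions spread over 5 hypothetical databases.
examples = [{"db_id": f"db{i % 5}", "question": f"question {i}"} for i in range(20)]
train, test = split_by_database(examples)
print(len(train), len(test))
```

Splitting at the database level means a model is tested on schemas it has never seen, which is what makes the task cross-domain.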

The Spider dataset and its leaderboard can be accessed here.

WikiSQL

The WikiSQL dataset consists of 87,673
examples of questions, SQL queries, and database tables built from 26,521 tables.
Train/dev/test splits are provided so that each table is only in one split.
Models are evaluated on execution accuracy, i.e., whether the result of executing the predicted query matches the result of the gold query.
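As a rough illustration of scoring by execution results, the sketch below runs predicted and gold queries against an in-memory SQLite database and compares the returned rows. It is not the official WikiSQL evaluation script: the `players` table, its rows, the queries, and the helper `execution_match` are all made up for the example, and treating a query that fails to execute as incorrect is an assumption.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same rows, ignoring row order."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a prediction that does not execute counts as wrong.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


# Hypothetical toy table standing in for a dataset table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 12), ("Bob", "Blue", 7), ("Cy", "Red", 3)],
)

gold = "SELECT COUNT(*) FROM players WHERE team = 'Red'"
predictions = [
    "SELECT COUNT(name) FROM players WHERE team = 'Red'",  # different SQL, same result
    "SELECT COUNT(*) FROM players WHERE team = 'Blue'",    # executes, wrong result
    "SELECT COUNT(* FROM players",                         # does not execute
]
matches = [execution_match(conn, sql, gold) for sql in predictions]
accuracy = sum(matches) / len(matches)
print(matches, accuracy)
```

Comparing results rather than query strings gives credit to predictions that are written differently from the gold query but return the same answer.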

Academic - 196 questions about publications generated by enumerating all of the different queries possible with the Microsoft Academic Search interface, then writing questions for each query (Li and Jagadish, 2014). Improved and converted to a canonical style by Finegan-Dollak et al. (2018).