Alexa Prize Proceedings

The Alexa Prize Proceedings publishes the research in Conversational AI resulting from the pursuit of the Alexa Prize competition goals. Amazon works closely with university teams to provide a testbed for research addressing challenges in Dialog Management, Natural Language Understanding (NLU), Contextual Modeling, Commonsense Reasoning, and Response Generation, and these proceedings seek to capture the advances in those areas that result from these efforts. Authors are free to make additional hardcopy publishing arrangements.

Abstract
Building open domain conversational systems that allow users to have engaging conversations on topics of their choice is a challenging task. The Alexa Prize was launched in 2016 to tackle the problem of achieving natural, sustained, coherent and engaging open-domain dialogs. In the second iteration of the competition in 2018, university teams advanced the state of the art by using context in dialog models, leveraging knowledge graphs for language understanding, handling complex utterances, building statistical and hierarchical dialog managers, and leveraging model-driven signals from user responses. The 2018 competition also provided competitors with a suite of tools and models, including the CoBot (conversational bot) toolkit, topic and dialog act detection models, conversation evaluators, and a sensitive content detection model, so that the competing teams could focus on building knowledge-rich, coherent and engaging multi-turn dialog systems. This paper outlines the advances developed by the university teams as well as the Alexa Prize team to achieve the common goal of advancing the science of Conversational AI. We address several key open-ended problems such as conversational speech recognition, open domain natural language understanding, commonsense reasoning, statistical dialog management and dialog evaluation. These collaborative efforts have improved the experience of Alexa users to an average rating of 3.61, a median conversation duration of 2 minutes 18 seconds, and an average of 14.6 turns, increases of 14%, 92%, and 54%, respectively, since the launch of the 2018 competition. For conversational speech recognition, we have achieved relative reductions of 55% in Word Error Rate and 34% in Entity Error Rate since the launch of the Alexa Prize.
Socialbots improved in quality significantly more rapidly in 2018, in part due to the release of the CoBot toolkit, with new entrants attaining an average rating of 3.35 just 1 week into the semifinals, compared to 9 weeks in the 2017 competition.

Abstract
Gunrock is a social bot designed to engage users in open domain conversations. We improved our bot iteratively using large scale user interaction data to make it more capable and human-like. Our system engaged in over 40,000 conversations during the semi-finals period of the 2018 Alexa Prize. We developed a context-aware hierarchical dialog manager to handle a wide variety of user behaviors, such as topic switching and question answering. In addition, we designed a robust three-step natural language understanding module, which includes techniques such as sentence segmentation and automatic speech recognition (ASR) error correction. Furthermore, we improved the human-likeness of the system by adding prosodic speech synthesis. As a result of our many contributions and large scale user interaction analysis, we achieved an average score of 3.62 on a 1-to-5 Likert scale on Oct 14th. Additionally, we achieved an average of 22.14 turns per conversation and an average conversation duration of 5.22 minutes.
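To make the two NLU techniques named above concrete, here is a minimal, purely illustrative sketch (not Gunrock's actual implementation) of sentence segmentation on spoken conjunctions and dictionary-based ASR error correction via string similarity; the entity vocabulary is a made-up toy example:

```python
import difflib
import re

# Toy vocabulary of known entities; Gunrock's real correction step would
# draw on much larger knowledge sources.
KNOWN_ENTITIES = ["beyonce", "star wars", "basketball"]

def segment(utterance: str) -> list:
    """Split a run-on spoken utterance on common conjunctions."""
    parts = re.split(r"\b(?:and|but|so)\b", utterance)
    return [p.strip() for p in parts if p.strip()]

def correct_token(token: str, vocab=KNOWN_ENTITIES, cutoff=0.8) -> str:
    """Map a possibly misrecognized token to the closest known entity,
    leaving it unchanged if nothing is similar enough."""
    matches = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token
```

For example, `segment("i like star wars and i play basketball")` yields two clauses that can be handled separately, and `correct_token("beyoncee")` recovers the intended entity.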

Abstract
This paper presents the second version of the dialogue system named Alquist competing in the Amazon Alexa Prize 2018. We introduce a system leveraging an ontology-based topic structure called topic nodes. Each node consists of several sub-dialogues, and each sub-dialogue has its own LSTM-based model for dialogue management. The sub-dialogues can be triggered according to the topic hierarchy or a user intent, which allows the bot to create a unique experience during each session.
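The topic-node idea can be sketched as a small data structure; this is an illustrative reading of the abstract, not Alquist's API, and all class and intent names here are hypothetical:

```python
class SubDialogue:
    """A sub-dialogue activated by a set of user intents; in Alquist each
    one would own an LSTM-based dialogue-management model."""
    def __init__(self, name, intents):
        self.name = name
        self.intents = set(intents)

class TopicNode:
    """A node in the topic hierarchy, grouping sub-dialogues and children."""
    def __init__(self, topic, subdialogues, children=()):
        self.topic = topic
        self.subdialogues = subdialogues
        self.children = list(children)

    def trigger(self, intent):
        """Return the first sub-dialogue matching the intent, searching
        this node first and then descending the topic hierarchy."""
        for sd in self.subdialogues:
            if intent in sd.intents:
                return sd
        for child in self.children:
            found = child.trigger(intent)
            if found:
                return found
        return None
```

A sub-dialogue is thus reachable both by walking the hierarchy and by a detected intent, which matches the two triggering paths the abstract describes.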

Abstract
We describe our 2018 Alexa prize system (called ‘Alana’) which consists of an ensemble of bots, combining rule-based and machine learning systems. This paper reports on the version of the system developed and evaluated in the semifinals of the 2018 competition (i.e. up to 15 August 2018), but not on subsequent enhancements. The main advances over our 2017 Alana system are: (1) a deeper Natural Language Understanding (NLU) pipeline; (2) the use of topic ontologies and Named Entity Linking to enable the user to navigate and search through a web of related information; rendering Alana in part an interactive NL interface to linked information on the web; (3) system generated clarification questions to interactively disambiguate between Named Entities as part of NLU; (4) a new profanity & abuse detection model with rule-based mitigation strategies; and (5) response retrieval from Reddit. We also present several ablation studies that measure the performance contributions of specific features (e.g. use of Ontology-bot, Reddit-bot, rule-based systems, etc). We find that these features increase overall system performance. Our final score, namely averaged user ratings over the whole semi-finals period, was 3.4. We were also able to achieve long dialogues (average around 11 turns and 2.20 minutes) during the semi-finals period.

Abstract
We present BYU-EVE, an open domain dialogue architecture that combines the strengths of hand-crafted rules, deep learning, and structured knowledge graph traversal in order to create satisfying user experiences. Rather than viewing dialogue as a strict mapping between input and output texts, EVE treats conversations as a collaborative process in which two jointly coordinating agents chart a trajectory through experiential space. A key element of this architecture is the use of conversational scaffolding, a technique which uses a (small) conversational dataset to define a generalized response strategy. We also take the innovative approach of integrating the agent’s self and user models directly within the knowledge graph. This allows EVE to discern topics of shared interest while simultaneously identifying areas of ambiguity or cognitive dissonance.
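One plausible minimal reading of conversational scaffolding is retrieval from the small dataset: find the stored context most similar to the current utterance and reuse its paired response as a template. The sketch below uses toy Jaccard word overlap as the similarity measure; BYU-EVE's actual strategy is more general, so treat every name and choice here as an assumption for illustration only:

```python
def similarity(a: str, b: str) -> float:
    """Toy Jaccard word-overlap similarity between two utterances."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def scaffold_response(utterance, dataset):
    """dataset: list of (context, response) pairs from a small corpus.
    Return the response paired with the most similar stored context."""
    context, response = max(dataset, key=lambda pair: similarity(utterance, pair[0]))
    return response
```

Even this crude version shows how a small paired corpus can generalize: an unseen utterance is answered by the strategy attached to its nearest stored neighbor.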

Abstract
This paper describes the Tartan conversational agent built for the 2018 Alexa Prize Competition. Tartan is a non-goal-oriented socialbot focused on providing users with an engaging and fluent casual conversation. Tartan’s key features include an emphasis on structured conversation based on flexible finite-state models and an approach focused on understanding and using conversational acts. To provide engaging conversations, Tartan blends script-like yet dynamic responses with data-based generative and retrieval models. Unique to Tartan is that our dialog manager is modeled as a dynamic Finite State Machine. To our knowledge, no other conversational agent implementation has followed this specific structure.
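As a rough illustration of a dialog manager modeled as a dynamic finite-state machine (a sketch of the general idea, not Tartan's implementation; states, acts, and responses are invented for the example):

```python
from dataclasses import dataclass, field

@dataclass
class FSMDialogManager:
    state: str = "greeting"
    # transitions[state][user_act] -> (next_state, response)
    transitions: dict = field(default_factory=lambda: {
        "greeting": {"greet": ("topic_select", "Hi! What would you like to talk about?")},
        "topic_select": {"pick_movies": ("movies", "Great, let's talk movies.")},
    })

    def add_transition(self, state, act, next_state, response):
        """Extend the machine at runtime, which is what makes it 'dynamic'."""
        self.transitions.setdefault(state, {})[act] = (next_state, response)

    def step(self, user_act: str) -> str:
        """Consume one classified user act, move state, return a response."""
        next_state, response = self.transitions.get(self.state, {}).get(
            user_act, (self.state, "Sorry, could you rephrase that?"))
        self.state = next_state
        return response
```

The `add_transition` hook is the distinguishing feature: unlike a static FSM, the set of states and edges can grow as new conversational content becomes available.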

Abstract
We describe IrisBot, a conversational agent that aims to keep customers informed about the world around them while keeping them entertained and engaged. Our bot attempts to incorporate real-time search, informed advice, and latest-news recommendation into a coherent conversation. IrisBot can already track information on the latest topics and opinions from News, Sports, Entertainment, and some specialized domains. The key technical innovations of IrisBot are novel algorithms for contextualized classification of the topic and intent of the user’s utterances, modular ranking of potential responses, and personalized topic suggestions. Our preliminary experimental results, based on overall customer experience ratings and A/B testing analysis, focus on understanding the contribution of both algorithmic and surface presentation features. We also suggest promising directions for continued research, primarily focusing on increasing the coverage of topics for in-depth domain understanding, further personalizing the conversation experience, and making the conversation interesting and novel for returning customers.

Abstract
In this paper we present Fantom, a social chatbot competing in the Amazon Alexa Prize 2018. The system uses a dialog graph for retrieving an approximation of the current dialog context in order to find suitable response candidates in this context. The graph is gradually built using user utterances from actual interactions, and system responses suggested by crowd workers. To this end, we developed an automatic system for finding dialog contexts that were often visited but lacked system responses in order to automatically post tasks on Amazon Mechanical Turk. Workers could see a brief excerpt of past conversation history and were asked to suggest a good response, based on a description of the system’s persona and a set of rules that would help foster more engaging conversations. Our main contributions are 1) describing the use of a graph-based approach for context modeling, 2) techniques used in order to make the crowd workers author good content, and 3) discussion of learning outcomes from the Alexa Prize challenge.
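The gap-finding loop described above can be sketched in a few lines. This toy version matches contexts exactly and counts visits, whereas Fantom retrieves an approximation of the context; the class and threshold are assumptions made for illustration:

```python
from collections import defaultdict

class DialogGraph:
    """Toy dialog graph: contexts map to crowd-authored responses, and
    visit counts reveal frequently reached contexts that still lack one."""
    def __init__(self):
        self.responses = {}             # context -> system response
        self.visits = defaultdict(int)  # context -> times reached

    def observe(self, context: str):
        self.visits[context] += 1

    def respond(self, context: str):
        return self.responses.get(context)  # None when no response exists yet

    def gaps(self, min_visits: int = 2):
        """Often-visited contexts with no response: candidates for
        posting as Mechanical Turk tasks."""
        return [c for c, n in self.visits.items()
                if n >= min_visits and c not in self.responses]
```

Once a crowd worker supplies a response for a gap context, it drops out of `gaps()` and becomes retrievable for future conversations reaching that context.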

Abstract
One of the most interesting aspects of the Amazon Alexa Prize competition is that the framing of the competition requires the development of new computational models of dialogue and its structure. Traditional computational models of dialogue are of two types: (1) task-oriented dialogue, supported by AI planning models, or simplified planning models consisting of frames with slots to be filled; or (2) search-oriented dialogue where every user turn is treated as a search query that may elaborate and extend current search results. Alexa Prize dialogue systems such as SlugBot must support conversational capabilities that go beyond what these traditional models can do. Moreover, while traditional dialogue systems rely on theoretical computational models, there are no existing computational theories that circumscribe the expected system and user behaviors in the intended conversational genre of the Alexa Prize Bots. This paper describes how UCSC’s SlugBot team has combined the development of a novel computational theoretical model, the Discourse Relation Dialogue Model, with its implementation in a modular system in order to test and refine it. We highlight how our novel dialogue model has led us to create a novel ontological resource, UniSlug, and how the structure of UniSlug determines how we curate and structure content so that our dialogue manager implements and tests our novel computational dialogue model.