Abstract

A new parser and generator for the DELPH-IN joint reference formalism

The Common Language Infrastructure (CLI, ECMA-335) is a modern standard for architecting extensible, platform-independent software. Well-known implementations include Mono and Microsoft's .NET Framework. These managed runtime environments are actively developed and well supported, and they incorporate decades of research and best-practice experience in systems architecture and developer productivity. One aim of this project is to explore the suitability of this platform for a new suite of tools for processing DELPH-IN-style TDL (Krieger and Schäfer 1994) grammars.

Because single-core performance has reached physical limits, the focus in high-performance computing has shifted to multi-core processors. Accordingly, another goal of this project is to examine opportunities for concurrent programming in the processing of precision analytical grammars. This effort has led to the development of a low-lock, concurrent parse/generate chart that exploits new operating-system support for scalable, fine-grained concurrency.

TFS Representation

Informed by Pereira (1985), Wroblewski (1987), Tomabechi (1991), and more recent work by van Lohuizen, this project investigates the efficiency of array TFS storage under the intensive demands characteristic of unification grammars. Array storage departs from the traditional allocation-per-node DAG approach. Because it avoids allocating an object for every node, this internal DAG representation may minimize garbage-collector activity in managed programming environments.

Capitalizing on the observation that, in DELPH-IN grammars, the set of features appropriate to each type is invariant, this approach stores all of a TFS's nodes contiguously. Each node is indexed by a hash that incorporates the feature hosting it. Combined with careful use of C# value types, this approach has demonstrated excellent parsing and generation performance in the .NET managed runtime environment.
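
A minimal sketch of the idea follows, written in Python rather than C#. The type names, features, and the `ArrayTfs` class are hypothetical. A list and a dict stand in for the contiguous value-type arrays and the feature-aware hash index. Each node is an integer mark, and all arcs live in one table keyed by (feature, parent mark). Coreference is simply two arcs that share a mark.

```python
# Hypothetical appropriateness table: because the features appropriate to
# each type never change, a node's arcs can be found by probing only its
# type's features instead of storing a per-node arc list.
APPROPRIATE = {
    "phrase": ("HEAD", "SPR"),
    "verb": ("AGR",),
    "agr": ("PER", "NUM"),
}


class ArrayTfs:
    """One TFS held in two flat tables instead of one object per node."""

    def __init__(self, root_type):
        self.types = [root_type]  # node mark -> type; mark 0 is the root
        self.arcs = {}            # (feature, parent mark) -> child mark

    def _check(self, parent, feature):
        if feature not in APPROPRIATE.get(self.types[parent], ()):
            raise ValueError(f"{feature} is not appropriate for {self.types[parent]}")

    def add_node(self, parent, feature, type_name):
        """Append a new node reached from `parent` via `feature`."""
        self._check(parent, feature)
        mark = len(self.types)
        self.types.append(type_name)
        self.arcs[(feature, parent)] = mark
        return mark

    def share(self, parent, feature, mark):
        """Coreference: make a second arc point at an existing node."""
        self._check(parent, feature)
        self.arcs[(feature, parent)] = mark

    def follow(self, path):
        """Return the mark at the end of a feature path, or None."""
        mark = 0
        for feature in path:
            mark = self.arcs.get((feature, mark))
            if mark is None:
                return None
        return mark

    def out_arcs(self, mark):
        """Enumerate a node's arcs using only its type's appropriate features."""
        for feature in APPROPRIATE.get(self.types[mark], ()):
            child = self.arcs.get((feature, mark))
            if child is not None:
                yield feature, child


tfs = ArrayTfs("phrase")
head = tfs.add_node(0, "HEAD", "verb")
agr = tfs.add_node(head, "AGR", "agr")
tfs.share(0, "SPR", agr)  # HEAD.AGR and SPR are now the same node
```

In a managed runtime, the analogous layout allocates a few large arrays per TFS rather than one object per node. That is the property behind the reduced garbage-collector activity anticipated above.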

Project Status

The system supports both parsing and generation. It has been tested with the English Resource Grammar (Flickinger 2002), the Jacy grammar of Japanese (Siegel and Bender 2002), and grammars built on the Grammar Matrix (Bender et al. 2002), notably a medium-small grammar of Thai (unpublished). Its derivations have been validated for exact agreement with PET on the Redwoods (Oepen et al. 2000) corpora.

As the agree tool suite has grown more complex in support of diverse analysis tasks, maintainability, configuration, and management have become key issues. In response, the system was rearchitected in 2012 as a set of loosely coupled functors. From these, the linguist-user composes ad hoc linguistic processing pipelines via XAML markup. Binding functors together yields a customized, reusable composite functor. When applied to an input, the composite functor produces a complex and/or nested sequence of live monads (non-reusable workers). Each monad represents a concurrent, push-based processing stream that implements some part of the fan-out or fork/join signature required by the originally composed task.
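
The functor/monad distinction can be made concrete with a small Python sketch under stated assumptions. This is not agree's XAML or C# API: `Functor`, `then`, `apply`, and the toy stages are hypothetical names. A functor here is a reusable, inert description of a stage. Applying it to an input launches a single-use worker (a `Future`) that pushes results into a sink. Several workers started together and then awaited form a simple fork/join.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class Functor:
    """Hypothetical reusable stage: fn(item, emit) pushes 0..n outputs via emit."""

    def __init__(self, fn):
        self.fn = fn

    def then(self, downstream):
        """Bind two functors into a reusable composite functor."""
        def composite(item, emit):
            # Each upstream output is pushed straight into the downstream stage.
            self.fn(item, lambda out: downstream.fn(out, emit))
        return Functor(composite)

    def apply(self, item, pool, sink):
        """Start a live, single-use worker for one input (a Future here)."""
        return pool.submit(self.fn, item, sink)


def tokenize(text, emit):
    for word in text.split():  # fan-out: one sentence -> many tokens
        emit(word)


def upcase(word, emit):
    emit(word.upper())


pipeline = Functor(tokenize).then(Functor(upcase))  # reusable; no work done yet

results = queue.Queue()  # thread-safe sink shared by all workers
with ThreadPoolExecutor(max_workers=4) as pool:
    # Fork: one worker per input sentence, all running concurrently.
    workers = [pipeline.apply(s, pool, results.put) for s in ["the dog", "barks"]]
    for w in workers:
        w.result()  # join; re-raises any exception raised inside a worker
tokens = []
while not results.empty():
    tokens.append(results.get())
```

The composite `pipeline` can be applied again to new inputs. Each application yields fresh workers, and a worker cannot be reused once it has finished.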

Concurrent Parse-Generate Chart

A single, unified chart is shared between the parser and the generator; the two differ only in the edge proximity condition.
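
One way to picture this abstraction is a combination step that takes the proximity test as a parameter. The Python sketch that follows is hypothetical and is not agree's code. It assumes, as in common lexicalist chart generators, that generator edges may combine only when they cover disjoint sets of input EPs (elementary predications). Parser edges, by contrast, must be string-adjacent. Unification itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Toy edge: a string span (parser) and covered input EP indices (generator)."""
    start: int = 0
    end: int = 0
    eps: frozenset = frozenset()


def parser_proximal(left, right):
    # Parsing: the left daughter must end exactly where the right one starts.
    return left.end == right.start


def generator_proximal(left, right):
    # Generation: the daughters must not realize any input EP twice.
    return left.eps.isdisjoint(right.eps)


def combine(left, right, proximal):
    """Mode-independent chart step; only `proximal` differs between modes."""
    if not proximal(left, right):
        return None
    return Edge(left.start, right.end, left.eps | right.eps)
```

For example, `combine(Edge(0, 1), Edge(1, 3), parser_proximal)` yields an edge spanning 0 to 3. Two generator edges that both cover EP 1 are rejected.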

Building on new OS support for lightweight tasks, scheduled with sophisticated hill-climbing, work-stealing, and load-balancing, a new non-blocking unification chart parser constructs a graph of fully asynchronous, fine-grained match/unify tasks.
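
The following Python sketch gives a rough, hypothetical picture of chart parsing as a graph of independent tasks. It uses `concurrent.futures.ThreadPoolExecutor` in place of the OS/runtime task scheduler. Each new edge spawns one task per adjacent edge, and each task attempts a toy rule application; successful results spawn further tasks, so there is no agenda. Unlike agree's non-blocking chart, this sketch guards insertion with a single lock. Because of CPython's global interpreter lock, it also shows the task structure rather than real parallel speed-up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TaskChart:
    """Toy chart in which every combination attempt is its own pool task.

    Edges are (label, start, end) tuples, and "unification" is a lookup in a
    binary rule table: both are hypothetical simplifications.
    """

    def __init__(self, rules, pool):
        self.rules = rules  # (left label, right label) -> mother label
        self.pool = pool
        self.edges = []
        self.lock = threading.Lock()
        self.pending = 0
        self.idle = threading.Event()
        self.idle.set()

    def _spawn(self, fn, *args):
        with self.lock:
            self.pending += 1
            self.idle.clear()
        self.pool.submit(self._run, fn, *args)

    def _run(self, fn, *args):
        try:
            fn(*args)  # errors are not propagated in this toy
        finally:
            with self.lock:
                self.pending -= 1
                if self.pending == 0:
                    self.idle.set()

    def add(self, edge):
        with self.lock:
            if edge in self.edges:
                return
            # Snapshot and insertion are atomic, so of any two edges the
            # later-inserted one always sees the earlier: no pair is missed.
            others = list(self.edges)
            self.edges.append(edge)
        for other in others:
            if other[2] == edge[1]:  # other ends where edge starts
                self._spawn(self._combine, other, edge)
            if edge[2] == other[1]:  # edge ends where other starts
                self._spawn(self._combine, edge, other)

    def _combine(self, left, right):
        mother = self.rules.get((left[0], right[0]))
        if mother is not None:
            self.add((mother, left[1], right[2]))

    def parse(self, tags):
        for i, tag in enumerate(tags):
            self._spawn(self.add, (tag, i, i + 1))
        self.idle.wait()  # returns once no task is pending or running
        return set(self.edges)


rules = {("D", "N"): "NP", ("V", "NP"): "VP", ("NP", "VP"): "S"}
with ThreadPoolExecutor(max_workers=4) as pool:
    edges = TaskChart(rules, pool).parse(["D", "N", "V", "D", "N"])
```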

Rule pre-filter (Kiefer et al. 1999)
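
The rule filter of Kiefer et al. (1999) is computed once per grammar. It records whether the mother of one rule could ever unify with a given daughter position of another rule, so that hopeless combinations are rejected by a table lookup. In the Python sketch that follows, categories are plain strings and `unifiable` is string equality. In a real grammar, that test would unify the rule TFSs themselves. The rule names and toy format are hypothetical.

```python
from itertools import product


def build_rule_filter(rules, unifiable):
    """Precompute which rules can possibly fill each daughter slot of each rule.

    rules: name -> {"mother": ..., "args": [...]} (hypothetical toy format)
    unifiable(a, b): whether two descriptions could ever unify
    """
    table = {}
    for outer, inner in product(rules, rules):
        for arg, daughter in enumerate(rules[outer]["args"]):
            if unifiable(rules[inner]["mother"], daughter):
                table.setdefault((outer, arg), set()).add(inner)
    return table


def may_fill(table, rule, arg, daughter_rule):
    """Cheap parse-time test, run before attempting real unification."""
    return daughter_rule in table.get((rule, arg), set())


RULES = {
    "sb-hd": {"mother": "S", "args": ["NP", "VP"]},
    "hd-cmp": {"mother": "VP", "args": ["V", "NP"]},
    "sp-hd": {"mother": "NP", "args": ["D", "N"]},
}
FILTER = build_rule_filter(RULES, lambda a, b: a == b)
```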

Additionally, the system supports the following techniques, developed in the DELPH-IN community and elsewhere, which a grammar may opt into: KEY-daughter first, quick-check (Malouf et al. 2000), spanning-only rules, and daughter ARGS pruning.
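
Quick-check (Malouf et al. 2000) compares the types found at a small set of feature paths before running full unification. These paths are chosen because they are the ones that fail most often. The Python sketch that follows is hypothetical throughout. A TFS is flattened to a dict from path to type, and the paths in `QC_PATHS` are invented. `compatible` stands in for a glb lookup by assuming a tree-shaped type hierarchy; real DELPH-IN hierarchies are lattices with computed glb types.

```python
# Hypothetical type hierarchy (a tree); real DELPH-IN hierarchies are lattices.
PARENTS = {"*top*": None, "head": "*top*", "noun": "head", "verb": "head",
           "agr": "*top*", "sg": "agr", "pl": "agr"}

# Hypothetical quick-check paths; in practice they are chosen from
# statistics on which paths cause unification failures most often.
QC_PATHS = (("HEAD",), ("AGR", "NUM"))


def ancestors(t):
    out = set()
    while t is not None:
        out.add(t)
        t = PARENTS[t]
    return out


def compatible(a, b):
    # In a tree-shaped hierarchy, two types unify iff one subsumes the other.
    return a in ancestors(b) or b in ancestors(a)


def qc_vector(tfs):
    """Types at the QC paths; None where the path is absent (unconstrained)."""
    return tuple(tfs.get(path) for path in QC_PATHS)


def quick_check(v1, v2):
    """False: unification is certain to fail. True: run full unification."""
    return all(a is None or b is None or compatible(a, b) for a, b in zip(v1, v2))
```

Each edge's vector can be computed once and reused, so the check costs a few type comparisons per candidate pair.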

Direct daughter unification with skeleton completion: the parts of a rule's mother TFS that lie outside the coreference extent of the rule daughter are built only after daughter unification succeeds.
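
A toy rendering of this control flow follows. Every name is hypothetical, and TFSs are flattened to path-to-type dicts. The daughter constraints are unified against the edge first. The mother is assembled only on success: "ref" slots copy shared (coreferenced) values from the unified daughter, and "const" slots, which lie outside the coreference extent, are built fresh. Flattening hides real reentrancy, so this shows the ordering of work rather than agree's data structures.

```python
TOP = "*top*"


def glb(a, b):
    """Toy type unification: identical types, or anything with *top*."""
    if a == b or b == TOP:
        return a
    if a == TOP:
        return b
    return None


def unify_daughter(edge, daughter):
    """Unify an edge with the rule daughter only; nothing of the mother is built."""
    result = dict(edge)
    for path, t in daughter.items():
        u = glb(result.get(path, TOP), t)
        if u is None:
            return None  # early failure: the mother skeleton is never touched
        result[path] = u
    return result


def complete_skeleton(mother, unified):
    """Build the mother after success: "ref" slots share daughter values,
    "const" slots lie outside the coreference extent and are built fresh."""
    return {path: (unified.get(value, TOP) if kind == "ref" else value)
            for path, (kind, value) in mother.items()}


def apply_rule(rule, edge):
    unified = unify_daughter(edge, rule["daughter"])
    return None if unified is None else complete_skeleton(rule["mother"], unified)


NOUN_PHRASE_RULE = {
    "daughter": {("HEAD",): "noun", ("AGR",): TOP},
    "mother": {("HEAD",): ("ref", ("HEAD",)),
               ("AGR",): ("ref", ("AGR",)),
               ("SPR",): ("const", "det")},
}
```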

Ambiguity packing with exhaustive unpacking (Oepen and Carroll 2000)
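
Packing stores each distinct analysis of a span once and records alternative derivations inside it. Exhaustive unpacking then enumerates every tree. In the hypothetical Python sketch that follows, two edges count as equivalent when their label and span match. This is a deliberate simplification, since Oepen and Carroll (2000) pack feature structures under subsumption.

```python
from itertools import product


class PackedEdge:
    def __init__(self, label, start, end):
        self.label, self.start, self.end = label, start, end
        self.alternatives = []  # each entry: tuple of daughter edges; () = lexical


class PackedChart:
    def __init__(self):
        self.edges = {}  # (label, start, end) -> PackedEdge

    def add(self, label, start, end, daughters=()):
        """Return (edge, is_new). An equivalent edge absorbs the new derivation;
        a parser would not start further combinations from a packed edge."""
        key = (label, start, end)
        edge = self.edges.get(key)
        is_new = edge is None
        if is_new:
            edge = self.edges[key] = PackedEdge(label, start, end)
        edge.alternatives.append(tuple(daughters))
        return edge, is_new


def unpack(edge):
    """Exhaustively yield every tree represented by a packed edge."""
    for daughters in edge.alternatives:
        if not daughters:
            yield edge.label
            continue
        for subtrees in product(*(unpack(d) for d in daughters)):
            yield (edge.label, *subtrees)


# A PP-attachment ambiguity: two derivations of VP over positions 0..3.
chart = PackedChart()
v, _ = chart.add("V", 0, 1)
np1, _ = chart.add("NP", 1, 2)
pp, _ = chart.add("PP", 2, 3)
vp_low, _ = chart.add("VP", 0, 2, (v, np1))
np_pp, _ = chart.add("NP", 1, 3, (np1, pp))
vp, first_new = chart.add("VP", 0, 3, (vp_low, pp))
same, second_new = chart.add("VP", 0, 3, (v, np_pp))  # packed into `vp`
trees = list(unpack(vp))
```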

Maxent Parse Selection

Read maxent models (e.g., redwoods.mem)

Score n-best derivations without unpacking

Future work: selective unpacking
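
Scoring n-best derivations without unpacking is possible when the model's features are local enough to factor over the packed forest. The hypothetical Python sketch that follows assumes each feature is a (mother label, daughter labels) configuration with a weight, so a derivation's score is the sum of its local weights. It keeps only the k best sub-derivations per packed edge. Real maxent parse-selection models may also use non-local features such as grandparenting, which this simplification ignores.

```python
import heapq
from dataclasses import dataclass, field
from itertools import product


@dataclass(eq=False)  # identity hashing, so edges can key the memo
class Edge:
    label: str
    alternatives: list = field(default_factory=list)  # tuples of daughters


def kbest(edge, weights, k, memo=None):
    """Return up to k (score, tree) pairs for a packed edge, best first.

    Because scores add over the tree, the k best trees of an edge can be
    built from the k best trees of its daughters alone, without
    enumerating every tree in the forest.
    """
    if memo is None:
        memo = {}
    if edge in memo:
        return memo[edge]
    candidates = []
    for daughters in edge.alternatives:
        local = weights.get((edge.label, tuple(d.label for d in daughters)), 0.0)
        child_lists = [kbest(d, weights, k, memo) for d in daughters]
        for combo in product(*child_lists):
            score = local + sum(s for s, _ in combo)
            tree = (edge.label, *(t for _, t in combo)) if daughters else edge.label
            candidates.append((score, tree))
    memo[edge] = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return memo[edge]


# Packed forest with a PP-attachment ambiguity; weights are invented.
v, np1, pp = Edge("V", [()]), Edge("NP", [()]), Edge("PP", [()])
vp_low = Edge("VP", [(v, np1)])
np_pp = Edge("NP", [(np1, pp)])
vp = Edge("VP", [(vp_low, pp), (v, np_pp)])
weights = {("VP", ("V", "NP")): 0.3, ("VP", ("VP", "PP")): 0.5,
           ("NP", ("NP", "PP")): 1.2}
best = kbest(vp, weights, k=2)
```

Combining every daughter's k-best list is quadratic per binary alternative. A production implementation would instead enumerate combinations lazily, best first.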

MRS

In anticipation of convenient MRS rewriting and SEM-I transfer, a rich suite of interconnected C# objects representing MRS has been developed.

This MRS suite is extracted from the higher-performance representation used for parsing and generation.
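
The kind of object graph described here can be sketched with Python dataclasses. The class and method names are hypothetical, and variables are plain strings rather than shared objects. HCONS is simplified to a map from each hole to the label it is qeq to. The example encodes the familiar MRS for "The dog barks" with ERG-style predicate names.

```python
from dataclasses import dataclass, field


@dataclass
class EP:
    """Elementary predication: predicate, label, and role -> variable map."""
    pred: str
    label: str
    args: dict


@dataclass
class MRS:
    top: str
    index: str
    rels: list
    hcons: dict = field(default_factory=dict)  # hole -> label it is qeq to

    def eps_labelled(self, label):
        return [ep for ep in self.rels if ep.label == label]

    def qeq_targets(self, hole):
        """EPs whose label the given hole is qeq to."""
        return self.eps_labelled(self.hcons.get(hole))

    def intrinsic_ep(self, var):
        """The non-quantifier EP whose ARG0 is `var`, if any."""
        for ep in self.rels:
            if ep.args.get("ARG0") == var and "RSTR" not in ep.args:
                return ep
        return None


dog_barks = MRS(
    top="h0", index="e2",
    rels=[EP("_the_q", "h4", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
          EP("_dog_n_1", "h7", {"ARG0": "x3"}),
          EP("_bark_v_1", "h1", {"ARG0": "e2", "ARG1": "x3"})],
    hcons={"h0": "h1", "h5": "h7"},
)
```

Navigation follows the links between objects. For example, `dog_barks.intrinsic_ep(dog_barks.index)` finds the verb, and its `ARG1` leads to the noun's EP.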

As with most CLR software, the same executable binary should run on any combination of 32- or 64-bit Mono or .NET.

WPF Client Application

Display and interact with feature structures in a style inspired by LUI

Syntax tree display

Parse chart display

Note: WPF support will not be available on Mono

The project is also investigating novel visualization technologies for grammar engineering and pedagogical use. Shown below is a three-dimensional view of an authored constraint definition superimposed over its expanded feature structure. Because the two are aligned, the definition contains gaps where constraints are supplied by other structures. Multi-layer views can show the contribution each definition makes to a well-formed feature structure.

The view can be manipulated on any axis to freely explore relationships among the authored rules. Although the WPF environment makes such renderings easy to implement, an ongoing challenge is to find a visual presentation simple and elegant enough to truly facilitate linguistic insight.

Bernd Kiefer, Hans-Ulrich Krieger, John Carroll, and Rob Malouf. 1999. A bag of useful techniques for efficient and robust parsing. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 473-480.

Marcel P. van Lohuizen. 2001. A generic approach to parallel chart parsing with an application to LinGO. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France.