Homepage of Jurgen J. Vinju

Software Complexity Trumps Correctness, Security, and Flexibility

Opinion alert: complex software is unmanageable. For such software, other quality aspects are not diagnosable or even observable, and therefore cannot be managed either. How can you prove or test for correctness if you cannot define what it is that needs to be correct? How do you know the software is secure if you cannot be sure you know about every possible interaction? How do you change a piece of software if you cannot estimate the impact? Defining, studying, modeling, measuring, preventing, controlling and mitigating software complexity is the top priority in software engineering. Anything else is interesting, but less relevant until that is done. See also this discussion in ERCIM News.

I am a researcher in the field of software engineering. I lead SWAT (Software Analysis & Transformation) at CWI and ATEAMS at INRIA Lille - Nord Europe, and as of September 1st, 2014 I am also a part-time full professor at Eindhoven University of Technology. ATEAMS and SWAT are the same team, hosted by both CWI and INRIA as a form of international collaboration.

Research interests

In theory, source code is written text which can be changed at any time. In reality, the source code of real software systems is mostly too complex to read and understand. The source code of a typical software system is actually quite difficult to manipulate and adapt to changing circumstances and requirements. Perhaps it should not have been called software after all. To make matters more interesting: the older systems are, the more complex they become.

My personal goals are to:

help software engineers analyze source code so they can maintain it efficiently

2014

Nowadays, software has a ubiquitous presence in everyday life and this phenomenon gives rise to a range of challenges that affect both individuals and society as a whole. In this article we argue that in the future, the domain of software should no longer belong to technical experts and system integrators alone. Instead it should transition to a firmly engaged public domain, similar to city planning, social welfare and security. The challenge that lies at the heart of this problem is the ability to understand, on a technical level, what all the different software actually is and what it does with our information. Read more.

@article{ercim991,
  author  = {Magiel Bruntink and Jurgen J. Vinju},
  title   = {Looking Towards a Future where Software is Controlled by the Public (and not the other way around)},
  journal = {ERCIM News},
  number  = {99},
  year    = {2014},
}

The introduction of fast and cheap computer and networking hardware enables the spread of software. Software, in a nutshell, represents an unprecedented ability to channel creativity and innovation. The joyful act of simply writing computer programs for existing ICT infrastructure can change the world. We are currently witnessing how our lives can change rapidly as a result, at every level of organization and society and in practically every aspect of the human condition: work, play, love and war.
The act of writing software does not imply an understanding of the resulting creation. We are surprised by failing software (due to bugs), the inability of rigid computer systems to “just do what we want”, the loss of privacy and information security, and last but not least, the million euro software project failures that occur in the public sector. These surprises are generally not due to negligence or unethical behaviour but rather reflect our incomplete understanding of what we are creating. Our creations, at present, are all much too complex and this lack of understanding leads to a lack of control. Read more

The hash trie data structure is a common part of the standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and the further adoption of this otherwise attractive style of programming.
In this paper we present a product family of hash tries. We generate Java source code to specialize them using knowledge of JVM object memory layout. The number of possible specializations is exponential. The optimization challenge is thus to find a minimal set of variants which leads to a maximal reduction in memory footprint on any given data. Using a set of experiments we measured the distribution of internal tree node sizes in hash tries. We used the results as guidance to decide which variants of the family to generate and which variants should be left to the generic implementation.
A preliminary validating experiment on the implementation of sets and maps shows that this technique leads to a median decrease of 55% in memory footprint for maps (and 78% for sets), while still maintaining comparable performance. Our combination of data analysis and code specialization proved to be effective.
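The node layout that these specializations target can be made concrete with a sketch. The following is a hypothetical, heavily simplified hash trie for an immutable set of ints (no generics, no specialization); only the bitmap-compressed (bitmap, slots) structure is the point, since the generic Object[] per node is exactly where the memory overhead comes from:

```java
// Minimal sketch of a bitmap-compressed hash trie (HAMT) node storing an
// immutable set of ints. Hypothetical and simplified; the paper's generated
// specializations optimize the (bitmap, slots) layout shown here.
final class TrieNode {
    static final TrieNode EMPTY = new TrieNode(0, new Object[0]);

    final int bitmap;      // marks which of the 32 logical slots are in use
    final Object[] slots;  // compressed: only occupied slots are stored
                           // (Integer = key, TrieNode = child)

    private TrieNode(int bitmap, Object[] slots) {
        this.bitmap = bitmap;
        this.slots = slots;
    }

    private static int bitpos(int key, int shift) {
        return 1 << ((key >>> shift) & 0x1f);  // next 5 bits of the hash
    }

    private int index(int bitpos) {
        return Integer.bitCount(bitmap & (bitpos - 1));  // rank in the array
    }

    boolean contains(int key, int shift) {
        int b = bitpos(key, shift);
        if ((bitmap & b) == 0) return false;
        Object slot = slots[index(b)];
        return slot instanceof TrieNode
                ? ((TrieNode) slot).contains(key, shift + 5)
                : (Integer) slot == key;
    }

    TrieNode insert(int key, int shift) {
        int b = bitpos(key, shift);
        int idx = index(b);
        if ((bitmap & b) == 0) {  // free slot: copy-on-write insertion
            Object[] next = new Object[slots.length + 1];
            System.arraycopy(slots, 0, next, 0, idx);
            next[idx] = key;
            System.arraycopy(slots, idx, next, idx + 1, slots.length - idx);
            return new TrieNode(bitmap | b, next);
        }
        Object slot = slots[idx];
        Object[] next = slots.clone();
        if (slot instanceof TrieNode) {
            next[idx] = ((TrieNode) slot).insert(key, shift + 5);
        } else if ((Integer) slot == key) {
            return this;  // already present; existing structure is shared
        } else {  // two keys collide in this slot: push both one level down
            next[idx] = EMPTY.insert((Integer) slot, shift + 5).insert(key, shift + 5);
        }
        return new TrieNode(bitmap, next);
    }
}
```

Every occupied slot costs an Object[] entry plus a boxed key or child header; a specialized variant can instead inline a small fixed number of slots as plain fields, which is the kind of layout knowledge the generated code exploits.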

Dynamic languages include a number of features that are challenging to model properly in static analysis tools. In PHP, one of these features is the include expression, where an arbitrary expression provides the path of the file to include at runtime. In this paper we present two complementary analyses for statically resolving PHP includes, one that works at the level of individual PHP files and one targeting PHP programs, possibly consisting of multiple scripts. To evaluate the effectiveness of these analyses we have applied the first to a corpus of 20 open-source systems, totaling more than 4.5 million lines of PHP, and the second to a number of programs from a subset of these systems. Our results show that, in many cases, includes can be either resolved to a specific file or a small subset of possible files, enabling better IDE features and more advanced program analysis tools for PHP.
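The simplest pattern such analyses resolve can be illustrated with a toy constant folder over include-path expressions. All names here are hypothetical and the sketch is not the paper's (Rascal-based) implementation, which handles many more PHP-specific idioms; it only shows the core idea that a path built from literals and known constants collapses to a single file:

```java
import java.util.Map;

// Toy resolver for include-path expressions such as DIR . "/lib/util.php".
// Hypothetical classes for illustration; a path resolves statically exactly
// when every variable in it is bound to a known string constant.
abstract class PathExpr {
    // Returns the statically known path, or null if the path stays dynamic.
    abstract String resolve(Map<String, String> constants);
}

class Lit extends PathExpr {
    final String text;
    Lit(String text) { this.text = text; }
    String resolve(Map<String, String> constants) { return text; }
}

class Var extends PathExpr {
    final String name;
    Var(String name) { this.name = name; }
    String resolve(Map<String, String> constants) { return constants.get(name); }
}

class Concat extends PathExpr {
    final PathExpr left, right;
    Concat(PathExpr left, PathExpr right) { this.left = left; this.right = right; }
    String resolve(Map<String, String> constants) {
        String l = left.resolve(constants), r = right.resolve(constants);
        return (l == null || r == null) ? null : l + r;  // dynamic part poisons the whole path
    }
}
```

For example, `include($dir . "/lib/util.php")` resolves once the analysis has established `$dir = "/var/www"`, and stays unresolved otherwise.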

Measuring the internal quality of source code is one of the traditional goals of making software development into an engineering discipline. Cyclomatic Complexity (CC) is an often used source code quality metric, next to Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with respect to SLOC due to strong linear correlation.
We test this claim by studying a corpus of 17.8M methods in 13K open-source Java projects. Our results show that the direct linear correlation between SLOC and CC is only moderate, owing to high variance. We observe that aggregating CC and SLOC over larger units of code improves the correlation, which explains the strong linear correlations reported in the literature. We suggest that the primary cause of the correlation is the aggregation.
Our conclusion is that there is no strong linear correlation between CC and SLOC of Java methods, so we do not conclude that CC is redundant with SLOC. This conclusion contradicts earlier claims from the literature, but concurs with the widely accepted practice of measuring CC next to SLOC.
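Why CC and SLOC are free to vary independently per method is easy to see from the metric's definition: CC is one plus the number of decision points, regardless of line count. The following is a naive token-level sketch (the study measures CC on parsed ASTs, and the keyword set here is an illustrative approximation):

```java
import java.util.List;
import java.util.Set;

// Naive illustration of cyclomatic complexity: 1 + number of decision points.
// A real measurement works on the AST; this token scan only demonstrates
// that CC is independent of how many lines a method occupies.
class Metrics {
    private static final Set<String> DECISIONS =
            Set.of("if", "while", "for", "do", "case", "catch", "&&", "||", "?");

    static int cyclomaticComplexity(List<String> tokens) {
        int cc = 1;  // one path through a branch-free body
        for (String t : tokens) {
            if (DECISIONS.contains(t)) cc++;
        }
        return cc;
    }
}
```

A fifty-line method of straight assignments keeps CC = 1, while a five-line method with nested conditions easily reaches CC = 6; this per-method variance is what weakens the direct correlation.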

2013

Software projects consist of different kinds of artifacts: build files, configuration files, markup files, source code in different software languages, and so on. At the same time, however, most integrated development environments (IDEs) are focused on a single (programming) language. Even if a programming environment supports multiple languages (e.g., Eclipse), IDE features such as cross-referencing, refactoring, or debugging do not often cross language boundaries. What would it mean for a programming environment to be truly multilingual? In this short paper we sketch a vision of a system that integrates IDE support across language boundaries. We propose to build this system on a foundation of unified source code models and metaprogramming. A number of important and hard research questions, however, still need to be addressed.

In the context of the EU FP7 project ``OSSMETER'' we are developing an infrastructure for measuring source code. The goal of OSSMETER is to obtain insight into the quality of open-source projects from all possible perspectives, including product, process and community. This is a "white paper" on M3, a set of code models, which should be easy to construct, easy to extend with language specifics and easy to consume to produce metrics and other analyses. We solicit feedback on its usability.

We are interested in re-engineering families of legacy applications towards using Domain-Specific Languages (DSLs). Is it worth investing in harvesting domain knowledge from the source code of legacy applications? Reverse engineering domain knowledge from source code is sometimes considered very hard or even impossible. Is it also difficult for "modern legacy systems"? In this paper we select two open-source applications and answer the following research questions: which parts of the domain are implemented by the application, and how much can we manually recover from the source code? To explore these questions, we compare manually recovered domain models to a reference model extracted from domain literature, and measure precision and recall.
The recovered models are accurate: they cover a significant part of the reference model and they do not contain much junk. We conclude that domain knowledge is recoverable from "modern legacy" code and that domain model recovery can therefore be a valuable component of a domain re-engineering process.

In this paper we present an approach to specifying operator precedence based on declarative disambiguation constructs and an implementation mechanism based on grammar rewriting. We identify a problem with existing generalized context-free parsing and disambiguation technology: generating a correct parser for a language such as OCaml using declarative precedence specification is not possible without resorting to some manual grammar transformation. Our approach provides a fully declarative solution to operator precedence specification for context-free grammars, is independent of any parsing technology, and is safe in that it guarantees that the language of the resulting grammar will be the same as the language of the specification grammar. We evaluate our new approach by specifying the precedence rules from the OCaml reference manual against the highly ambiguous reference grammar and validate the output of our generated parser.

Refactoring tools are among the most desirable in the programmer's toolbox. Any refactoring tool (specific for a particular language and for a specific kind of refactoring) represents a considerable investment. New languages are introduced at an increasing rate, and new features are introduced to existing languages. The development of refactoring tools must keep up with this evolution. The extension of a general purpose language like Java with generics is a good example that requires both adaptations to existing refactoring tools and the introduction of new refactoring tools specific to generics.
We propose a modular language-parametric framework, called "TyMoRe" (TYpe-related MOdular REfactoring), for constraint-based type refactorings. It enables reuse between languages and reuse between different refactorings for the same language. The framework uses functional monadic composition to achieve the desired modularity and compositionality.
The effectiveness of TyMoRe is demonstrated by our prototype of the ``Infer Generic Type Arguments'' refactoring for a large subset of Java.

This article is an unpublished draft.

Mark Hills, Paul Klint and Jurgen J. Vinju. An empirical study of PHP feature usage. Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), July 2013. Lugano, Switzerland.

PHP is one of the most popular languages for server-side application development. The language is highly dynamic, providing programmers with a large amount of flexibility. However, these dynamic features also have a cost, making it difficult to apply traditional static analysis techniques used in standard code analysis and transformation tools. As part of our work on creating analysis tools for PHP, we have conducted a study over a significant corpus of open-source PHP systems, looking at the sizes of actual PHP programs, which features of PHP are actually used, how often dynamic features appear, and how distributed these features are across the files that make up a PHP website. We have also looked at whether uses of these dynamic features are truly dynamic or are, in some cases, statically understandable, allowing us to identify specific patterns of use which can then be taken into account to build more precise tools. We believe this work will be of interest to creators of analysis tools for PHP, and that the methodology we present can be leveraged for other dynamic languages with similar features.

Rascal is a meta programming language focused on the implementation of domain-specific languages and on the rapid construction of tools for software analysis and software transformation. In this paper we focus on the use of Rascal for software analysis. We illustrate a range of scenarios for building new software analysis tools through a number of examples, including one showing integration with an existing Maude-based analysis. We then focus on ongoing work on alias analysis and type inference for PHP, showing how Rascal is being used, and sketching a hypothetical solution in Maude. We conclude with a high-level discussion on the commonalities and differences between Rascal and Maude when applied to program analysis.

Meta-programming applications often require access to heterogeneous sources of information, often from different technological spaces (grammars, models, ontologies, databases), that have specialized ways of defining their respective data schemas. Without direct language support, obtaining typed access to this external, potentially changing, information is a tedious and error-prone engineering task. The Rascal meta-programming language aims to support the import and manipulation of all of these kinds of data in a type-safe manner. The goal is to lower the engineering effort to build new meta programs that combine information about software in unforeseen ways. In this paper we describe built-in language support, so-called resources, for incorporating external sources of data and their corresponding data-types while maintaining type safety. We demonstrate the applicability of Rascal resources by example, showing resources for RSF files, CSV files, JDBC-accessible SQL databases, and SDF2 grammars. For RSF and CSV files this requires a type inference step, allowing the data in the files to be loaded in a type-safe manner without requiring the type to be declared in advance. For SQL and SDF2 a direct translation from their respective schema languages into Rascal is instead constructed, providing a faithful translation of the declared types or sorts into equivalent types in the Rascal type system. An overview of related work and a discussion conclude the paper.

Assessing the understandability of source code remains an elusive yet highly desirable goal for software developers and their managers. While many metrics have been suggested and investigated empirically, the McCabe cyclomatic complexity metric (CC), which is based on control flow complexity, seems to hold enduring fascination within both industry and the research community. However, the CC metric also has obvious limitations. For example, it is easy to produce example code that seems trivial to understand yet has a high CC value; at the same time, one can also produce "spaghetti" code with many GOTOs that has the same CC value as a well-structured alternative.
In this work, we explore the causal relationship between CC and understandability through quantitative and qualitative studies, and through thought experiments and discussion. Empirically, we examine eight well-known open source Java systems by grouping the abstract control flow patterns of the methods into equivalence classes and exploring the results. We found several surprising results: first, the number of unique control flow patterns is relatively low; second, CC often does not accurately reflect the intricacies of Java control flow; and third, methods with high CC often have very low entropy, suggesting that they may be relatively easy to understand. These findings appear to challenge the widely-held belief that there is a clear-cut causal relationship between understandability and cyclomatic complexity, and suggest that at the very least CC and similar measures need to be reconsidered and refined if they are to be used as a metric for code understandability.
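The entropy observation can be made concrete with a small sketch. Assume, purely for illustration, that a method's control flow has been abstracted to a string of one-character construct tags: the Shannon entropy of that string is low when a single construct dominates, as in a long but regular switch.

```java
import java.util.HashMap;
import java.util.Map;

// Shannon entropy (in bits) of the distribution of control-flow tags in a
// pattern string. Illustrative encoding; the paper works with abstract
// control flow patterns rather than literal tag strings.
class PatternEntropy {
    static double entropy(String tags) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : tags.toCharArray()) counts.merge(c, 1, Integer::sum);
        double h = 0.0;
        for (int n : counts.values()) {
            double p = (double) n / tags.length();
            h -= p * Math.log(p) / Math.log(2);  // contribution of one tag kind
        }
        return h;
    }
}
```

A switch with twenty cases ("cccccccccccccccccccc") has CC 21 but entropy 0, matching the intuition that such a method is monotonous rather than hard to understand.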

Real problems in software evolution render impossible a fixed, one-size-fits-all approach, and these problems are usually solved by gluing together various tools and languages. Such ad-hoc integration is cumbersome and costly. With the Rascal meta-programming language the Software Analysis and Transformation research group at CWI explores whether it is feasible to develop an approach that offers all necessary meta-programming and visualization techniques in a completely integrated language environment. We have applied Rascal with success in constructing domain specific languages and experimental refactoring and visualization tools.

2011

We compare the Visitor pattern with the Interpreter pattern, investigating a single case in point for the Java language. We have produced and compared two versions of an interpreter for a programming language. The first version makes use of the Visitor pattern. The second version was obtained by using an automated refactoring to transform uses of the Visitor pattern to uses of the Interpreter pattern. We compare these two nearly equivalent versions on their maintenance characteristics and execution efficiency. Using a tailored experimental research method we can highlight differences and the causes thereof. The contributions of this paper are that it isolates the choice between Visitor and Interpreter in a realistic software project and makes the difference experimentally observable.
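The design choice under comparison can be sketched for a hypothetical two-construct expression language (not the study's subject system): the Interpreter pattern puts evaluation inside each node class, while the Visitor pattern moves it into a separate class reached via accept.

```java
// Minimal sketch of the two designs being compared. Interpreter: behavior
// (eval) lives inside the node classes. Visitor: behavior lives in a
// separate class (EvalVisitor) dispatched through accept. Names hypothetical.
interface Visitor { int visitNum(Num n); int visitAdd(Add a); }

interface Expr {
    int eval();              // Interpreter pattern: behavior inside the nodes
    int accept(Visitor v);   // Visitor pattern: behavior outside the nodes
}

final class Num implements Expr {
    final int value;
    Num(int value) { this.value = value; }
    public int eval() { return value; }
    public int accept(Visitor v) { return v.visitNum(this); }
}

final class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public int eval() { return left.eval() + right.eval(); }
    public int accept(Visitor v) { return v.visitAdd(this); }
}

final class EvalVisitor implements Visitor {
    public int visitNum(Num n) { return n.value; }
    public int visitAdd(Add a) { return a.left.accept(this) + a.right.accept(this); }
}
```

The classic trade-off this makes observable: adding a new operation (say, pretty-printing) costs one new Visitor class but no node changes, while adding a new node type costs a change to every Visitor; the Interpreter layout inverts both costs.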

Algebraic specification has a long tradition in bridging the gap between specification and programming by making specifications executable. Building on extensive experience in designing, implementing and using specification formalisms that are based on algebraic specification and term rewriting (namely Asf and Asf+Sdf), we are now focusing on using the best concepts from algebraic specification and integrating these into a new programming language: Rascal. This language is easy to learn by non-experts but is also scalable to very large meta-programming applications.
We explain the algebraic roots of Rascal and its main application areas: software analysis, software transformation, and design and implementation of domain-specific languages. Some example applications in the domain of Model-Driven Engineering (MDE) are described to illustrate this.

Static ambiguity detection would be an important feature of language workbenches for textual software languages. The challenge is that ambiguity detection for context-free grammars is undecidable. Sophisticated approximations and optimizations do exist, but they do not yet scale to grammars for so-called "scannerless parsers". We extend previous work on ambiguity detection for context-free grammars to cover disambiguation techniques that are typical for scannerless parsing, such as longest match and reserved keywords. This paper contributes a new algorithm for ambiguity detection in character-level grammars, a prototype implementation of this algorithm and a validation on several real grammars. The total run-time of ambiguity detection for character-level grammars for languages such as C and Java is reduced by several orders of magnitude, without loss of precision. The result is that ambiguity detection for realistic grammars can be done efficiently and may now become a tool in language workbenches.

In this paper we propose and evaluate a method for locating causes of ambiguity in context-free grammars by automatic analysis of parse forests. A parse forest is the set of parse trees of an ambiguous sentence. Deducing causes of ambiguity from observing parse forests is hard for grammar engineers because of (a) the size of the parse forests, (b) the complex shape of parse forests, and (c) the diversity of causes of ambiguity.
We first analyze the diversity of ambiguities in grammars for programming languages and the diversity of solutions to these ambiguities. Then we introduce Dr. Ambiguity: a parse forest diagnostics tool that explains the causes of ambiguity by analyzing differences between parse trees and proposes solutions. We demonstrate its effectiveness using a small experiment with a grammar for Java 5.

The Rascal meta-programming language provides a number of features supporting the development of program analysis tools. However, sometimes the analysis to be developed is already implemented by another system. In this case, Rascal can provide a useful front-end for this system, handling the parsing of the input program, any transformation (if needed) of this program into individual analysis tasks, and the display of the results generated by the analysis. In this paper we describe a tool, RLSRunner, which provides this integration with static analysis tools defined using the K framework, a rewriting-based framework for defining the semantics of programming languages.

In this paper we present prototype tool-support for the run-time assertion checking of the Java Modeling Language (JML) extended with communication histories specified by attribute grammars. Our tool suite integrates Rascal, a meta-programming language, and ANTLR, a popular parser generator. Rascal instantiates a generic model of history updates for a given Java program annotated with history specifications. ANTLR is used for the actual evaluation of history assertions.

Automatically generating program translators from source and target language specifications is a non-trivial problem. In this paper we focus on the problem of automating the process of building translators between operations languages, a family of DSLs used to program satellite operations procedures. We exploit their similarities to semi-automatically build transformation tools between these DSLs. The input to our method is a collection of annotated context-free grammars. To simplify the overall translation process even more, we also propose an intermediate representation common to all operations languages. Finally, we discuss how to enrich our annotated grammars model with more advanced semantic annotations to provide a verification system for the translation process. We validate our approach by semi-automatically deriving translators between some real world operations languages, using the prototype tool which we implemented for that purpose.

Does the use of DSL tools improve the maintainability of language implementations compared to implementations from scratch? We present empirical results on aspects of maintainability of six implementations of the same DSL using different languages (Java, JavaScript, C#) and DSL tools (ANTLR, OMeta, Microsoft “M”). Our evaluation indicates that the maintainability of language implementations is indeed higher when constructed using DSL tools.

Model-driven software development (MDSD) has been on the rise over the past few years and is becoming more and more mature. However, evaluation in a real-life industrial context is still scarce.
In this paper, we present a case study evaluating the applicability of a state-of-the-art MDSD tool: MOD4J, a suite of domain-specific languages (DSLs) for developing administrative enterprise applications. MOD4J was used to partially rebuild an industrially representative application. This implementation was then compared to a base implementation using elicited success criteria. Our evaluation leads to a number of recommendations to improve MOD4J.
We conclude that having extension points for hand-written code is a good feature for a model-driven software development environment.

Real programming languages are often defined using ambiguous context-free grammars. Some ambiguity is intentional while other ambiguity is accidental. A good grammar development environment should therefore contain a static ambiguity checker to help the grammar engineer.
Ambiguity of context-free grammars is an undecidable property. Nevertheless, various imperfect ambiguity checkers exist. Exhaustive methods are accurate, but suffer from non-termination. Termination is guaranteed by approximative methods, at the expense of accuracy.
In this paper we combine an approximative method with an exhaustive method. We present an extension to the Noncanonical Unambiguity Test that identifies production rules that do not contribute to the ambiguity of a grammar and show how this information can be used to significantly reduce the search space of exhaustive methods. Our experimental evaluation on a number of real world grammars shows orders of magnitude gains in efficiency in some cases and negligible losses of efficiency in others.

Many automated software engineering tools require tight integration of techniques for source code analysis and manipulation. State-of-the-art tools exist for both, but the domains have remained notoriously separate because different computational paradigms fit each domain best. This impedance mismatch hampers the development of each new problem solution since desired functionality and scalability can only be achieved by repeated, ad hoc, integration of different techniques.
RASCAL is a domain-specific language that takes away most of this boilerplate by providing high-level integration of source code analysis and manipulation on the conceptual, syntactic, semantic and technical level. We give an overview of the language and assess its merits by implementing a complex refactoring.

Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages, which is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short due to the limitations of tokenization based on regular expressions when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution. It uses the power of context-free grammars to deal with a wide variety of issues in parsing lexical syntax. However, this power comes at a cost in efficiency: the structure of tokens is obtained using a more powerful but more time and memory intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, increasing the burden on the parsing algorithm even further.
In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, but is 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java and Python the average speedup is 16%.

Full-featured integrated development environments have become critical to the adoption of new programming languages. Key to the success of these IDEs is the provision of services tailored to the languages. However, modern IDEs are large and complex, and the cost of constructing one from scratch can be prohibitive. Generators that work from language specifications reduce costs but produce environments that do not fully reflect distinctive language characteristics.
We believe that there is a practical middle ground between these extremes that can be effectively addressed by an open, semi-automated strategy to IDE development. This strategy is to reduce the burden of IDE development as much as possible, especially for internal IDE details, while opening opportunities for significant customizations to IDE services. To reduce the effort needed for customization we provide a combination of frameworks, templates, and generators. We demonstrate an extensible IDE architecture that embodies this strategy, and we show that this architecture can be used to produce customized IDEs, with a moderate amount of effort, for a variety of interesting languages.

2008

An integrated development environment (IDE) monitors all the changes that a user makes to source code modules and responds accordingly by flagging errors, by re-parsing, by rechecking, or by recompiling modules and by adjusting visualizations or other information derived from a module. A module manager is the central component of the IDE that is responsible for this behavior. Although the overall functionality of a module manager in a given IDE is fixed, its actual behavior strongly depends on the programming languages it has to support. What is a module? How do modules depend on each other? What is the effect of a change to a module?
We propose a concise design for a language parametric module manager: a module manager that is parameterized with the module behavior of a specific language. We describe the design of our module manager and discuss some of its properties. We also report on the application of the module manager in the construction of IDEs for the specification language ASF+SDF as well as for Java.
Our overall goal is the rapid development (generation) of IDEs for programming languages and domain specific languages. The module manager presented here represents a next step in the creation of such generic language workbenches.

2005

In this thesis the subject of study is source code. More precisely, I am interested in tools that help in describing, analyzing and transforming source code.
The overall question is how well qualified and versatile the programming language ASF+SDF is when applied to source code analysis and transformation. The main technical issues that are addressed are ambiguity of context-free languages and improving two important quality attributes of analyses and transformations: conciseness and fidelity.
The overall result of this research is a version of the language that is better tuned to the domain of source code analysis and transformation, but is still firmly grounded on the original: a hybrid of context-free grammars and term rewriting. The results that are presented have a broad technical spectrum because they cover the entire scope of ASF+SDF. They include disambiguation by filtering parse forests, the type-safe automation of tree traversal for conciseness, improvements in language design resulting in higher resolution and fidelity, and better interfacing with other programming environments. Each solution has been validated in practice, by me and by others, mostly in the context of industrial sized case studies.
In this introductory chapter we first set the stage by sketching the objectives and requirements of computer aided software engineering. Then the technological background of this thesis is introduced: generic language technology and ASF+SDF. We zoom in on two particular technologies: parsing and term rewriting. We identify research questions as we go and summarize them at the end of this chapter.

In meta programming with concrete object syntax, object-level programs are composed from fragments written in concrete syntax. The use of small program fragments in such quotations and the use of meta-level expressions within these fragments (anti-quotation) often leads to ambiguities. This problem is usually solved through explicit disambiguation, resulting in considerable syntactic overhead. A few systems manage to reduce this overhead by using type information during parsing. Since this is hard to achieve with traditional parsing technology, these systems provide specific combinations of meta and object languages, and their implementations are difficult to reuse. In this paper, we generalize these approaches and present a language independent method for introducing concrete object syntax without explicit disambiguation. The method uses scannerless generalized-LR parsing to parse meta programs with embedded object-level fragments, which produces a forest of all possible parses. This forest is reduced to a tree by a disambiguating type checker for the meta language. To validate our method we have developed embeddings of several object languages in Java, including AspectJ and Java itself.

A language specific interactive debugger is one of the tools that we expect in any mature programming environment. We present applications of TIDE: a generic debugging framework that is related to the ASF+SDF Meta-Environment. TIDE can be applied to different levels of debugging that occur in language design.
Firstly, TIDE was used to obtain a full-fledged debugger for language specifications based on term rewriting. Secondly, TIDE can be instantiated for any other programming language, including but not limited to domain specific languages that are defined and implemented using ASF+SDF.
We demonstrate the common debugging interface, and indicate the amount of effort needed to instantiate new debuggers based on TIDE.

Abstract syntax trees are a very common data structure in language-related tools. For example, compilers, interpreters, documentation generators, and syntax-directed editors use them extensively to extract, transform, store and produce information that is key to their functionality.
We present a Java back-end for ApiGen, a tool that generates implementations of abstract syntax trees. The generated code is characterized by strong typing combined with a generic interface and maximal sub-term sharing for memory efficiency and fast equality checking. The goal of this tool is to obtain safer and more efficient programming interfaces for abstract syntax trees.
The contribution of this work is the combination of generating a strongly typed data-structure with maximal sub-term sharing in Java. Practical experience shows that this approach is beneficial for extremely large as well as smaller data types.
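The core idea behind maximal sub-term sharing can be illustrated with a minimal hash-consing sketch. This is not ApiGen's generated API; the `Node` and `make` names below are illustrative only. The point is that a private interning factory guarantees structurally equal terms are the same object, so term equality degenerates to a pointer comparison.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch of maximal sub-term sharing (hash-consing);
// names are hypothetical, not ApiGen's generated classes.
final class Node {
    final String label;
    final Node[] children;

    private Node(String label, Node[] children) {
        this.label = label;
        this.children = children;
    }

    // Interning pool: every distinct term structure is built exactly once.
    private static final Map<Node, Node> pool = new HashMap<>();

    static Node make(String label, Node... children) {
        Node candidate = new Node(label, children);
        return pool.computeIfAbsent(candidate, c -> c);
    }

    // Structural equality is only paid once, at construction time;
    // afterwards, equal terms are identical objects.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Node)) return false;
        Node n = (Node) o;
        return label.equals(n.label) && java.util.Arrays.equals(children, n.children);
    }

    @Override public int hashCode() {
        return Objects.hash(label, java.util.Arrays.hashCode(children));
    }
}

public class SharingDemo {
    public static void main(String[] args) {
        Node a1 = Node.make("plus", Node.make("x"), Node.make("x"));
        Node a2 = Node.make("plus", Node.make("x"), Node.make("x"));
        // Maximal sharing: two independent builds yield the very same object.
        System.out.println(a1 == a2);
    }
}
```

The memory benefit comes for free once sharing is maximal: duplicated sub-terms (very common in large parse trees) are stored once, and equality checks that would otherwise traverse whole trees finish in constant time.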

Meta programming can be facilitated by the ability to represent program fragments in concrete syntax instead of abstract syntax. The resulting meta programs are more self-documenting. One caveat in concrete meta programming is the syntactic separation between the meta language and the object language. To solve this problem, many meta programming systems use quoting and anti-quoting to indicate precisely where level switches occur. These “syntactic hedges” can obfuscate the concrete program fragments. This paper describes an algorithm for inferring quotes, such that the meta programmer no longer needs to explicitly indicate transitions between the meta and object languages.

2003

Term rewriting is an appealing technique for performing program analysis and program transformation. Tree (term) traversal is frequently used but is not supported by standard term rewriting. We extend many-sorted, first-order term rewriting with traversal functions that automate tree traversal in a simple and type-safe way. Traversal functions can be bottom-up or top-down traversals and can either traverse all nodes in a tree or can stop the traversal at a certain depth as soon as a matching node is found. They can either define sort-preserving transformations or mappings to a fixed sort. We give small and somewhat larger examples of traversal functions and describe their operational semantics and implementation. An assessment of various applications and a discussion conclude the paper.
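The flavor of a traversal function can be conveyed with a small sketch, here transliterated to Java rather than ASF+SDF (class and method names are illustrative). A single generic bottom-up walk supplies all the recursion, so a rewrite rule only has to state the interesting case instead of spelling out a congruence rule for every constructor:

```java
import java.util.function.UnaryOperator;

// Illustrative term datatype and a generic bottom-up, sort-preserving
// traversal; this mimics the spirit of traversal functions, not their syntax.
abstract class Term {
    static Term bottomUp(Term t, UnaryOperator<Term> rule) {
        if (t instanceof App) {
            App a = (App) t;
            Term[] kids = new Term[a.args.length];
            for (int i = 0; i < a.args.length; i++) {
                kids[i] = bottomUp(a.args[i], rule);   // rebuild children first
            }
            return rule.apply(new App(a.fun, kids));   // then rewrite the node
        }
        return rule.apply(t);
    }
}

final class Int extends Term {
    final int value;
    Int(int v) { value = v; }
}

final class App extends Term {
    final String fun;
    final Term[] args;
    App(String fun, Term... args) { this.fun = fun; this.args = args; }
}

public class TraversalDemo {
    public static void main(String[] args) {
        // plus(1, plus(2, 3)): one local folding rule, recursion for free.
        Term t = new App("plus", new Int(1),
                         new App("plus", new Int(2), new Int(3)));
        Term folded = Term.bottomUp(t, n -> {
            if (n instanceof App) {
                App a = (App) n;
                if (a.fun.equals("plus")
                        && a.args[0] instanceof Int && a.args[1] instanceof Int) {
                    return new Int(((Int) a.args[0]).value + ((Int) a.args[1]).value);
                }
            }
            return n;  // all other nodes pass through unchanged
        });
        System.out.println(((Int) folded).value);
    }
}
```

Type safety in the paper's setting means the traversal is checked to be sort-preserving (or to map to a fixed sort); in this Java transliteration that guarantee is only approximated by the `Term` return type.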

Generalized parsing technology provides the power and flexibility to attack real-world parsing applications. However, many programming languages have syntactic ambiguities that can only be solved using semantic analysis. In this paper we propose to apply the paradigm of term rewriting to filter ambiguities based on semantic information. We start with the definition of a representation of ambiguous derivations. Then we extend term rewriting with means to handle such derivations. Finally, we apply these tools to some real-world examples, namely C and COBOL. The resulting architecture is simple and efficient as compared to semantics-directed parsing.
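The architecture can be sketched in a few lines: the parser emits explicit ambiguity clusters holding every alternative derivation, and a rewrite-style filter discards alternatives using semantic information gathered elsewhere. The representation and names below are a hypothetical simplification, not the paper's actual term format:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Illustrative sketch of post-parse ambiguity filtering over an
// explicit "amb" node; names and encoding are hypothetical.
public class AmbFilterDemo {
    // A parse-forest node: either a plain tree (alts == null)
    // or an ambiguity cluster of alternative parses.
    static final class ParseNode {
        final String label;
        final List<ParseNode> alts;
        ParseNode(String label) { this.label = label; this.alts = null; }
        ParseNode(List<ParseNode> alts) { this.label = "amb"; this.alts = alts; }
    }

    // Keep only alternatives accepted by a semantic predicate; a cluster
    // with a single survivor collapses to that tree.
    static ParseNode filter(ParseNode n, Predicate<ParseNode> ok) {
        if (n.alts == null) return n;
        List<ParseNode> keep = new ArrayList<>();
        for (ParseNode alt : n.alts) {
            if (ok.test(alt)) keep.add(alt);
        }
        return keep.size() == 1 ? keep.get(0) : new ParseNode(keep);
    }

    public static void main(String[] args) {
        // The classic C ambiguity "(T)*p": a cast if T names a type,
        // otherwise a multiplication. Here T is declared as a typedef.
        Set<String> typedefs = Set.of("T");
        ParseNode amb = new ParseNode(List.of(
            new ParseNode("cast(T, deref(p))"),
            new ParseNode("mul(T, p)")));
        ParseNode picked = filter(amb,
            alt -> alt.label.startsWith("cast") == typedefs.contains("T"));
        System.out.println(picked.label);
    }
}
```

Compared to semantics-directed parsing, nothing in the parser itself needs to know about typedefs: the symbol table is consulted after parsing, by an ordinary filter over the forest.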

In this paper we present the fusion of generalized LR parsing and scannerless parsing. This combination supports syntax definitions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser generators. Hence, a syntax definition becomes a full declarative description of a language. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects.

The ASF+SDF Meta-Environment is an interactive development environment for the automatic generation of interactive systems for constructing language definitions and generating tools for them. Over the years, this system has been used in a variety of academic and commercial projects ranging from formal program manipulation to conversion of COBOL systems. Since the existing implementation of the Meta-Environment started exhibiting more and more characteristics of a legacy system, we decided to build a completely new, component-based, version. We demonstrate this new system and stress its open architecture.

2000

Rewriting technology has proved to be an adequate and powerful mechanism for performing source code transformations. Not only can these transformations be implemented efficiently using rewriting technology, it also provides a firmer grip on the source code syntax. However, an important shortcoming of rewriting technology is that source code comments and layout are lost during rewriting. We propose "rewriting with layout" to solve this problem. We present a rewriting algorithm that keeps the layout of sub-terms that are not rewritten, and reuses the layout occurring in the right-hand side of the rewrite rules.

1999

Two things researchers in software engineering should do are publish their research prototypes as open-source software and immerse themselves in the activity of software engineering. The reason for the first is that software lends itself perfectly to sharing, especially if it is government-funded software. There is no excuse not to do this. The reason for the second is that software engineering is so wickedly complex and rapidly evolving that, without doing it yourself, it is easy to misunderstand what the problems are or to fail to recognize good solutions.

2011

A case of visitor versus interpreter pattern. June 30th, 2011. Zurich. TOOLS conference. This presentation explains our paper on comparing the impact of choosing between two functionally interchangeable design patterns on the maintainability of an AST-based language interpreter.
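The two designs the talk compares can be contrasted in a compact sketch. The toy expression language below is illustrative, not the AST from the paper: in the Interpreter pattern the evaluation logic lives inside the node classes, while in the Visitor pattern the nodes only `accept` and evaluation lives in a separate class, making it easier to add new operations without touching the AST (requires Java 16+ for records).

```java
// Illustrative contrast of the two functionally interchangeable patterns.
public class VisitorVsInterpreter {

    // --- Interpreter pattern: each node evaluates itself. ---
    interface Expr { int eval(); }
    record Num(int n) implements Expr {
        public int eval() { return n; }
    }
    record Add(Expr l, Expr r) implements Expr {
        public int eval() { return l.eval() + r.eval(); }
    }

    // --- Visitor pattern: nodes only accept; operations live outside,
    // so adding an operation means adding a visitor, not editing the AST. ---
    interface VExpr { <R> R accept(Visitor<R> v); }
    record VNum(int n) implements VExpr {
        public <R> R accept(Visitor<R> v) { return v.visitNum(this); }
    }
    record VAdd(VExpr l, VExpr r) implements VExpr {
        public <R> R accept(Visitor<R> v) { return v.visitAdd(this); }
    }
    interface Visitor<R> {
        R visitNum(VNum n);
        R visitAdd(VAdd a);
    }
    static final class Eval implements Visitor<Integer> {
        public Integer visitNum(VNum n) { return n.n(); }
        public Integer visitAdd(VAdd a) {
            return a.l().accept(this) + a.r().accept(this);
        }
    }

    public static void main(String[] args) {
        // Both evaluate 1 + 2; the trade-off is where the behavior lives.
        System.out.println(new Add(new Num(1), new Num(2)).eval());
        System.out.println(new VAdd(new VNum(1), new VNum(2)).accept(new Eval()));
    }
}
```

The trade-off, in short: the Interpreter pattern keeps each operation local to the nodes it concerns, while the Visitor pattern trades double dispatch boilerplate for extensibility with new operations; the paper measures how this choice plays out for maintainability in practice.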