Abstract. This dissertation shows that complex, real programming languages can be completely formalized in the K Framework, yielding interpreters and analysis tools for testing and bug detection. This is demonstrated by providing, in K, the first complete formal semantics of the C programming language. With varying degrees of effort, tools such as interpreters, debuggers, and model-checkers, together with tools that check for memory safety, races, deadlocks, and undefined behavior are then generated from the semantics.

Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes 99.2% of 776 test programs. The semantics is also evaluated against popular analysis tools, using a new test suite in addition to a third-party test suite. The semantics-based tool performs at least as well or better than the other tools tested.

Contents

Research

I am interested in programming language semantics, type systems, reflection (semantics and safety), coinduction, and automated reasoning. I'm currently focusing on giving a complete formal semantics to the C programming language. This current state of the project can be found on our public Google Code project. Additionally, we have a POPL'12 paper:

Abstract. This paper describes an executable formal semantics of C. Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes 99.2% of 776 test programs. It is the most complete and thoroughly tested formal definition of C to date. The semantics yields an interpreter, debugger, state space search tool, and model checker "for free". The semantics is shown capable of automatically finding program errors, both statically and at runtime. It is also used to enumerate nondeterministic behavior.

Abstract. This paper introduces matching logic, a novel framework for defining axiomatic semantics for programming languages, inspired from operational semantics. Matching logic specifications are particular first-order formulae with constrained algebraic structure, called patterns. Program configurations satisfy patterns iff they match their algebraic structure and satisfy their constraints. Using a simple imperative language (IMP), it is shown that a restricted use of the matching logic proof system is equivalent to IMP's Hoare logic proof system, in that any proof derived using either can be turned into a proof using the other. Extensions to IMP including a heap with dynamic memory allocation and pointer arithmetic are given, requiring no extension of the underlying first-order logic; moreover, heap patterns such as lists, trees, queues, graphs, etc., are given algebraically using fist-order constraints over patterns.

Abstract. We show how programming language semantics and definitions of their corresponding type systems can both be written in a single framework amenable to proofs of soundness. The framework is based on full rewriting logic (not to be confused with context reduction or term rewriting), where rules can match anywhere in a term (or configuration).

We present an extension of the syntactic approach to proving type system soundness presented by Wright and Felleisen [1994] that works in the above described semantics-based domain. As before, the properties of preservation and progress are crucial. We use an abstraction function to relate semantic configurations in the language domain to semantic configurations in the type domain, and then proceed to use the preservation and progress properties as usual. We also develop an abstract type system, which is a type system modulo certain structural characteristics.

To demonstrate the method, we give examples of five languages and corresponding type systems. They include two imperative languages and three functional languages, and three type checkers and two type inferencers. We then proceed to prove that preservation holds for each.

Abstract. Meseguer and Roșu proposed rewriting logic semantics (RLS) as a programing language definitional framework that unifies operational and algebraic denotational semantics. RLS has already been used to define a series of didactic and real languages, but its benefits in connection with defining and reasoning about type systems have not been fully investigated. This paper shows how the same RLS style employed for giving formal definitions of languages can be used to define type systems. The same term-rewriting mechanism used to execute RLS language definitions can now be used to execute type systems, giving type checkers or type inferencers. The proposed approach is exemplified by defining the Hindley-Milner polymorphic type inferencer as a rewrite logic theory and using this definition to obtain a type inferencer by executing it in a rewriting logic engine. The inferencer obtained this way compares favorably with other definitions or implementations of . The performance of the executable definition is within an order of magnitude of that of highly optimized implementations of type inferencers, such as that of OCaml.