Abstract

The CIL compiler for core Standard ML compiles whole programs using a novel typed intermediate language (TIL) with intersection and union types and flow labels on both terms and types. The CIL term representation duplicates portions of the program where intersection types are introduced and union types are eliminated. This duplication makes it easier to represent type information and to introduce customized data representations. However, duplication incurs compile-time space costs that are potentially much greater than are incurred in TILs employing type-level abstraction or quantification. In this paper, we present empirical data on the compile-time space costs of using CIL as an intermediate language. The data shows that these costs can be made tractable by using sufficiently fine-grained flow analyses together with standard hash-consing techniques. The data also suggests that non-duplicating formulations of intersection (and union) types would not achieve significantly better space complexity.

... the worst case, the tree representation of types in Standard ML (SML) programs can have size doubly exponential in the program size, and the DAG representation can be exponential in the program size [Mit96]. Although we are mainly concerned with ordinary programs where the worst case space complexity is not encountered, these ordinary programs often have types with impractically large tree representatio...

...is certifiably type safe [Nec97, MWCG99]. Furthermore, types that survive through the back end can be used to support run-time operations such as garbage collection [Tol94] and run-time type dispatch [Mor95]. The benefits of using a TIL are not achieved without costs. These costs include the space needed to represent the types at compile-time, the time to manipulate the types at compile-time, and the add...

...use FLINT types are identified modulo β-conversion, and because eager β-normalization of types can lose sharing and do excess work, the hash-consing scheme for FLINT types uses explicit substitutions [KR95] and memoization of substitution propagation steps. Unlike FLINT, the CIL types do not have such higher-order features, so the CIL hash-consing of types is simpler. Sets of flow labels are often used ...

...d together with the projected environment. The other three representation strategies generate specialized representations based on various conditions detected in the term structure. Wand and Steckler [WS94] coined the term “selective” representation to refer to representations of functions that do not include an environment component. A selective representation is adequate for a closed function if the f...

...nted six different flow analyses. In this paper, we present data from two of these: what we call typed source split and min type respecting. The typed source split analysis is a variant of Banerjee’s [Ban97] modified for shallow subtyping [WDMT0X]; the use of shallow subtyping makes it slightly less precise than the combination of monomorphization and 0CFA analysis. It introduces virtual tuples and virtu...

... analyses in CIL remains an important area for future work. Recent work has shown that many standard flow analyses, such as k-CFA [Shi91, JW95, NN97] and the Cartesian product argument-based analysis [Age95] can be encoded into a type system with intersection and union types and flow labels [PP0X, AT00]. However, unlike CIL, these type systems have deep subtyping. We are exploring a translation betwee...

...ss accessible and can be tricky to adapt to more complex situations [WDMT0X]. We have made preliminary investigations into other representations, e.g., one based on the skeletons and substitutions of [KW99]. Based on the empirical results presented here, we believe that developing a non-duplicating representation of CIL may not be critical (though it may still be worthwhile). However, it remains to be...

...izing, Parsing, Elaboration. In implementing the compiler, we took advantage of existing tools and other freely available SML compilers. The CIL compiler uses the MLton source-to-source defunctorizer [CJW00] as a prepass to convert SML into Core SML. It then uses the front end of the SML/NJ 110.03 compiler (somewhat modified) to produce FLINT code. The FLINT code is translated to untyped CIL code, keepin...

... type [MMH96, MWCG99, CWM98]. In the CIL compiler, these differences are reconciled by injecting the types of closures into a union type and performing a virtual case dispatch at the application site [DMTW97]. In a type-erasure semantics, these injections do not give rise to any run-time code. However, they can potentially cause a blowup in compile-time space when many functions with different free variab...

... sinks, or elimination points) for the values of a flow-annotated type. Intersection and union types have several advantages over universal and existential types as a means of expressing polymorphism [WDMT0X]: (1) by making usage contexts apparent, they support flow-based customizations in a type-safe way; (2) finitary polymorphism can type more terms than infinitary polymorphism; and (3) the listing-base...

...embly code is linked with a runtime library providing the environment in which CIL programs are executed. The back end is based on MLRISC, a framework for building portable optimizing code generators [Geo97]. CIL programs are translated into the MLRISC intermediate language, and the framework is specialized with CIL conventions for each target architecture. MLRISC handles language-independent issues su...

...g heuristics for choosing between allowable representations. In terms of function representations, we are currently investigating lightweight closure conversion [SW97, Sis99], higher-order uncurrying [HH98], and register allocation and calling conventions informed by flow information. We have yet to explore customized representations for other kinds of data, but CIL is rich enough to support flow-direct...

...sibility of differing representations of the same recursive type. Our new method of incremental DFA minimization to represent all types in the same graph is similar to a method suggested by Mauborgne [Mau00], but was developed completely independently. Our method needs O(n log n) space to store the types, while Mauborgne’s needs O(n^2 log n) space, where n is the number of distinct types and some upper-bo...

...yet do), the resulting object code is certifiably type safe [Nec97, MWCG99]. Furthermore, types that survive through the back end can be used to support run-time operations such as garbage collection [Tol94] and run-time type dispatch [Mor95]. The benefits of using a TIL are not achieved without costs. These costs include the space needed to represent the types at compile-time, the time to manipulate the...