Dynamic memory allocation has been a fundamental part of most computer systems since roughly 1960, and memory allocation is widely considered to be either a solved problem or an insoluble one. In this survey, we describe a variety of memory allocator designs and point out issues relevant to their design and evaluation. We then chronologically survey most of the literature on allocators between 1961 and 1995. (Scores of papers are discussed, in varying detail, and over 150 references are given.) We argue that allocator designs have been unduly restricted by an emphasis on mechanism, rather than policy, while the latter is more important; higher-level strategic issues are still more important, but have not been given much attention. Most theoretical analyses and empirical allocator evaluations to date have relied on very strong assumptions of randomness and independence, but real program behavior exhibits important regularities that must be exploited if allocators are to perform well in practice.
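The survey's distinction between mechanism and policy can be made concrete with a toy allocator. This is a minimal sketch with invented names, not any allocator from the survey: the *mechanism* is a sorted free list of `(offset, size)` holes; the *policy* is the choice of which hole to use (here, first fit).

```python
class FreeListAllocator:
    """Toy first-fit allocator over a single contiguous heap."""

    def __init__(self, heap_size):
        self.free = [(0, heap_size)]  # sorted (offset, size) holes

    def alloc(self, size):
        # Policy: first fit -- take the first hole large enough.
        for i, (off, sz) in enumerate(self.free):
            if sz >= size:
                if sz == size:
                    del self.free[i]
                else:
                    self.free[i] = (off + size, sz - size)
                return off
        return None  # out of memory

    def free_block(self, off, size):
        # Mechanism: reinsert the hole and coalesce adjacent ones,
        # which is what keeps fragmentation from growing without bound.
        holes = sorted(self.free + [(off, size)])
        merged = [list(holes[0])]
        for o, s in holes[1:]:
            if merged[-1][0] + merged[-1][1] == o:
                merged[-1][1] += s
            else:
                merged.append([o, s])
        self.free = [tuple(h) for h in merged]
```

Swapping first fit for best fit or segregated fits changes only the scan in `alloc`, which is the survey's point: the strategic choice, not the list machinery, determines fragmentation behavior on real (non-random) request streams.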

Digital Equipment Corporation The 1980s have witnessed the emergence of a new architecture for computing based on networks of personal computer workstations. The performance requirements of such systems of workstations place a strain on traditional approaches to network architecture. The integration of diverse systems into this environment introduces functional compatibility issues that are not present in homogeneous networks. Effective prescriptions for functional compatibility, therefore, must go beyond the communication paradigms used in present distributed systems, such as remote procedure calls. This paper proposes a distributed system architecture in which communication follows a programming paradigm. In this architecture a programming language provides remote service interfaces for the heterogeneous distributed system environment. This language is a flexible and efficient medium for implementing service function protocols. In essence, clients and servers communicate by programming one another.
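The idea of clients and servers "programming one another" can be sketched as remote evaluation: instead of calling a fixed remote procedure, the client ships a small program that the server interprets against its local services. The mini-language and service table below are hypothetical, invented purely to illustrate the contrast with plain RPC.

```python
def run_remote(program, services):
    """Interpret a tiny postfix program on the server side.

    Tokens that name a service pop one argument and push the result;
    any other token is pushed as a literal.
    """
    stack = []
    for tok in program:
        if tok in services:
            arg = stack.pop()
            stack.append(services[tok](arg))
        else:
            stack.append(tok)  # literal operand
    return stack.pop()

# Server side: local services exposed to shipped programs (hypothetical).
services = {
    "lookup": lambda key: {"alice": 7, "bob": 9}.get(key),
    "double": lambda n: 2 * n,
}

# Client side: one shipped program chains two service calls in a single
# round trip, where RPC would need two.
result = run_remote(["alice", "lookup", "double"], services)
```

The design trade-off is the one the abstract names: the interface is a language rather than a fixed procedure signature, so heterogeneous clients can compose server functionality without new stub code on every change.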

by
Chi-Keung Luk , Todd C. Mowry
- IN PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, 1999

By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache conflicts and false sharing. Unfortunately, it is extremely difficult to guarantee that such optimizations are safe in practice on today's machines, since accurately updating all pointers to an object requires perfect alias information, which is well beyond the scope of the compiler for languages such as C. To overcome this limitation, we propose a technique called memory forwarding which effectively adds a new layer of indirection within the memory system whenever necessary to guarantee that data relocation is always safe. Because actual forwarding rarely occurs (it exists as a safety net), the mechanism can be implemented as an exception in modern superscalar processors. Our experimental results demonstrate that the aggressive layout optimizations enabled by memory forwarding can result in significant speedups ...
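The forwarding idea can be modeled in a few lines. In this sketch (all names invented; the paper's mechanism lives in hardware, not software) a heap maps addresses to cells, and relocating an object leaves a forwarding cell at its old address, so stale pointers that were never updated still reach the data:

```python
FORWARD = "forward"  # tag marking a cell that only redirects

heap = {}  # toy heap: address -> cell

def read(addr):
    """Loads chase forwarding cells transparently -- the safety net."""
    cell = heap[addr]
    while isinstance(cell, tuple) and cell[0] == FORWARD:
        addr = cell[1]        # follow the forwarding address
        cell = heap[addr]
    return cell

def relocate(old, new):
    """Move an object to a better-placed slot without finding its pointers."""
    heap[new] = heap[old]     # copy the object
    heap[old] = (FORWARD, new)  # old address now redirects
```

Because correct programs rarely hold stale pointers for long, the chase loop in `read` almost never runs, which is why the paper can afford to implement it as a processor exception rather than on every load.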

by
Zhong Shao
- In ACM Conference on Lisp and Functional Programming, 1994

Lists are ubiquitous in functional programs, thus supporting lists efficiently is a major concern to compiler writers for functional languages. Lists are normally represented as linked cons cells, with each cons cell containing a car (the data) and a cdr (the link); this is inefficient in the use of space, because 50% of the storage is used for links. Loops and recursions on lists are slow on modern machines because of the long chains of control dependence (in checking for nil) and data dependence (in fetching cdr fields). We present a data structure for "unrolled lists," where each cell has several data items (car fields) and one link (cdr). This reduces the memory used for links, and it significantly shortens the length of control-dependence and data-dependence chains in operations on lists. We further present an efficient compile-time analysis that transforms programs written for "ordinary" lists into programs on unrolled lists. The use of our new representation requires no change to existing programs. We sketch the proof of soundness of our analysis, which is based on refinement types, and present some preliminary measurements of our technique.
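The representation described above can be sketched directly. In this toy version (helper names invented), each cell holds up to K car fields and a single cdr, so both link overhead and the length of the cdr chain drop by roughly a factor of K:

```python
K = 4  # data items per unrolled cell

class Cell:
    """One unrolled-list cell: up to K 'car' fields, one 'cdr' link."""
    def __init__(self, items, link=None):
        self.items = items
        self.link = link

def from_list(xs):
    """Pack a Python list into an unrolled list, K items per cell."""
    head = None
    # Build back-to-front so each chunk can be prepended in O(1).
    for i in range(((len(xs) - 1) // K) * K, -1, -K):
        head = Cell(xs[i:i + K], head)
    return head

def to_list(cell):
    """Traverse the unrolled list; note: one nil check per K items."""
    out = []
    while cell is not None:
        out.extend(cell.items)
        cell = cell.link
    return out
```

A 10-element list needs only 3 cells (and 3 links) instead of 10, which is the space saving and the shorter dependence chain the abstract describes; the paper's real contribution is the refinement-type analysis that makes this change invisible to source programs.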

Many language theoreticians have taken great efforts in designing higher-level programming languages that are more elegant and more expressive than conventional languages. However, few of these new languages have been implemented very efficiently. The result is that most software engineers still prefer to use conventional languages, even though the new higher-level languages offer a better and simpler programming model. This dissertation concentrates on improving the performance of programs written in Standard ML (SML)---a statically typed functional language---on today's RISC machines. SML poses tough challenges to efficient implementations: very frequent function calls, polymorphic types, recursive data structures, higher-order functions, and first-class continuations. This dissertation presents the design and evaluation of several new compilation techniques that meet these challenges by taking advantage of some of the higher-level language features in SML. Type-directed compilation ...

by
Phil Bagwell
- In Implementation of Functional Languages, 14th International Workshop, 2002

This paper introduces a new data structure, the VList, that is compact, thread safe and significantly faster to use than linked lists for nearly all list operations. Space usage can be reduced by 50% to 90%, and in typical list operations speed is improved by factors ranging from 4 to 20 or more. Some important operations such as indexing and length are typically changed from O(N) to O(1) and O(lg N) respectively. A language interpreter, Visp, using a dialect of Common Lisp, has been implemented using VLists, and a benchmark comparison with OCaml is reported. It is also shown how to adapt the structure to create variable length arrays, persistent deques and functional hash tables. The VArray requires no resize copying and has an average O(1) random access time. Comparisons are made with previous resizable one-dimensional arrays, Hash Array Trees (HAT) of Sitarski [1996], and Brodnik, Carlsson, Demaine, Munro, and Sedgewick [1999].
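The core layout trick behind the VList's complexity claims can be sketched as follows (helper names invented, and this flat version omits the sharing that makes real VLists persistent): storage is a chain of blocks whose sizes double, so only O(lg N) blocks exist and indexing skips whole blocks instead of following N links.

```python
def vlist_from(xs):
    """Pack xs into blocks of sizes 1, 2, 4, ... (oldest block first)."""
    blocks, i, size = [], 0, 1
    while i < len(xs):
        blocks.append(xs[i:i + size])
        i += size
        size *= 2  # geometric growth: O(lg N) blocks total
    return blocks

def vlist_index(blocks, n):
    """Skip whole blocks, then index within one -- O(lg N), not O(N)."""
    for b in blocks:
        if n < len(b):
            return b[n]
        n -= len(b)
    raise IndexError(n)
```

With 100 elements there are only 7 blocks (1 + 2 + ... + 64 covers them), so `length` can sum 7 block sizes and `index` touches at most 7 headers, which is where the O(lg N) and near-O(1) figures in the abstract come from.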

Garbage collection frees the programmer from the burden of explicitly deallocating unused data. This facility induces a considerable overhead but also causes some delays that may affect real-time applications. Guaranteed throughput (with at most short and predictable delays) is needed in many applications, such as plane or plant control, and requires at least a worst-case analysis to characterize the performance of the whole system. Traditional GCs consist of two phases: the marker, which identifies all useful data, followed by the sweeper, which reclaims all useless data. On-the-fly GC schemes were introduced to permit an application and a collector to run concurrently. That concurrency may lessen the GC penalty incurred by the application. We present here a new algorithm where the application, the marker and the sweeper are concurrent. The benefit is to tightly adjust collection rate to application consumption and to have an allocation time bounded by a small constant. Moreover our algorithm ...
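The two phases named above can be shown in a sequential toy version (object graph and names invented; the paper's contribution is precisely that the application, marker and sweeper run *concurrently*, which this sketch does not attempt):

```python
def mark(roots, children):
    """Marker phase: identify all useful (reachable) data."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(children.get(obj, []))  # trace outgoing references
    return marked

def sweep(heap, marked):
    """Sweeper phase: reclaim everything the marker did not reach."""
    return [obj for obj in heap if obj in marked]
```

Making these phases concurrent with the mutator is what forces the synchronization machinery (e.g. write barriers) that on-the-fly collectors are known for, and what lets the collection rate track allocation so pause times stay bounded.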

Programs written in a higher level language are often less efficient than equivalent assembly language programs, because they cannot exploit known invariances and optimizations which would violate the strict semantics of the target language. Moving code from Lisp into the kernel has been a traditional way of improving the performance of Lisp systems. Substantial sections of the PDP-10 implementation of Interlisp, for example, are in machine code for this reason. When a large proportion of AltoLisp was moved from Bcpl into Lisp in order to improve memory utilization and aid modification, the speed of the system decreased by nearly a factor of three [Deutsch, 1978]. Thus, to improve DoradoLisp performance, we first looked for Lisp-coded sections of the system that could be incorporated into the Bcpl kernel. However, we soon discovered that the poor performance was due more to the design of the algorithms in the kernel than to the language in which they ...