"... 64-bit processor architectures like the Intel ® Itanium® Processor Family are designed for large applications that need large memory addresses. When running applications that fit within a 32-bit address space, 64-bit CPUs are at a disadvantage compared to 32-bit CPUs because of the larger memory foo ..."

64-bit processor architectures like the Intel® Itanium® Processor Family are designed for large applications that need large memory addresses. When running applications that fit within a 32-bit address space, 64-bit CPUs are at a disadvantage compared to 32-bit CPUs because of the larger memory footprints for their data. This results in worse cache and TLB utilization, and consequently lower performance because of increased miss ratios. This paper considers software techniques for virtual machines that allow 32-bit pointers to be used on 64-bit CPUs for managed runtime applications that do not need the full 64-bit address space. We describe our pointer compression techniques and discuss our experience implementing these for Java applications. In addition, we give performance results with our techniques for both the SPEC JVM98 and SPEC JBB2000 benchmarks. We demonstrate a 12% performance improvement on SPEC JBB2000 and a reduction in the number of garbage collections required for a given heap size.
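
As a rough illustration of the general idea (not the paper's actual implementation), the sketch below represents a 64-bit heap address as a 32-bit, alignment-shifted offset from an assumed heap base; the constants and class name are hypothetical.

```java
// Illustrative sketch of heap-base-relative pointer compression; the heap base
// and alignment are assumptions, not values from the paper.
final class CompressedPointer {
    static final long HEAP_BASE   = 0x0000_0040_0000_0000L; // assumed heap start
    static final int  ALIGN_SHIFT = 3;                       // assumed 8-byte object alignment

    // Fold a 64-bit address inside the heap into a 32-bit handle.
    static int compress(long address) {
        long offset = (address - HEAP_BASE) >>> ALIGN_SHIFT;
        if (offset > 0xFFFF_FFFFL) {
            throw new IllegalArgumentException("address outside compressible range");
        }
        return (int) offset;
    }

    // Expand a 32-bit handle back into the original 64-bit address.
    static long decompress(int handle) {
        return HEAP_BASE + ((handle & 0xFFFF_FFFFL) << ALIGN_SHIFT);
    }

    public static void main(String[] args) {
        long address = HEAP_BASE + 0x1234_5678L * 8;
        int handle = compress(address);
        System.out.printf("0x%x -> 0x%x -> 0x%x%n", address, handle, decompress(handle));
    }
}
```

With an 8-byte alignment shift, a 32-bit handle can span a 32 GB heap, which is one reason such schemes are attractive for applications that do not need the full 64-bit address space.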

"... The capabilities of applications executing on embedded and mobile devices are strongly influenced by memory size limitations. In fact, memory limitations are one of the main reasons that applications run slowly or even crash in embedded/mobile devices. While improvements in technology enable the int ..."

The capabilities of applications executing on embedded and mobile devices are strongly influenced by memory size limitations. In fact, memory limitations are one of the main reasons that applications run slowly or even crash on embedded/mobile devices. While improvements in technology enable the integration of more memory into embedded devices, the amount of memory that can be included is also limited by cost, power consumption, and form factor considerations. Consequently, addressing memory limitations will continue to be important. Focusing on embedded Java environments, this paper shows how object compression can improve memory space utilization. The main idea is to exploit the observation that a small set of values tends to appear in some fields of heap-allocated objects much more frequently than other values. Our analysis shows the existence of such frequent field values in the SpecJVM98 benchmark suite. We then propose two object compression schemes that eliminate or reduce the space occupied by the frequent field values. Our extensive experimental evaluation using a set of eight Java benchmarks shows that these schemes can reduce the minimum heap size that allows Java applications to execute without out-of-memory exceptions by up to 24% (14% on average).
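
A minimal sketch of the frequent-field-value idea follows. It is not the paper's scheme; the profiled value table, the one-byte encoding, and the overflow list are assumptions used only to show how a frequently recurring field value can be stored as a small index instead of a full word.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a field that usually holds one of a few profiled values
// is stored as a one-byte index into a shared table; rare values spill into an
// overflow list. Table contents and capacity limits are hypothetical.
final class FrequentValueField {
    private static final int[] FREQUENT = {0, 1, -1, 1024};     // assumed profile data
    private static final List<Integer> OVERFLOW = new ArrayList<>();

    private byte slot;   // >= 0: index into FREQUENT; < 0: bitwise-complemented overflow index

    void set(int value) {
        for (byte i = 0; i < FREQUENT.length; i++) {
            if (FREQUENT[i] == value) { slot = i; return; }
        }
        slot = (byte) ~OVERFLOW.size();   // sketch only: supports at most 128 uncommon values
        OVERFLOW.add(value);
    }

    int get() {
        return slot >= 0 ? FREQUENT[slot] : OVERFLOW.get(~slot);
    }
}
```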

"... Developing embedded systems software poses unique challenges to Java application developers and virtual machine designers. Chief among these challenges is the memory footprint of both the virtual machine and the applications that run within it. With the rapidly increasing set of features provided by ..."

Developing embedded systems software poses unique challenges to Java application developers and virtual machine designers. Chief among these challenges is the memory footprint of both the virtual machine and the applications that run within it. With the rapidly increasing set of features provided by the Java language, virtual machine designers are often forced to build custom implementations that make various tradeoffs between the footprint of the virtual machine and the subset of the Java language and class libraries that is supported. In this paper, we present the ExoVM, a system in which an application is initialized in a fully featured virtual machine, and then the code, data, and virtual machine features necessary to execute it are packaged into a binary image. Key to this process is feature analysis, a technique for simultaneously computing the reachable code and data of a Java program and its implementation inside the VM. The ExoVM reduces the need to develop customized embedded virtual machines by reusing a single VM infrastructure and automatically eliding the implementation of unused Java features on a per-program basis. We present a constraint-based instantiation of the analysis technique, an implementation in IBM’s J9 Java VM, experiments evaluating our technique on the EEMBC benchmark suite, and a discussion of the individual costs of several of Java’s features. Our evaluation shows that our system can reduce the non-heap memory allocation of the virtual machine by as much as 75%. We discuss VM and language design decisions that our work shows are important when targeting embedded systems, supporting the long-term goal of a common VM infrastructure spanning from motes to large servers.

by Ben L. Titzer and Jens Palsberg, in Proceedings of CASES’07, International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, 2007
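
The sketch below illustrates the flavor of a constraint-based reachability pass over "features": once a feature is needed, everything it depends on is needed too, and whatever stays unreached can be left out of the packaged image. The feature names and dependency edges are invented for illustration; this is not the ExoVM's actual analysis.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Worklist closure over a hypothetical feature-dependency graph.
final class FeatureClosure {
    // Hypothetical edges: using a feature pulls in the features it depends on.
    static final Map<String, List<String>> DEPENDS = Map.of(
        "main",          List.of("println", "StringBuilder"),
        "println",       List.of("charEncoding"),
        "StringBuilder", List.of("arrayCopy"),
        "reflection",    List.of("classMetadata", "charEncoding"));

    static Set<String> reachable(String root) {
        Set<String> live = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(root));
        while (!work.isEmpty()) {
            String feature = work.pop();
            if (live.add(feature)) {
                work.addAll(DEPENDS.getOrDefault(feature, List.of()));
            }
        }
        return live;   // everything outside this set could be elided from the image
    }

    public static void main(String[] args) {
        System.out.println(reachable("main"));   // "reflection" never becomes live
    }
}
```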

"... Research into embedded sensor networks has placed increased focus on the problem of developing reliable and flexible software for microcontroller-class devices. Languages such as nesC [8] and Virgil [14] have brought higher-level programming idioms to this lowest layer of software, thereby adding ex ..."

Research into embedded sensor networks has placed increased focus on the problem of developing reliable and flexible software for microcontroller-class devices. Languages such as nesC [8] and Virgil [14] have brought higher-level programming idioms to this lowest layer of software, thereby adding expressiveness. Both languages are marked by the absence of dynamic memory allocation, which removes the need for a runtime system to manage memory. To provide data structures, nesC offers modules, and Virgil offers the application an opportunity to allocate and initialize objects during compilation. This paper explores techniques for compressing fixed object heaps with the goal of reducing the RAM footprint of a program. We explore table-based compression and introduce a novel form of object layout called vertical object layout. We provide experimental results that measure the impact on RAM size, code size, and execution time for a set of Virgil programs. Our results show that compressed vertical layout has better execution time and code size than table-based compression while achieving more than 20% heap reduction on 6 of 12 benchmark programs.
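
To make the layout idea concrete, here is a small, hypothetical sketch of a vertical layout expressed in Java: each field of a class becomes a parallel array indexed by a compact object id, so a field whose observed values fit in a byte can live in a byte[] rather than occupying a full word per object. The class and field names are invented; this is one way to express the layout, not the paper's Virgil implementation.

```java
// Vertical layout sketch: object fields stored as parallel arrays.
final class SensorReadingTable {
    private final byte[]  channel;      // assumed to fit in 8 bits per object
    private final short[] calibration;  // assumed to fit in 16 bits per object
    private final int[]   timestamp;

    SensorReadingTable(int objectCount) {
        channel     = new byte[objectCount];
        calibration = new short[objectCount];
        timestamp   = new int[objectCount];
    }

    // An "object reference" is just a small index into the parallel arrays.
    void set(int id, int ch, int cal, int ts) {
        channel[id]     = (byte) ch;
        calibration[id] = (short) cal;
        timestamp[id]   = ts;
    }

    int channelOf(int id)   { return channel[id]; }
    int timestampOf(int id) { return timestamp[id]; }
}
```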

"... We introduce a class of transformations that modify the representation of dynamic data structures used in programs with the objective of compressing their sizes. Based upon a profiling study of data value characteristics, we have developed the common-prefix and narrow-data transformations that respe ..."

We introduce a class of transformations that modify the representation of dynamic data structures used in programs with the objective of compressing their sizes. Based upon a profiling study of data value characteristics, we have developed the common-prefix and narrow-data transformations that respectively compress a 32-bit address pointer and a 32-bit integer field into 15-bit entities. A pair of fields that have been compressed by the above transformations is packed together into a single 32-bit word. The above transformations are designed to apply to data structures that are partially compressible; that is, they compress the portions of data structures to which the transformations apply and provide a mechanism to handle the data that is not compressible. Accesses to compressed data are efficiently implemented by designing data compression extensions (DCX) to the processor’s instruction set. We have observed average reductions in heap-allocated storage of 25% and average reductions in execution time and power consumption of 30%. If DCX support is not provided, the reductions in execution times fall from 30% to ...
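
The packing arithmetic itself is simple to show. The following sketch packs two 15-bit entities into one 32-bit word, using the sixteenth bit of each half as an assumed "did not compress" escape flag; the real scheme's fallback handling and the DCX instruction-set extensions are not modeled here.

```java
// Sketch of two 15-bit compressed fields sharing a 32-bit word.
final class PackedPair {
    private static final int WIDTH  = 15;
    private static final int MASK   = (1 << WIDTH) - 1;   // 0x7FFF
    private static final int ESCAPE = 1 << WIDTH;         // assumed "not compressible" flag

    static boolean fits(int value) {                       // representable in 15 bits?
        return value >= 0 && value <= MASK;
    }

    static int pack(int low, int high) {
        int lo = fits(low)  ? low  : ESCAPE;
        int hi = fits(high) ? high : ESCAPE;
        return (hi << 16) | lo;
    }

    static int low(int word)             { return word & MASK; }
    static int high(int word)            { return (word >>> 16) & MASK; }
    static boolean lowEscaped(int word)  { return (word & ESCAPE) != 0; }
    static boolean highEscaped(int word) { return ((word >>> 16) & ESCAPE) != 0; }
}
```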

"... Software for resource constrained embedded devices is often implemented in the Java programming language because the Java compiler and virtual machine provide enhanced safety, portability, and the potential for run-time optimization. It is important to verify that a software application executes cor ..."

Software for resource-constrained embedded devices is often implemented in the Java programming language because the Java compiler and virtual machine provide enhanced safety, portability, and the potential for run-time optimization. It is important to verify that a software application executes correctly in the environment in which it will normally execute, even if this environment is an embedded one that severely constrains memory resources. Testing can be used to isolate defects within, and establish confidence in the correctness of, a Java application that executes in a resource-constrained environment. However, executing test suites with a Java virtual machine (JVM) that uses dynamic compilation to create native code bodies can introduce significant testing time overheads if memory resources are highly constrained. This paper describes an approach that uses adaptive code unloading to ensure that it is feasible to perform testing in the actual memory-constrained execution environment. The experiments demonstrate that code unloading can reduce both the test suite execution time by 34% and the code size of the test suite and application under test by 78% while maintaining the overall size of the JVM.
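
The sketch below shows one plausible shape for code unloading, assuming an LRU policy over a bounded code cache; the actual unloading triggers and heuristics in the paper's approach may differ, and the names here are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded, access-ordered cache of compiled code bodies. When the budget is
// exceeded, the least recently used body is discarded; a discarded method is
// interpreted (or recompiled) on its next invocation.
final class CodeCache {
    private final int budgetBytes;
    private int usedBytes;
    private final LinkedHashMap<String, byte[]> bodies =
            new LinkedHashMap<>(16, 0.75f, true);   // access order => LRU iteration

    CodeCache(int budgetBytes) { this.budgetBytes = budgetBytes; }

    byte[] lookup(String method) { return bodies.get(method); }  // null => fall back to interpreter

    void install(String method, byte[] nativeBody) {
        byte[] previous = bodies.put(method, nativeBody);
        if (previous != null) usedBytes -= previous.length;
        usedBytes += nativeBody.length;
        Iterator<Map.Entry<String, byte[]>> it = bodies.entrySet().iterator();
        while (usedBytes > budgetBytes && it.hasNext()) {
            Map.Entry<String, byte[]> victim = it.next();
            usedBytes -= victim.getValue().length;
            it.remove();                             // unload the cold code body
        }
    }
}
```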

"... On one hand, the high cost of memory continues to drive demand for memory efficiency on embedded and general purpose computers. On the other hand, programmers are increasingly turning to managed languages like Java for their functionality, programmability, and reliability. Managed languages, however ..."

On one hand, the high cost of memory continues to drive demand for memory efficiency on embedded and general-purpose computers. On the other hand, programmers are increasingly turning to managed languages like Java for their functionality, programmability, and reliability. Managed languages, however, are not known for their memory efficiency, creating a tension between productivity and performance. This paper examines the sources and types of memory inefficiencies in a set of Java benchmarks. Although prior work has proposed specific heap data compression techniques, they are typically restricted to one model of inefficiency. This paper generalizes and quantitatively compares previously proposed memory-saving approaches and idealized heap compaction. It evaluates a variety of models based on strict and deep object equality, field value equality, removing bytes that are zero, and compressing fields and arrays with a limited number and range of values. The results show that substantial memory reductions are possible in the Java heap. For example, removing bytes that are zero from arrays is particularly effective, reducing the application’s memory footprint by 41% on average. We are the first to combine multiple savings models on the heap, which reduces the application’s footprint very effectively: by up to 86%, and by 58% on average. These results demonstrate that future work should be able to combine a high-productivity programming language with memory efficiency.
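
As a concrete picture of what the zero-byte model measures, the sketch below stores a byte array as a presence bitmap plus only its non-zero bytes. It is a back-of-the-envelope representation for estimating savings, not the in-VM encoding used by this or the cited papers.

```java
import java.util.BitSet;

// Zero-byte removal sketch: bitmap of non-zero positions plus packed payload.
final class ZeroCompressedBytes {
    private final BitSet nonZero;
    private final byte[] payload;
    private final int length;

    ZeroCompressedBytes(byte[] original) {
        length = original.length;
        nonZero = new BitSet(length);
        int count = 0;
        for (int i = 0; i < length; i++) {
            if (original[i] != 0) { nonZero.set(i); count++; }
        }
        payload = new byte[count];
        for (int i = 0, j = 0; i < length; i++) {
            if (original[i] != 0) payload[j++] = original[i];
        }
    }

    byte get(int index) {
        if (!nonZero.get(index)) return 0;
        // The number of set bits below 'index' gives the payload slot (a rank query).
        return payload[nonZero.get(0, index).cardinality()];
    }

    int approximateFootprint() {           // rough size: bitmap bytes + payload bytes
        return (length + 7) / 8 + payload.length;
    }
}
```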

"... Arrays are the ubiquitous organization for indexed data. Throughout programming language evolution, implementations have laid out arrays contiguously in memory. This layout is problematic in space and time. It causes heap fragmentation, garbage collection pauses in proportion to array size, and wast ..."

Arrays are the ubiquitous organization for indexed data. Throughout programming language evolution, implementations have laid out arrays contiguously in memory. This layout is problematic in space and time. It causes heap fragmentation, garbage collection pauses in proportion to array size, and wasted memory for sparse and over-provisioned arrays. Because of array virtualization in managed languages, an array layout that consists of indirection pointers to fixed-size discontiguous memory blocks can mitigate these problems transparently. This design, however, incurs significant overhead, but is justified when real-time deadlines and space constraints trump performance. This paper proposes z-rays, a discontiguous array design with flexibility and efficiency. A z-ray has a spine with indirection pointers to fixed-size memory blocks called arraylets, and uses five optimizations: (1) inlining the first N array bytes into the spine, (2) lazy allocation, (3) zero compression, (4) fast array copy, and (5) arraylet copy-on-write. Whereas discontiguous arrays in prior work improve responsiveness and space efficiency, z-rays combine time efficiency and flexibility. On average, the best z-ray configuration performs within 12.7% of an unmodified Java Virtual Machine on 19 benchmarks, whereas previous designs have two to three times higher overheads. Furthermore, language implementers can configure z-ray optimizations for various design goals. This combination of performance and flexibility creates a better building block for past and future array optimization.
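
A stripped-down sketch of the spine-and-arraylet layout is shown below, with element access split into a spine index and an offset. The block size is an assumption, and the optimizations the paper describes (first-N inlining, lazy allocation, zero compression, fast copy, copy-on-write) are deliberately omitted.

```java
// Discontiguous int array: a spine of pointers to fixed-size arraylets.
final class ArrayletIntArray {
    private static final int ARRAYLET_SIZE = 256;   // assumed elements per arraylet
    private final int[][] spine;
    private final int length;

    ArrayletIntArray(int length) {
        this.length = length;
        int blocks = (length + ARRAYLET_SIZE - 1) / ARRAYLET_SIZE;
        spine = new int[blocks][];
        for (int i = 0; i < blocks; i++) {
            spine[i] = new int[ARRAYLET_SIZE];      // eager allocation here; could be lazy
        }
    }

    int get(int index) {
        checkBounds(index);
        return spine[index / ARRAYLET_SIZE][index % ARRAYLET_SIZE];
    }

    void set(int index, int value) {
        checkBounds(index);
        spine[index / ARRAYLET_SIZE][index % ARRAYLET_SIZE] = value;
    }

    private void checkBounds(int index) {
        if (index < 0 || index >= length) throw new ArrayIndexOutOfBoundsException(index);
    }
}
```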

"... Memory is a scarce resource during embedded system design. Increasing memory often increases packaging costs, cooling costs, size, and power consumption. This paper presents CRAMES, a novel and efficient software-based RAM compression technique for embedded systems. The goal of CRAMES is to dramatic ..."

Memory is a scarce resource during embedded system design. Increasing memory often increases packaging costs, cooling costs, size, and power consumption. This paper presents CRAMES, a novel and efficient software-based RAM compression technique for embedded systems. The goal of CRAMES is to dramatically increase effective memory capacity without hardware or application design changes, while maintaining high performance and low energy consumption. To achieve this goal, CRAMES takes advantage of an operating system’s virtual memory infrastructure by storing swapped-out pages in compressed format. It dynamically adjusts the size of the compressed RAM area, protecting applications capable of running without it from performance or energy consumption penalties. In addition to compressing working data sets, CRAMES also enables efficient in-RAM filesystem compression, thereby further increasing RAM capacity. CRAMES was implemented as a loadable module for the Linux kernel and evaluated on a battery-powered embedded system. Experimental results indicate that CRAMES is capable of doubling the amount of RAM available to applications running on the original system hardware. Execution time and energy consumption for a broad range of examples are rarely affected. When physical RAM is reduced to 62.5% of its original quantity, CRAMES enables the target embedded ...
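
CRAMES itself is a Linux kernel module written in C, but the swap-out/swap-in round trip it relies on can be illustrated in a few lines. The sketch below uses java.util.zip purely to show the idea of deflating a page on eviction and inflating it on fault; the page size, compressor choice, and allocator here are assumptions, not those of CRAMES.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Compressed-swap round trip: deflate a page on swap-out, inflate it on swap-in.
final class CompressedSwap {
    static byte[] swapOut(byte[] page) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(page);
        deflater.finish();
        byte[] buffer = new byte[page.length + 64];    // incompressible pages grow slightly
        int n = deflater.deflate(buffer);
        deflater.end();
        byte[] compressed = new byte[n];
        System.arraycopy(buffer, 0, compressed, 0, n);
        return compressed;
    }

    static byte[] swapIn(byte[] compressed, int pageSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] page = new byte[pageSize];
        inflater.inflate(page);
        inflater.end();
        return page;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] page = new byte[4096];                   // a mostly zero-filled "page"
        byte[] out = swapOut(page);
        System.out.println(page.length + " -> " + out.length + " bytes");
    }
}
```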

"... Memory constraint presents one of the critical challenges for embedded software writers. While circuit-level solutions based on cramming as many bits as possible into the smallest area possible are certainly important, memory-conscious software can bring much higher benefits. Focusing on an embedded ..."

Memory constraints present one of the critical challenges for embedded software writers. While circuit-level solutions based on cramming as many bits as possible into the smallest area possible are certainly important, memory-conscious software can bring much higher benefits. Focusing on an embedded Java-based environment, this paper studies the potential benefits and challenges when heap memory is managed at the granularity of fields instead of objects. The paper discusses these benefits and challenges with the help of two field-level analysis techniques. The first of these, called field-level lifetime analysis, takes advantage of the observation that, for a given object instance, not all fields have the same lifetime. The field-level lifetime analysis demonstrates the potential benefits of exploiting this information. Our second analysis, referred to as the disjointness analysis, is built upon the fact that, for a given object, some fields have disjoint lifetimes and can therefore potentially share the same memory space. To quantify the impact of these techniques, we performed experiments with several benchmarks and point out the important characteristics that need to be considered by application writers.
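
The disjointness observation is easy to picture with a tiny, made-up example: if one field is only live before a state change and another only after, the two can share a single slot. The class below is hypothetical and only illustrates the sharing idea, not the paper's analysis.

```java
// Two fields with disjoint lifetimes sharing one slot.
final class Connection {
    private static final int CONNECTING  = 0;
    private static final int ESTABLISHED = 1;

    private int phase = CONNECTING;
    private int shared;   // holds pendingRequestId while connecting, sessionId afterwards

    void setPendingRequest(int requestId) {
        if (phase != CONNECTING) throw new IllegalStateException("already established");
        shared = requestId;
    }

    void establish(int sessionId) {
        phase = ESTABLISHED;       // pendingRequestId is dead from this point on
        shared = sessionId;
    }

    int sessionId() {
        if (phase != ESTABLISHED) throw new IllegalStateException("not established yet");
        return shared;
    }
}
```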