In this paper, we propose a novel programmable functional unit (PFU) that accelerates the execution of general-purpose applications on a modern out-of-order x86 processor in a complexity-effective way. A co-designed virtual machine (Cd-VM) transforms the code and generates the instructions that run on the PFU. Groups of frequently executed micro-operations (micro-ops) are identified and fused into a ma...
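The abstract is cut off before it describes how the Cd-VM selects micro-op groups. As a rough, hypothetical illustration of what "identifying groups of frequently executed micro-ops" can mean, the sketch below weights every contiguous window of micro-ops in a profiled block by that block's execution count and ranks the windows. The profile format, the function name `frequent_groups`, and the fixed window length are all assumptions, not the paper's algorithm.

```python
from collections import Counter


def frequent_groups(trace_blocks, group_len=3, top_n=2):
    """Rank contiguous micro-op groups by dynamic execution frequency.

    trace_blocks: list of (micro_ops, exec_count) pairs (hypothetical
    profile format), where micro_ops is the list of micro-op mnemonics
    of one hot block and exec_count is how often that block executed.
    """
    counts = Counter()
    for uops, exec_count in trace_blocks:
        # Slide a window over the block; each window is a fusion candidate
        # whose weight is the number of times it was executed.
        for i in range(len(uops) - group_len + 1):
            counts[tuple(uops[i:i + group_len])] += exec_count
    return counts.most_common(top_n)


profile = [
    (["load", "add", "store", "cmp", "jcc"], 1000),
    (["load", "add", "store", "inc"], 400),
    (["mul", "add", "store"], 50),
]
ranked = frequent_groups(profile)
print(ranked)
```

Here `('load', 'add', 'store')` ranks first with a weight of 1400 (1000 + 400), because it occurs in both hot blocks. A real system would also have to check that a candidate group is legal to fuse (operand counts, side effects), which this sketch ignores.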

A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represent...

A fundamental problem in compiler optimization, which has grown in importance with the spread of multi-core architectures, is finding parallelism in sequential programs. Current processors can be fully exploited only if the workload is distributed over the available processors. In this paper, we look at distributing instructions in a block of code over multi-cluster processors, the instr...
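The abstract is truncated before the authors' method, so the sketch below is not their technique. It shows a common greedy list-scheduling baseline for the same problem: instructions of a basic block, in dependency order, are placed one by one on whichever cluster lets them finish earliest, where an operand produced on another cluster arrives only after an inter-cluster communication delay. The function name, the one-instruction-at-a-time cluster model, and `comm_delay` are illustrative assumptions.

```python
def assign_to_clusters(instrs, num_clusters, comm_delay=1):
    """Greedy cluster assignment for one basic block (hypothetical baseline).

    instrs: list of (name, latency, deps) in topological order, where deps
    names earlier instructions whose results this instruction reads.
    Simplifying assumption: each cluster executes one instruction at a time.
    Returns ({name: cluster}, schedule length in cycles).
    """
    cluster_free = [0] * num_clusters  # cycle at which each cluster is idle
    placed = {}  # name -> cluster index
    finish = {}  # name -> cycle its result is ready on its own cluster
    for name, latency, deps in instrs:
        best = None  # (finish cycle, cluster)
        for c in range(num_clusters):
            # A value produced on another cluster arrives comm_delay later.
            ready = max((finish[d] + (comm_delay if placed[d] != c else 0)
                         for d in deps), default=0)
            start = max(ready, cluster_free[c])
            if best is None or start + latency < best[0]:
                best = (start + latency, c)
        end, c = best
        placed[name], finish[name] = c, end
        cluster_free[c] = end
    return placed, max(finish.values(), default=0)


# Two independent chains (a -> c, b -> d) that join at e.
block = [("a", 1, []), ("b", 1, []), ("c", 1, ["a"]),
         ("d", 1, ["b"]), ("e", 1, ["c", "d"])]
placement, length = assign_to_clusters(block, num_clusters=2)
print(placement, length)
```

With two clusters the chains run side by side and the block takes 4 cycles instead of 5 on a single cluster; the final instruction pays one cycle of communication delay for the operand coming from the other cluster.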

We propose a new approach that automatically parallelizes Java programs at runtime. The approach collects on-line trace information during program execution and dynamically recompiles methods that can be executed in parallel. We also describe a cost/benefit model that makes intelligent parallelization decisions, as well as a parallel execution environment to execute parallelized code. We implement...
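The abstract does not give the terms of the authors' cost/benefit model. The sketch below is a hypothetical model of the general kind such a decision needs: the predicted gain from splitting work across threads, accumulated over the calls still expected, must outweigh the one-time cost of dynamic recompilation. Every parameter name and the linear-speedup assumption are illustrative only.

```python
def should_parallelize(iterations, cycles_per_iter, workers,
                       thread_overhead, recompile_cost, expected_calls):
    """Hypothetical cost/benefit check for parallelizing one method or loop.

    Assumes ideal linear speedup plus a fixed per-thread overhead; a real
    model would use measured trace data and account for dependences.
    """
    sequential = iterations * cycles_per_iter
    parallel = sequential / workers + thread_overhead * workers
    # Per-call gain, accumulated over the calls still expected, must pay
    # for the one-time dynamic recompilation.
    benefit = (sequential - parallel) * expected_calls
    return benefit > recompile_cost


# A long-running loop is worth parallelizing; a short one is not, because
# thread overhead exceeds the work saved.
big = should_parallelize(10_000, 50, 4, 5_000, 1_000_000, 10)
small = should_parallelize(100, 50, 4, 5_000, 1_000_000, 10)
print(big, small)
```

In the first call the estimated per-call time drops from 500,000 to 145,000 cycles, so the gain over 10 calls (3,550,000 cycles) exceeds the recompilation cost; in the second, parallel execution would be slower than sequential execution.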

While high-level programming languages such as MATLAB are increasingly popular for scientific and engineering applications, MATLAB's poor performance compared with traditional languages such as Fortran or C continues to impede its deployment in full-scale simulations and data analysis. Its poor memory performance further limits overall performance. To improve performance, we have be...

Dynamic or Just-in-Time (JIT) compilation is crucial to achieve acceptable performance for applications written in traditionally interpreted languages, such as Java and C#. Such languages enable the generation of portable applications that are written and compiled once, and can be executed by a virtual machine on any supported architecture. However, by virtue of occurring at runtime, dynamic compi...

Accesses to shared data structures in multithreaded programs must be correctly synchronized to ensure data consistency and integrity. However, this synchronization between threads is a common source of performance problems in multithreaded applications. Lock-free data structures are an alternative to traditional synchronization methods that have the potential for not only better performance and scalab...

Knowledge of a program's worst-case execution time (WCET) is essential for validating real-time systems and aids effective scheduling. One popular approach in industry is to measure the execution time of program components on the target architecture and combine the measurements using static analysis of the program. Measurements need to be taken in the least intrusive way in order to avoid affecting accurac...
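The hybrid approach described above, measuring components on the target and combining the results with static analysis, is often illustrated with a tree-based "timing schema". The sketch below is a minimal version under stated assumptions: each basic block carries the largest execution time observed for it, sequences add, a branch takes its condition plus the more expensive arm, and a loop multiplies by an iteration bound assumed to come from static analysis. The class and field names are hypothetical, and observed maxima are not guaranteed upper bounds, so this is not a safe WCET analysis on its own.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Block:
    measured_max: int  # largest execution time observed on the target (cycles)


@dataclass
class Seq:
    parts: List["Node"]


@dataclass
class IfElse:
    cond: Block
    then: "Node"
    orelse: "Node"


@dataclass
class Loop:
    bound: int  # maximum iteration count, assumed from static analysis
    header: Block  # loop test
    body: "Node"


Node = Union[Block, Seq, IfElse, Loop]


def wcet(node: Node) -> int:
    """Combine per-block measurements bottom-up over the program structure."""
    if isinstance(node, Block):
        return node.measured_max
    if isinstance(node, Seq):
        return sum(wcet(p) for p in node.parts)
    if isinstance(node, IfElse):
        # Only one arm runs, but we cannot know which: take the worse one.
        return wcet(node.cond) + max(wcet(node.then), wcet(node.orelse))
    if isinstance(node, Loop):
        # The test runs bound + 1 times (the last test exits the loop).
        return (node.bound + 1) * wcet(node.header) + node.bound * wcet(node.body)
    raise TypeError(f"unknown node: {node!r}")


program = Seq([
    Block(10),
    Loop(bound=5, header=Block(2),
         body=IfElse(Block(3), then=Block(20), orelse=Block(8))),
    Block(4),
])
print(wcet(program))
```

For this program the branch contributes 3 + 20 = 23 cycles per iteration, the loop 6 × 2 + 5 × 23 = 127, and the whole sequence 10 + 127 + 4 = 141 cycles.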

Partial inlining is an efficient form of inlining that inlines only part of the callee function, thus reducing code expansion. The key problem is how to split the callee function effectively so that both the call overhead and the code expansion can be reduced. Previous techniques either produce function splits too large to be inlined or fail to reduce the call overhead effectively. In this p...
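To make the idea of splitting a callee concrete, the sketch below performs by hand, at the source level, the transformation a compiler would apply: the callee's small hot region (a cache hit) is copied into the caller, while its bulky cold region is outlined into a separate function that is still called. Python is used only to show the shape of the code before and after the split; the function names are hypothetical, and this is not the splitting algorithm the abstract goes on to propose.

```python
cache = {}


def lookup(key):
    """Original callee: a tiny hot path followed by a larger cold path."""
    if key in cache:  # hot region: taken on most calls
        return cache[key]
    value = sum(i * i for i in range(key))  # cold region: stands in for bulky code
    cache[key] = value
    return value


def lookup_cold(key):
    """Cold region outlined into its own function by the split."""
    value = sum(i * i for i in range(key))
    cache[key] = value
    return value


def total_original(keys):
    # Every iteration pays the full call overhead of lookup().
    return sum(lookup(k) for k in keys)


def total_partially_inlined(keys):
    total = 0
    for k in keys:
        # Hot region of lookup() inlined here: a cache hit makes no call.
        if k in cache:
            total += cache[k]
        else:
            total += lookup_cold(k)  # a call remains only on the cold path
    return total


keys = [10, 10, 3, 10, 3]
cache.clear()
before = total_original(keys)
cache.clear()
after = total_partially_inlined(keys)
print(before, after)
```

Both versions compute the same result; the caller grows only by the few lines of the hot region rather than by the whole callee, which is the trade-off between call overhead and code expansion that the abstract describes.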