optimization, we focus on reducing the number of memory accesses in embedded systems. Memory accesses significantly limit the performance of embedded systems because of the widening processor-memory gap. Besides limiting performance, memory accesses account for a large fraction of overall power consumption. With the emergence of memory-intensive embedded applications, effective optimization techniques are needed to reduce the number of memory accesses. In this thesis, we make the following original contributions in this field.
- First, we develop a general compiler optimization technique called “REALM” to reduce the number of memory accesses in Digital Signal Processing (DSP) applications with loops. An important characteristic of the loop kernels of DSP applications is that the same memory location is repeatedly accessed by different memory operations over multiple loop iterations. For DSP applications, therefore, an important problem is how to detect redundant memory accesses and eliminate them by reusing values already loaded in earlier iterations. We solve this problem by replacing redundant memory operations with register operations. The results show that our technique effectively reduces the number of memory accesses and improves performance compared with previous approaches.
- Second, because embedded systems have a limited number of registers, we propose a register allocation and instruction scheduling technique that extends the “REALM” technique to handle register constraints. For the register operations generated by REALM, we analyze their data dependencies for instruction scheduling and build a register-matching graph model to find available physical registers that can be allocated to the operands of these register operations. The register allocation problem is solved by finding a simple path of fixed length between two specified vertices in the register-matching graph. We then perform instruction scheduling based on the allocation results.
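The cross-iteration redundancy described above can be illustrated with a minimal sketch, written in Python with memory reads modeled as list indexing and counted explicitly. It is not the thesis's REALM algorithm: it shows a generic scalar-replacement transformation on a hypothetical 2-tap filter `y[i] = x[i] + x[i-1]`, and the functions `filter_naive` and `filter_reuse` and the load-counting scheme are illustrative assumptions.

```python
from typing import List, Tuple


def filter_naive(x: List[int]) -> Tuple[List[int], int]:
    """2-tap filter y[i] = x[i] + x[i-1]; both operands are loaded every iteration."""
    loads = 0
    y = [0] * len(x)
    for i in range(1, len(x)):
        cur = x[i]        # load x[i]
        loads += 1
        prev = x[i - 1]   # redundant load: this location was read as x[i] last iteration
        loads += 1
        y[i] = cur + prev
    return y, loads


def filter_reuse(x: List[int]) -> Tuple[List[int], int]:
    """Same filter, but x[i-1] is carried across iterations in a local (a register)."""
    loads = 0
    y = [0] * len(x)
    if len(x) < 2:
        return y, loads
    prev = x[0]           # one load before the loop
    loads += 1
    for i in range(1, len(x)):
        cur = x[i]        # the only load inside the loop
        loads += 1
        y[i] = cur + prev
        prev = cur        # register-to-register move replaces the reload of x[i-1]
    return y, loads


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    y1, n1 = filter_naive(data)
    y2, n2 = filter_reuse(data)
    assert y1 == y2
    print(f"naive loads: {n1}, reuse loads: {n2}")  # naive loads: 14, reuse loads: 8
```

For an 8-element input, the naive loop performs 14 loads and the rewritten loop 8, because each `x[i]` is read once and carried into the next iteration in `prev`. A real compiler pass must also prove that no intervening store can modify the reused location, and every carried value occupies a register, which is the register pressure the second contribution addresses.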
In low-power optimization, we address two challenging issues for embedded systems: leakage and temperature. As semiconductor technologies scale down to the nanometer regime, leakage power has become as important as dynamic power. Besides leakage power, temperature is also a concern, because both on-chip power density and temperature are rising exponentially as feature sizes decrease. Rising on-chip temperature can cause severe reliability, performance, and cooling-cost problems for embedded systems. To address these issues, we make the following contributions.
- The first contribution is to reduce the leakage power consumption of VLIW (Very Long Instruction Word) processors. We propose a novel leakage-aware modulo scheduling technique that helps hardware-based leakage control schemes achieve leakage power savings in embedded VLIW processors. Our technique also accounts for transition time and power overhead, and we discuss the trade-off between leakage savings and performance penalties.
- The second contribution is to reduce the peak temperature of the on-chip memory subsystem. Most embedded systems adopt a hybrid memory architecture that contains both a hardware-managed cache and a software-managed scratchpad memory (SPM). However, both the cache and the SPM have become thermal hot spots, because they are the most frequently accessed on-chip components. We propose a temperature-aware data allocation technique that exploits this hybrid architecture to jointly optimize performance and peak temperature. By adaptively distributing the workload between the cache and the SPM, our technique can greatly alleviate the thermal hot spots of the memory subsystem.
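The trade-off between leakage savings and transition overhead can be made concrete with a simplified break-even model. This is a hypothetical illustration, not the thesis's scheduler: it assumes that power-gating a functional unit costs a fixed energy `e_overhead` per off/on transition, that the unit needs `wake_latency` cycles to wake up, and that it leaks `p_leak` energy per idle cycle when not gated (and nothing when gated). Under these assumptions, gating an idle gap of `g` cycles pays off only if `g >= e_overhead / p_leak` and `g >= wake_latency`. The functions `idle_intervals` and `gating_savings` are invented for this example.

```python
from typing import Iterable, List


def idle_intervals(busy_slots: Iterable[int], ii: int) -> List[int]:
    """Lengths of the cyclic idle gaps of one functional unit in a modulo
    schedule with initiation interval `ii` (slots are taken modulo ii)."""
    slots = sorted({s % ii for s in busy_slots})
    if not slots:
        return [ii]  # the unit is idle for the whole iteration
    gaps = []
    for k, s in enumerate(slots):
        nxt = slots[(k + 1) % len(slots)]   # wraps around to the next iteration
        gap = (nxt - s - 1) % ii            # idle cycles strictly between two busy slots
        if gap > 0:
            gaps.append(gap)
    return gaps


def gating_savings(gaps: List[int], p_leak: float,
                   e_overhead: float, wake_latency: int) -> float:
    """Net leakage energy saved per iteration under the simplified model:
    gating removes all leakage during a gap and costs e_overhead per gap."""
    break_even = e_overhead / p_leak
    saved = 0.0
    for g in gaps:
        # Gate only when the gap beats the energy break-even point and is long
        # enough to wake the unit before its next operation (no stall).
        if g >= break_even and g >= wake_latency:
            saved += p_leak * g - e_overhead
    return saved


if __name__ == "__main__":
    ii = 8
    spread = idle_intervals([0, 2, 4, 6], ii)      # [1, 1, 1, 1]
    clustered = idle_intervals([0, 1, 2, 3], ii)   # [4]
    print(gating_savings(spread, 1.0, 3.0, 2))     # 0.0
    print(gating_savings(clustered, 1.0, 3.0, 2))  # 1.0
```

With `ii = 8` and four operations per iteration, spreading the operations over slots 0, 2, 4, and 6 leaves four 1-cycle gaps that never reach the 3-cycle break-even point, whereas clustering them in slots 0 to 3 leaves one 4-cycle gap that saves one unit of energy per iteration. This kind of gap shaping is what a leakage-aware modulo scheduler can exploit, possibly at the cost of a longer schedule when data dependencies prevent clustering.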