Abstract : In the last decades embedded systems have been challenged with more and more application variety, each time more constrained. This implies an ever growing need for performances and energy efficiency in arithmetic units. This work studies solutions ranging from hardware to software to improve arithmetic support in embedded systems. Some of these solutions were integrated in Kalray's MPPA processor.The first part of this work focuses on floating-Point arithmetic support in the MPPA. It starts with the design of a floating-Point unit (FPU) based on the classical FMA (Fused Multiply-Add) operator. The improvements we suggest, implement and evaluate include a mixed precision FMA, a 3-Operand add and a 2D scalar product, each time with a single rounding and support for subnormal numbers. It then considers the implementation of division and square root. The FPU is reused and modified to optimize the software implementations of those primitives at a lower cost. Finally, this first part opens up on the development of a code generator designed for the implementation of highly optimized mathematical libraries in different contexts (architecture, accuracy, latency, throughput).The second part studies a reconfigurable coprocessor, a hardware operator that could be dynamically modified to adapt on the fly to various applicative needs. It intends to provide performance close to ASIC implementation, with some of the flexibility of software. One of the addressed challenges is the integration of such a reconfigurable coprocessor into the low power embedded cluster of the MPPA. Another is the development of a software framework targeting the coprocessor and allowing design space exploration.The last part of this work leaves micro-Architecture considerations to study the efficient use of parallel arithmetic resources. It presents an improvement of regular architectures (Single Instruction Multiple Data), like those found in graphic processing units (GPU), for the execution of divergent control flow graphs.