Abstract: The increasing complexity of embedded systems makes it challenging for designers to satisfy requirements for both high performance and programmability. Automatic multi-threaded code generation greatly facilitates programming for multiprocessor systems-on-chip (MPSoCs). Beyond saving programming effort, system performance must also be considered during the code generation process. Because inter-thread communication is frequent in multi-threaded code, reducing communication cost improves system performance. The communication pipeline technique uses a distributed memory server to execute message passing in parallel with functional tasks, thereby reducing the cost of communication between different processors. The technique can be applied directly to communicating threads in acyclic topologies. To maximise its applicability, we also provide a solution for applying it to cyclic topologies with allocable delay units. Furthermore, we introduce a scheduling strategy for local threads to improve communication efficiency and processor utilisation. Experimental results demonstrate the performance improvements achieved with the proposed techniques.