Hello, I suggest using SSE2 with its 128-bit XMM registers instead of the ancient 64-bit MMX registers (mm0-mm7). Try intrinsics instead of assembly (in your case you will need _mm_add_epi32()) to minimize the effort, or use a compiler such as the Intel C/C++ Compiler, which can easily auto-vectorize such constructs. I'm not sure I understand what the snippet shown is trying to achieve, but please note that type 'int' is 32-bit, i.e. it requires 4 bytes of memory and the PADDD instruction to add.
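
Since I can't tell exactly what your snippet is meant to do, here is only a minimal sketch of the intrinsics approach, assuming the goal is an element-wise sum of two int arrays. The function name add_int32 and its parameters are made up for illustration; _mm_loadu_si128, _mm_add_epi32 and _mm_storeu_si128 are the standard SSE2 intrinsics from emmintrin.h. The unaligned load/store variants are used so that no alignment is assumed for the arrays.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>     /* size_t */

/* Hypothetical helper: dst[i] = a[i] + b[i] for n 32-bit ints. */
static void add_int32(const int *a, const int *b, int *dst, size_t n)
{
    size_t i = 0;

    /* Main loop: four 32-bit ints per 128-bit XMM register. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i sum = _mm_add_epi32(va, vb);   /* compiles to PADDD */
        _mm_storeu_si128((__m128i *)(dst + i), sum);
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

On x86-64 this builds as is, because SSE2 is part of the baseline; for a 32-bit build with GCC or Clang you would typically need to add -msse2.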