If I take a 32-bit RGBA value from each image and want to alpha-blend them and save the result in the destination image, the first thing I have to do, to use SSE2, is convert this vector of four 8-bit integers into four packed single-precision floating-point values in the range 0-1.

I know I can scale the values to 0-1 by just multiplying by (1/255, 1/255, 1/255, 1/255). But how can I convert the 8-bit integers into floating-point values efficiently?
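One common way to do the conversion being asked about is to widen the bytes with zero-extending unpacks and then use the packed int-to-float conversion. Below is a minimal sketch in C with SSE2 intrinsics, not a definitive implementation. The helper name `rgba8_to_float4` is hypothetical, and the code assumes the pixel arrives as a single 32-bit value with the first channel in the lowest byte.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: expand one packed 32-bit pixel (four 8-bit channels)
   into four single-precision floats in the range 0-1. */
static __m128 rgba8_to_float4(uint32_t pixel)
{
    const __m128i zero = _mm_setzero_si128();

    __m128i v = _mm_cvtsi32_si128((int)pixel);  /* movd: pixel into the low 32 bits */
    v = _mm_unpacklo_epi8(v, zero);             /* punpcklbw: 8-bit -> 16-bit, zero-extended */
    v = _mm_unpacklo_epi16(v, zero);            /* punpcklwd: 16-bit -> 32-bit, zero-extended */

    __m128 f = _mm_cvtepi32_ps(v);              /* cvtdq2ps: four int32 -> four floats */
    return _mm_mul_ps(f, _mm_set1_ps(1.0f / 255.0f));  /* scale 0-255 down to 0-1 */
}
```

Element 0 of the result holds the channel from the lowest byte of the pixel, element 3 the channel from the highest byte.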

You should then have the 32-bit packed value in the low 32 bits of xmm0. How you write that to display memory is up to you. A movd store writes just those 32 bits; the wider stores from SSE registers (movq, movdqa/movdqu) write 64 or 128 bits, so they will also overwrite the pixel(s) that follow. If you are writing directly to the framebuffer from position 0 upwards then that's fine to do (repeated movqs), since each following pixel gets written again on the next step. Otherwise, if you know the next pixel too, then put it into xmm1 above for the first pack instruction (and zero for the second) and you'll have two 32-bit pixels in the low half of xmm0, which you can movq to the screen even quicker. You can even optimise further and get four packed 32-bit pixels into xmm0 if you like (packusdw xmm0, xmm1; packusdw xmm2, xmm3; packuswb xmm0, xmm2). Note that packusdw is an SSE4.1 instruction; with SSE2 only, packssdw gives the same result for values in the range 0-255. Otherwise, if you want to be conservative, reserve some stack space, store 64 bits there, pop into rax and then do a 32-bit GPR move.
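The packing and storing options described here can be sketched with SSE2 intrinsics in C. This is an illustration under stated assumptions rather than the exact sequence from the earlier part of the answer, which is not shown here. The helper names `float4_to_rgba8` and `store_4_pixels` are hypothetical, and `packssdw` is swapped in for `packusdw` so that the code stays within SSE2; that is only safe because the channel values are assumed to lie in 0-255.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: pack four floats in the range 0-1 back into one
   32-bit pixel, returned through a general-purpose register. */
static uint32_t float4_to_rgba8(__m128 f)
{
    const __m128i zero = _mm_setzero_si128();

    /* cvtps2dq: scale to 0-255 and round to int32 (current rounding mode) */
    __m128i v = _mm_cvtps_epi32(_mm_mul_ps(f, _mm_set1_ps(255.0f)));
    v = _mm_packs_epi32(v, zero);           /* packssdw: 32-bit -> 16-bit */
    v = _mm_packus_epi16(v, zero);          /* packuswb: 16-bit -> 8-bit, saturating to 0-255 */
    return (uint32_t)_mm_cvtsi128_si32(v);  /* movd: only the low 32 bits leave the register */
}

/* Hypothetical helper: four pixels, each held as four int32 channels in its
   own register, packed into 16 bytes and written with a single store. */
static void store_4_pixels(uint32_t *dst, __m128i p0, __m128i p1,
                           __m128i p2, __m128i p3)
{
    __m128i lo = _mm_packs_epi32(p0, p1);   /* packssdw xmm0, xmm1 */
    __m128i hi = _mm_packs_epi32(p2, p3);   /* packssdw xmm2, xmm3 */
    /* packuswb xmm0, xmm2, then an unaligned 128-bit store */
    _mm_storeu_si128((__m128i *)dst, _mm_packus_epi16(lo, hi));
}
```

The single-pixel version never touches neighbouring memory, while the four-pixel version amortises the two pack steps over 16 bytes of output, which is the trade-off the paragraph above describes.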
