Counting Number of Bits Set

I'm working on a program where I need to count the number of bits that are set in a Word (AX) or DWord (EAX). Below is the code I'm using now. I'm just wondering if there is some special CPU instruction(s) to do something like this, instead of needing to do a loop. It seems like there should be, but I don't know what it is. Any help would be appreciated. Thanks.

[code]
;--------------------------------------------------------------
;COUNT THE TOTAL NUMBER OF BITS THAT ARE SET IN A WORD OR DWORD
;Inputs:  AX/EAX = Word/DWord to Test
;Outputs: CL = Number of Bits set in the Word/Dword
;         ZF = Set if CL = 0
;            = Clear if CL != 0
;Changes:
;--------------------------------------------------------------
CountAXBitsSet:
    PUSH EAX              ;Save used registers
    AND  EAX,0000_FFFFh   ;Get rid of high word
    CALL CountEAXBitsSet  ;Count the bits
    POP  EAX              ;Restore used registers
    RET

CountEAXBitsSet:
    PUSH BX               ;Save used registers
    XOR  CL,CL            ;Start counter at 0
    MOV  BL,32            ;Need to test 32 bits
S10:                      ;Loop to here for each bit
    ROL  EAX,1            ;High bit set?
    JNC  >S30             ;If not, keep testing
    INC  CL               ;If so, increment the bit set counter
S30:                      ;Counter incremented, if appropriate
    DEC  BL               ;Decrement loop counter
    JNZ  S10              ;If not done yet, keep testing
    OR   CL,CL            ;Set return flag
    POP  BX               ;Restore used registers
    RET
[/code]

Totally unoptimized and dirty, but it appeared to work. This at least doesn't have to go through the entire 32 bits if the upper bits are already 0. As soon as it sees the last 1 bit, it bails out.

-jeff!
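The code that this reply describes did not survive in the thread, so the following is only a minimal sketch of the early-exit idea, written in C rather than assembly. It assumes the value is shifted right one bit at a time and that the loop stops as soon as nothing but zero bits remains. The name `count_bits_early_exit` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: a sketch of the early-exit idea,
 * not the code originally posted in the reply above. */
static unsigned count_bits_early_exit(uint32_t v)
{
    unsigned count = 0;

    while (v != 0) {      /* stop as soon as no set bits remain */
        if (v & 1u)       /* is the lowest bit set? */
            count++;
        v >>= 1;          /* move the next bit into position 0 */
    }
    return count;
}
```

With this shape the loop runs once per bit position up to and including the highest set bit, so a value such as 1 takes one pass and a value with bit 31 set takes all 32.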


Using ADC *might* be faster than the conditional jumps. I don't know for sure, though.

Best Regards,
Richard

The way I see it... Well, it's all pretty blurry
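To make the ADC suggestion concrete: in the original loop, `ROL EAX,1` leaves the rotated-out bit in the carry flag, so the `JNC`/`INC CL` pair could in principle be replaced by a single `ADC CL,0`, which adds the carry to the counter without branching. The sketch below shows the same idea in C, under the assumption that the loop still visits all 32 bits. The name `count_bits_branchless` is hypothetical, and whether this is actually faster depends on the CPU.

```c
#include <stdint.h>

/* Hypothetical helper: C analogue of "ROL EAX,1 / ADC CL,0".
 * The top bit is added to the counter unconditionally instead of
 * being tested with a conditional jump. */
static unsigned count_bits_branchless(uint32_t v)
{
    unsigned count = 0;

    for (int i = 0; i < 32; i++) {
        count += v >> 31;   /* top bit is 0 or 1; add it without testing */
        v <<= 1;            /* bring the next bit to the top */
    }
    return count;
}
```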

Both are potentially helpful tips (exiting the loop early and using ADC). The link is interesting as well (very tricky). I'm not sure how the code at the link would perform compared to an exit-early loop, though (it would depend on the exact bit pattern, of course). Also, as a rule I care more about size than speed, and the code at the link is quite long.

I was thinking there might be a way to use some combination of XOR/SHx/ROx, or BCD conversion instructions (AAD/AAM/DAA/DAS), or LEA, or some other "trick" that I haven't thought of, to do it. Even if the "trick" only worked on a nibble or a byte at a time, it could reduce the number of times through the loop. The code at the link is sort of what I was thinking of (it breaks things up into small pieces and then puts them back together again), but I was hoping for something smaller in size.

Thanks for the ideas.
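One way to handle a nibble at a time, as wondered about above, is a 16-entry lookup table. This is only a sketch in C, not something proposed in the thread: the names `NIBBLE_BITS` and `count_bits_by_nibble` are hypothetical, and the table costs 16 bytes of data, which matters if size is the main concern.

```c
#include <stdint.h>

/* Number of set bits in each 4-bit value 0..15. */
static const uint8_t NIBBLE_BITS[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Hypothetical helper: counts set bits four at a time. */
static unsigned count_bits_by_nibble(uint32_t v)
{
    unsigned count = 0;

    while (v != 0) {                      /* at most 8 passes for 32 bits */
        count += NIBBLE_BITS[v & 0xFu];   /* look up the low nibble */
        v >>= 4;                          /* move on to the next nibble */
    }
    return count;
}
```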

I did a little more searching, and found a piece of C code that helped me figure out a good way to do it:

[code]
// v = Variable to Test
// c = # of bits set in v
for (c = 0; v; c++)
    v &= (v - 1);
[/code]

The loop runs once for each bit that is set, because each pass clears the lowest set bit. Here is the ASM code I "translated" this into:

[code]
;--------------------------------------------------------------
;COUNT THE TOTAL NUMBER OF BITS THAT ARE SET IN A DWORD
;Inputs:  EAX = DWord to Test
;Outputs: CL = Number of Bits set in the Dword
;         ZF = Set if CL = 0
;            = Clear if CL != 0
;Changes:
;--------------------------------------------------------------
CountEAXBitsSet:
    PUSH EAX,EBX          ;Save used registers
    XOR  CL,CL            ;Initialize Counter
    OR   EAX,EAX          ;Are there any bits set?
    JZ   >S90             ;If not, we're done
S10:                      ;Loop to here for each bit set
    INC  CL               ;Increment Bit Counter
    MOV  EBX,EAX          ;Subtract 1 from the
    DEC  EBX              ; current value
    AND  EAX,EBX          ;Mask out the smallest set bit
    JNZ  S10              ;If there are still bits set, keep going
S90:                      ;Done
    OR   CL,CL            ;Set return flag
    POP  EBX,EAX          ;Restore used registers
    RET
[/code]
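For checking the assembly routine against a reference, the C loop from this post can be wrapped in a function. This is a minimal sketch: the name `count_bits_clear_lowest` is hypothetical, and it assumes a 32-bit unsigned input to match EAX.

```c
#include <stdint.h>

/* Hypothetical helper: the "v &= v - 1" loop as a function.
 * Each pass clears the lowest set bit, so the loop body runs
 * exactly once per set bit. */
static unsigned count_bits_clear_lowest(uint32_t v)
{
    unsigned c;

    for (c = 0; v != 0; c++)
        v &= v - 1;   /* clear the lowest set bit */
    return c;
}
```

As for the original question about a dedicated instruction: x86 CPUs that implement POPCNT do the whole count in one instruction, but support has to be checked with CPUID before relying on it, so a loop like the one above is still needed as a fallback.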