The processor has an internal stack of return addresses from CALL instructions. This stack holds up to 4 addresses, and is used to predict the destination address of a RET so that instruction prefetch can happen earlier. If the prediction misses, then the penalty is about a dozen cycles on a Pentium Pro, Pentium II, Celeron, Xeon, or any Intel x86 processor with "dynamic execution". The penalty on a Pentium is about five cycles. Thus, the penalty is about the same as a pipeline flush, and on a dynamic execution processor it is also about the same as a cache miss. In the easy hardware implementation, the first mispredicted RET could cause all return addresses then in the internal stack to be mispredicted as well. So, be careful in high-frequency areas. But if there will be a context switch, or if the depth is now only 1, or will exceed 4 before the RET, then definitely do use the PUSHL+JMP instead of CALL+JMP.
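
The effect of exceeding the 4-entry depth can be illustrated with a small software model. This is only a sketch, not a description of the actual hardware: the class `ReturnStackModel`, the helper `run_nested`, the addresses, and the policy of discarding the oldest entry on overflow are all assumptions made for illustration. It also does not model the cascading mispredictions mentioned above.

```python
class ReturnStackModel:
    """Toy model of a return-address predictor (hypothetical, for illustration)."""

    def __init__(self, depth=4):
        self.depth = depth      # number of return addresses the predictor can hold
        self.entries = []       # predicted return addresses, newest last
        self.hits = 0
        self.misses = 0

    def call(self, return_address):
        # Assumption: when the stack is full, the oldest entry is lost.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def ret(self, actual_return_address):
        # Predict using the newest entry; an empty stack cannot predict.
        predicted = self.entries.pop() if self.entries else None
        if predicted == actual_return_address:
            self.hits += 1
        else:
            self.misses += 1


def run_nested(call_depth, predictor_depth=4):
    """Simulate call_depth nested CALLs followed by the matching RETs."""
    model = ReturnStackModel(predictor_depth)
    addresses = [0x1000 + 16 * i for i in range(call_depth)]  # made-up addresses
    for address in addresses:
        model.call(address)
    for address in reversed(addresses):
        model.ret(address)
    return model.hits, model.misses


if __name__ == "__main__":
    for depth in (1, 4, 6):
        hits, misses = run_nested(depth)
        print(f"call depth {depth}: {hits} predicted, {misses} mispredicted")
```

Under these assumptions, nesting up to 4 deep predicts every RET, while each level beyond 4 costs one misprediction on the way back out, each of which would carry the penalty described above.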