Yeah, I didn't look at your output. The thing with softirq is that it does all that IRQ "tempering" stuff (like coalescing and so on). It handles deferred interrupt work per CPU (you'll have a ksoftirqd thread per CPU).

It gets involved heavily in I/O, specifically networking and (although I'm not 100% sure) disk I/O. Have a look at those things and see if they are substantially different from the other physical nodes running the same software. A difference may indicate problems in those areas (rather than with interrupts per se).
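To see which kind of softirq work dominates, something like the following could rank the rows of /proc/softirqs (NET_RX, NET_TX, BLOCK and so on) by their total across CPUs. It is only a rough sketch: the function name `softirq_totals` is made up for illustration, and it assumes the usual layout of a CPU header line followed by one `NAME: count count ...` row per softirq type.

```python
def softirq_totals(text):
    """Sum each softirq type across all CPUs, largest total first."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        counts = [int(c) for c in rest.split()[:ncpu]]
        totals[name.strip()] = sum(counts)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Only works on a Linux host that exposes /proc/softirqs.
    with open("/proc/softirqs") as fh:
        for name, total in softirq_totals(fh.read()):
            print(f"{name:>10} {total}")
```

Running it on the bad node and on a healthy one, then comparing the order, should show whether network or block softirqs stand out.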

Not I/O then? In that case, what else could generate such a HUGE (you say) number of interrupts? The next prime candidate is your virtualisation layer. Unlikely to be Xen itself, but what are the guests doing? Perhaps check CPU/interrupt usage per guest?

I don't know, that's why I'm asking! I don't think it is I/O, as reads/writes appear to be fine. We have had other servers where reads/writes were very slow, but that still didn't cause this type of interrupt load. I don't know whether it is huge as such, but it shows in top as using ~100% of the CPU most of the time. The guests will be doing various things, as they are used by customers, but Xen guests should normally be isolated, so it isn't something we have really come across before.

You've checked the hardware, firmware and device drivers and it's all okay? Beyond that, not much to go on. It may be that "hard" interrupts are arriving faster than they can be handled, with the work being passed on to softirq (although the interrupts can still overwhelm softirq, obviously). In top, there are the CPU status lines (on SMP kernels, one per CPU), where you'll see the hi, si and st counters. hi: time spent servicing hardware interrupts; si: time spent servicing software interrupts; st: time stolen from this VM by the hypervisor. If you have high values for hi and si then it's probably a hardware problem.
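If you'd rather log those numbers than watch top, the same counters can be read from the per-CPU lines in /proc/stat, where the irq, softirq and steal columns correspond to hi, si and st. A minimal sketch, assuming you take two samples of the same CPU line a few seconds apart (the function names here are just illustrative):

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Split a /proc/stat CPU line into its name and the first eight counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return parts[0], dict(zip(FIELDS, values))


def irq_share(before, after):
    """Return (cpu, {'hi','si','st'}) as percentages between two samples."""
    _, b = parse_cpu_line(before)
    name, a = parse_cpu_line(after)
    delta = {k: a[k] - b[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return name, {"hi": 0.0, "si": 0.0, "st": 0.0}
    return name, {
        "hi": 100.0 * delta["irq"] / total,
        "si": 100.0 * delta["softirq"] / total,
        "st": 100.0 * delta["steal"] / total,
    }
```

Feed it, for example, the `cpu0` line from two reads of /proc/stat taken five seconds apart.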

Given you seem to be on a "larger" SMP system (what, 32 cores?), perhaps the issue is with affinity, in the sense that servicing is "moving" between cores and never completing. There is an smp_affinity setting for each IRQ (under /proc/irq/<n>/), but tuning those by hand is not really scalable (nor easy).
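For reference, each IRQ's affinity lives in /proc/irq/<n>/smp_affinity as a hex bitmask, with comma-separated 32-bit groups on bigger machines. A small sketch for turning such a mask into a list of CPU numbers (the helper name `mask_to_cpus` is made up here):

```python
def mask_to_cpus(mask):
    """Decode an smp_affinity hex mask such as '00000000,0000000f' into CPU ids."""
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]
```

So a mask of `0000000f` means the IRQ may be serviced on CPUs 0 to 3.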

Looking at the interrupts you posted, the lines with larger values across all cores are megasas (storage), em1 (network) and rescheduling interrupts (although the posted output is not in a really good format and I'm often cross-eyed). Perhaps you could compare those values to the baseline for your system and/or to the other systems (with the same hardware) that are working well?

I guess in your position I'd load that data into (something like a) spreadsheet, sort per CPU and get the top (say 5-10) across each CPU. Whatever comes out on top is likely to be the source of the problem.
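If a spreadsheet is a hassle, the same sort can be scripted. This is a rough sketch that assumes the standard /proc/interrupts layout (a CPU header line, then `IRQ: count count ... description` rows); the function names are illustrative, and rows without a count for every CPU (such as ERR) are simply skipped.

```python
def parse_interrupts(text):
    """Return (cpu_names, rows) where each row is (label, description, counts)."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()
    rows = []
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:len(cpus)]
        # Skip summary rows that do not carry one counter per CPU.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        desc = " ".join(fields[len(cpus):])
        rows.append((label.strip(), desc, [int(c) for c in counts]))
    return cpus, rows


def top_per_cpu(text, n=5):
    """For each CPU, list the n interrupt sources with the highest counts."""
    cpus, rows = parse_interrupts(text)
    result = {}
    for i, cpu in enumerate(cpus):
        ranked = sorted(rows, key=lambda r: r[2][i], reverse=True)[:n]
        result[cpu] = [(label, desc, counts[i]) for label, desc, counts in ranked]
    return result


if __name__ == "__main__":
    # Only works on a Linux host that exposes /proc/interrupts.
    with open("/proc/interrupts") as fh:
        for cpu, entries in top_per_cpu(fh.read()).items():
            print(cpu, entries)
```

The counters are totals since boot, so for a fair comparison take two snapshots a known interval apart and look at the difference.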
