CPU is usually the last place to look for I/O performance bottlenecks, though pinning vCPUs might be a good idea to isolate things. I/O is complex and isn't as direct as many think. It functions as a writeback system and depends heavily on how much RAM you have, the width of your memory buses, and how swappiness is tuned, among a host of other tunables. There's no silver bullet or "flat rate" solution, and the cause usually isn't the obvious symptom. Tools like linux-perf and SystemTap can help diagnose where the bottleneck is and identify the root cause. Good luck.
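
As a starting point for looking at the writeback tunables mentioned above, here is a minimal Python sketch that reads a few of them from /proc/sys/vm on a Linux host. The function name `read_vm_tunables` and the selection of knobs are illustrative assumptions, not a complete list of what affects I/O; files that are missing or not plain integers are skipped.

```python
from pathlib import Path

# A few writeback-related knobs found under /proc/sys/vm on Linux.
# This selection is illustrative, not exhaustive.
TUNABLES = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
    "swappiness",
)


def read_vm_tunables(base="/proc/sys/vm", names=TUNABLES):
    """Return {name: int} for every tunable that can be read under base."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file (non-Linux host, older kernel) or non-integer
            # content: skip it rather than guess a value.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print(f"{name} = {value}")
```

Running it on the host and inside the guest shows whether the two are tuned differently, which is one cheap thing to rule out before reaching for tracing tools.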

You can make all the assumptions you want, but until you start testing them you'll never get to the bottom of it. Yes, it can be that bad, as you're putting a block device at the mercy of the file cache. iostat, vmstat, and blktrace are your friends. Or just create another VM with direct access and compare performance. This is work, and you won't get to the bottom of it unless you help yourself.
–
ppetraki Oct 10 '12 at 13:57
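
One rough way to test the file-cache effect discussed in the comment above is to time the same writes with and without a flush after each block. The sketch below is an assumption-laden micro-benchmark, not a replacement for iostat or blktrace: the helper name `time_writes`, the 4 KiB block size, and the block count are all arbitrary choices, and inside a VM the result also depends on the hypervisor's cache mode.

```python
import os
import tempfile
import time


def time_writes(path, block=b"\0" * 4096, count=256, sync_each=False):
    """Write `count` blocks to `path` and return the elapsed seconds.

    sync_each=False lets the page cache absorb the writes.
    sync_each=True calls fsync after every block, asking the kernel to
    flush each write to the storage device before continuing.
    """
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            os.write(fd, block)
            if sync_each:
                os.fsync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cached = time_writes(os.path.join(tmp, "cached.bin"))
        synced = time_writes(os.path.join(tmp, "synced.bin"), sync_each=True)
        print(f"cached: {cached:.4f}s  fsync per block: {synced:.4f}s")
```

Run it on the host and in each guest configuration; a large gap between the two numbers suggests how much of the apparent throughput is coming from the cache rather than the device.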

"Not as safe" is a massive understatement: you traded data integrity for speed. No one runs with writeback mode in production. Good luck.
–
ppetraki Oct 17 '12 at 13:33

According to the link you provided, writeback should be safe as well when using barriers...
–
Alex Oct 18 '12 at 6:55

It's "safer" but not the same as writethrough: if you suffer a fault between barriers, that data is gone. Writeback systems are usually backed by RAID with BBC (battery-backed cache) or a UPS. Barriers aren't magical.
–
ppetraki Oct 18 '12 at 17:11
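
The point about faults between barriers can be illustrated with a toy model. This is a deliberately simplified worst-case sketch, assuming that nothing leaves the volatile cache until a barrier flushes it; real devices and kernels may write data out earlier. The function name `durable_writes` and the tuple format for operations are hypothetical.

```python
def durable_writes(ops, crash_after):
    """Return the writes guaranteed durable if power fails after `crash_after` ops.

    ops is a list of ("write", payload) or ("barrier",) tuples.
    Worst-case model: writes stay in a volatile cache until the next
    barrier, which flushes everything pending to stable storage.
    """
    durable, pending = [], []
    for op in ops[:crash_after]:
        if op[0] == "write":
            pending.append(op[1])
        elif op[0] == "barrier":
            durable.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"unknown op: {op!r}")
    # Anything still pending at the moment of the fault is lost.
    return durable


if __name__ == "__main__":
    ops = [
        ("write", "a"), ("write", "b"), ("barrier",),
        ("write", "c"), ("write", "d"), ("barrier",),
    ]
    print(durable_writes(ops, crash_after=5))  # ['a', 'b']
    print(durable_writes(ops, crash_after=6))  # ['a', 'b', 'c', 'd']
```

In this model a writethrough cache would behave as if every write were followed by a barrier, which is the difference the comment above describes.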