It’s worth keeping in mind that an IORef a not a mutable a, it a mutable reference to an immutable object of type a - a possibly fairer comparison would be to use volatile int * ref which dereferences the value increments it, allocates a new location for this incremented int, writes the value to this new location, then writes to the pointer pointing to that new location. Also the most appropriate comparison would be not using mutation at all, and using sum [1..10000], which should actually produce code much closer to what gcc would in the for loop, and if passed to LLVM would actually just compute the constant.

Using IORef actually prevents the compiler doing many many optimisations, because it has to maintain the correct semantics, including the ability to implement atomicModifyIORefM an incredibly useful function which allows any immutable data structure to be modified concurrently without contention. After trying many alternative implementations for a concurrent linked list, it was found by far the most efficient was standard Haskell lists wrapped in an IORef: https://pdfs.semanticscholar.org/2f9e/cc815906c4359cb02674123a1c3b06ec735b.pdf

I’ve always been extremely impressed with Haskell’s Vector package. It does a remarkably good job of producing tight assembly code from fairly abstract, compositional programs. Under the hood, it uses a cool streaming framework that lets it optimize away a lot of vector construction/destruction and do things in-place. It’s a great example of a high-performance library that doesn’t leak implementation gunk to the user.

Also note that you can do mutable vector operations safely without using IO; mutable vectors also work in the ST monad, which was in an article here recently.