| main = print $ foldl' (+) 0 [1..]
|| with
|| foldl' f y xs = foldl' y xs
| where foldl' y [] = y
| foldl' y (x:xs) = foldl' (f y x) xs
|| runs indefinitely with very little memory consumption, while
|| foldl' f y [] = y
| foldl' f y (x:xs) = foldl' f (f y x) xs
|| rapidly consumes all the machine's memory and dies.
Others have explained this nicely. But there's a real tension here. The fast version comes from a combination of (a) the static argument transformation, so you get the first version above, and (b) bodily inlining the entire function, so that at *each call site* you get a locally-recursive function where 'f' is known. That's ok for small functions, but not so good for big ones. Furthermore, the code duplication is only worthwhile if the specialisation is truly useful. For example, would it be better to write append like this
(++) xs ys = letrec app [] = ys
app (x:xs) = x : app xs
in app xs
and inline that at every call of (++)? Probably not.
So that is why GHC does not automate this transformation. If you know that's what you want, write a local recursion, and use an INLINE pragma.
If someone felt like summarising this thread on the Haskell performance-advice wiki that would be great.
http://haskell.org/haskellwiki/Performance
Meanwhile, I'll clarify in the user manual that recursive functions are not inlined.
Simon