unsafePerformIO and missing NOINLINE

Two months ago Ivan asked me if we had working darcs-2.8 for ghc-7.8 in gentoo. We had a workaround to compile darcs to that day, but darcs did not work reliably. Sometimes it needed 2-3 attempts to pull a repository.

A bit later I’ve decided to actually look at failure case (Issued on darcs bugtracker) and do something about it. My idea to debug the mystery was simple: to reproduce the difference on the same source for ghc-7.6/7.8 and start plugging debug info unless difference I can understand will pop up.

Darcs has great debug-verbose option for most of commands. I used debugMessage function to litter code with more debugging statements unless complete horrible image would emerge.

As you can see in bugtracker issue I posted there various intermediate points of what I thought went wrong (don’t expect those comments to have much sense).

The immediate consequence of a breakage was file overwrite of partially downloaded file. The event timeline looked simple:

darcs scheduled for download the same file twice (two jobs in download queue)

Thus first I’ve decided to fix the consequence. It did not fix problems completely, sometimes darcs pull complained about remote repositories still being broken (missing files), but it made errors saner (only remote side was allegedly at fault).

But, OK. Then i’ve started digging why 7.6/7.8 request download patterns were so severely different. At first I thought of new IO manager being a cause of difference. The paper says it fixed haskell thread scheduling issue (paper is nice even for leisure reading!):

GHC’s RTS had a bug inwhich yield
placed the thread back on the front of the run queue. This bug
was uncovered by our use of yield
which requires that the thread
be placed at the end of the run queue

Thus I was expecting the bug from this side.

Then being determined to dig A Lot in darcs source code I’ve decided to disable optimizations (-O0) to speedup rebuilds. And, the bug has vanished.

That made the click: unsafePerformIO might be the real problem. I’ve grepped for all unsafePerformIO instances and examined all definition sites.

Some of those (esp. ajhc) may not actually be bugs. There are cases in the ghc and base sources where ‘{-# PRAGMA …’ was intentionally changed to ‘{- PRAGMA’ because using the pragma in that manner ended up being a pessimisation. It may not be the best practice but it has happened.