(@nalsh on Twitter points out that you can use tool/bisect.sh in the Ruby source if you're bisecting on a test that Ruby already has. It's even easier!)

I have the Ruby code checked out locally. I'm running the bug-reporter's example test, an "nth prime" example from Exercism.io. It succeeds on Ruby 2.2, and fails with an obscure "deadlock; recursive locking" error on the very latest Ruby (currently 2.4, commit d16ff5 in the GitHub mirror.)

To check if a given Ruby version is good or bad, I'm rebuilding Ruby and running the test via this incantation: "make && ./ruby -v -Ilib -I. -I.ext/x86\_64-darwin15 ~/Desktop/nth_prime_test.rb". The extra arguments are to use local libraries from the Ruby build directory on my Macbook. It'll be a little different if you're using Windows or Linux.

(Here's a time-saver of a tip: check your incantation by manually switching back and forth between the first "good" and "bad" revisions - that way you'll be sure that you're rebuilding the right things to get the bug after you weren't getting it, and vice-versa. That's very important when you're trying to isolate it. And checking all the revisions when your test doesn't work will waste your time.)

I started my "git bisect" on Ruby's git history:

$ git bisect start trunk # Trunk is bad, current (ruby_2_2) is good
$ make && ./ruby #... # Run the tests
$ git bisect good

After each "git bisect" command I make Ruby locally and run my test again. Then I either type "git bisect good" or "git bisect bad", depending whether I got the nasty deadlock error that I'm tracking down.

$ git bisect good
Bisecting: a merge base must be tested
[7632a82d5750b7908bd173eda3268ecb0855b934] * gc.c (wmap_final_func): fix memory size shortage when realloc wmap. Fix SEGV during finilize of WeakRef on Solaris (though the SEGV could occur on all OS/platforms). [ruby-dev:48779] [Bug #10646]
$ make && ./ruby #... # Run the tests again

Git bisect, if you don't know it already, will look at the whole list of commits between your "good" and "bad" revision, pick one in the middle, and then let you test it to see if it's "good" or "bad". Each time you tell it, it will cut the range of commits in half again.

If you hit a different bug, you can "git bisect skip" to say that that commit can't be tested... It'll take a little longer to find the bug because you had to rebuild Ruby just to find out that one commit wasn't good. But when you need it, use it. Also, there's always a weird tension about how much you rebuild. Do you "git clean -dxf && autoconf && ./configure &&..." (aka nuke it from orbit and redo everything)? Do you just remake as little as possible? This is one reason that it can be touch to just hand it a script and have it do the right thing, unless your script does a full nuke-and-rebuild, which can be very slow. I'm sure there's also a smart way to not rebuild rdoc every time, but I'm still getting the hang of lots of Ruby rebuilding ;-)

WhoDunnit

Did you read the original bug? Wondering why innocent-looking code having nothing obvious to do with threads is getting a bizarre mutex error with some Ruby versions?

The Mutex exists because the example code uses Prime, which is a Singleton, which locks itself when getting the single, global instance of Prime that exists. The commit that git-bisect finds looks relevant: commit d2487e removes an obsolete Prime.new operation. It also switches Prime to be a Singleton. The old version of Prime used an internal @the_instance variable to track itself. The new ones uses Singleton, and a mutex, and the code where it gets that deadlock error. Ah!

The deadlock error on the mutex basically means "this mutex is already locked by you and you're locking it again" -- which can never succeed. Frequently it's caused by... recursion.

In this case, Prime got rid of its .new() method. The Singleton class, in Ruby, makes its instance using .new(). Worse, the sample code winds up defining .new() as well. So when it calls .new(), Prime winds up calling .instance(), which calls .new(), which calls .instance()...

And so it looks like the problem is just the code in question defining a .new() method. It's technically a regression, but is it one that matters? Hm...