Thread.report_on_exception should be true by default

Extracted from #6647 to focus on the default value now that the feature is implemented.

I strongly believe we should have Thread.report_on_exception = true by default.

It only adds some extra stderr output for apps which let threads die, which is very rarely intended.
If it is intended, then one can use Thread.current.report_on_exception = false
to clarify it's OK for that thread to die and the failure is handled by the app on Thread#join.

I enabled Thread.report_on_exception=true by default in ruby/spec, see https://github.com/ruby/spec/pull/517,
the only cases needing Thread.current.report_on_exception=false
are the specs testing report_on_exception itself and Thread#join/value/status/raise.

Enabling it for test-all shows a fair amount of extra output and failures, which I would bet some of them are bugs in the tests (I already found one, r60854 & r60870),
and other tests should simply more carefully test what they expect
(for instance assert_raise() inside the Thread just around the code raising an exception and join the Thread).

I am willing to help to reduce the extra output and failures in test-all,
but I would like a OK from Matz to try enabling Thread.report_on_exception by default.

Dear Matz, do you think it is reasonable to show exceptions killing threads on stderr by default,
instead of silently swallowing them until Thread#join ?
(if there is ever a Thread#join ..., often not or too late, when the rest of the application has crashed)

History

I think it is time to do this, and most rubyists using threads with Ruby seem to agree with this
(see #6647 and I can say at least JRuby & TruffleRuby implementers want this).

Other threads should write a backtrace just like the main thread when they die due to an exception,
and not hope there will be a Thread#join soon enough.
Threads are heavyweight in Ruby, they are rarely if ever used as an isolation mechanism,
and we cannot assume Thread#join will always be called.
Proper exception handling in threads should anyway not rely on Thread#join semantics
but deal with the exception and communicate the failure with other threads explicitly.

-d/--debug is not a good fit here because:
* it changes the semantics to abort_on_exception.
* it outputs a lot of extra exceptions (in bigger programs), yet without showing the backtrace of exceptions.
* it does not differentiate between exceptions which killed a Thread and exceptions which are rescue'd.

report-on-GC as proposed in #6647 is of no use here since the Thread might be GC-reachable long after it dies,
and it adds extra non-determinism which has no place in error reporting.

Perhaps "on by default" would be more palatable to people if we could also specify how to handle these unhandled exceptions?

In Java, all threads will report a bubbled-out exception if you do not specify a handler for those exceptions. The handler can be set on a global basis or per-thread. So basically, the default is that bubbled exceptions get fed to a default handler that reports thread death.

Interestingly, this is named "safe_thread", probably alluding to the fact Thread.new is unsafe/dangerous as it just ignores errors silently.
This would be much nicer with Thread.on_exception(&handler).
However, I think Thread.on_exception should be a separate issue, extending on good defaults from this issue (then the default handler would just be to print to stderr).

What matters most to me and many rubyists is that by default Ruby Threads do not die silently,
and instead the programmer is given a good clue when a part of the program raises an error and it's never handled.
Thread.report_on_exception = true by default would mean, with no extra code, there is already a good handling of exceptions in threads.

Nobody should have to remember to always add Thread.report_on_exception = true or Thread.abort_on_exception = true
on every Ruby program using Thread, or otherwise waste a lot of debugging time.

I am telling this from experience: I am a PhD student working for 3 years on Ruby & concurrency, as well as a test suite maintainer.

I have too little experience with threads to meaningfully comment on this.

But I have used Thread-methods before, in particular in ruby-gtk.

I distinctly remember having had to set:

Thread.abort_on_exception = true

in some of the .rb files.

I do not remember 100% as to why but I believe it was because if I
would not do so, I would not be able to see which error caused the
ruby GUI app to crash (this started my investigation there, by the
way, since I saw crashes that I did not understand). Or the error
was somewhere hidden. Ruby-GTK can really generate very long core
dumps or exception messages. :)

For short code examples, it was simple to see what was going on but
I found that the more complex the widgets become, the harder it is
to debug them (another reason why I try to write as simple as code
as possible, when I can get away with it).

After I did set Thread to abort on exceptions to true, and read up
on Threads, I understand it to some extent so that I can use it just
fine ... or use it in a "useful way". But initially when I first
noticed problems, I had no real idea what was going on. It should
also be said that the ruby-gtk stuff uses some kind of internal
main loop as well, which I think has somehow to do again with
threads (in glib; my apologies for this bad description here but
I really do not know much about threads; I only know that glib
has its own loop e. g.:

GLib::MainLoop.new(GLib::MainContext.default, true)

)

So to me, that first initial step, to find out how to deal with
threads in Ruby, was not trivial to find out. In that context
I agree with Benoit's comment "Nobody should have to remember"
which I think is a fine statement, in particular by people who
are new to ruby. It's not so difficult for more experienced,
older ruby people but for newcomers, making threads as simple
as possible would be good, IMO. (I still have not really looked
into fibers and mutexes ... it all seems to be a huge chunk of
domain-specific knowledge here to have to learn :D ).

So to me, that first initial step, to find out how to deal with
threads in Ruby, was not trivial to find out. In that context
I agree with Benoit's comment "Nobody should have to remember"
which I think is a fine statement, in particular by people who
are new to ruby. It's not so difficult for more experienced,
older ruby people but for newcomers, making threads as simple
as possible would be good, IMO.

I think it is also a problem even for experienced rubyists.
I often forget to add Thread.{report or abort}_on_exception = true.
It is not easy to remember that every Thread.new{} has to use these extra lines at the beginning
or threads might crash silently with no easy way to debug.

headius (Charles Nutter) was saying on IRC #jruby yesterday that he experiences this problem at least once a month.enebo (Thomas Enebo) was talking about a similar scenario than yours:
GUI programming with threads in Ruby is harder than it should because errors go unnoticed.

I think clarification on what the semantics should be would be helpful.

Should ANY exception raised in a Thread be considered a reasonable way of ending a thread? Yes or No

Yes - You execute your program and it does not work right. Wondering why after doing weird stuff like method ensure blocks to print out the backtrace (yes I did that at one time) you find: Thread.abort_on_exception = true. Once enabling and re-running (assuming it is predictable) you realize you made a typo and a NoMethodError killed your thread.

No - When your program does not work right you see a stack trace on a thread which died from an exception raise.

In No, the semantics is there is no reasonable exception to be thrown which will not generate a backtrace, but compared to Yes we do not get hidden death. Hidden death makes developers frustrated and until they discover abort_on_exception they wonder why this is like this. Even after they learn this they still wonder why this is like this... :)

Besides asking a thread to kill we could always add a special exception which allows a thread to go away silently. I personally think that is too much though.

I think it should return true for consistency (even though it doesn't use the same mechanism for reporting errors, it does still reports exceptions finishing its execution).
I'll fix that detail tomorrow.

The address seems of little use beyond identifying which Thread is is, which we could use a simpler numbering, or simply skip leading zeros.
We could also show an extra backtrace line for Thread.new, so there is no need to show the source location on the first line: