global_method_cache should be configurable or grow automatically

The global_method_cache is currently a fixed 2048 entries. This is not configurable, and can be inadequate for large applications with thousands of classes and methods.

In our app, increasing the size of the cache to 32768 entries reduced time spent in search_method and overall pressure on st_lookup:

before
420 14.0% st_lookup
182 6.1% vm_search_method (inline)

after
265 9.5% st_lookup
125 4.5% vm_search_method (inline)

It's possible the VM could grow global_method_cache automatically, using some heuristic based on the number of long-lived classes or method_entries that are defined.
However, this would break the hashing in the current implementation.
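
To make the hashing concern concrete, here is a minimal sketch of a direct-mapped cache whose slot is chosen by masking a key with `size - 1`. The names (`mcache_size_from_env`, `mcache_slot`, `MCACHE_DEFAULT_SIZE`) and the key mixing are assumptions for illustration, not the actual code in vm_method.c. It shows how a configurable size could be read from a string value and rounded to a power of two, and why the same key lands in a different slot once the size changes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical default, matching the fixed size mentioned above. */
#define MCACHE_DEFAULT_SIZE 2048

/* Round n (n >= 1) down to a power of two so that masking works. */
static size_t floor_pow2(size_t n)
{
    size_t p = 1;
    while (p <= n / 2) p *= 2;
    return p;
}

/* Pick the cache size from a (hypothetical) configuration string;
 * fall back to the default when it is unset or not a positive number. */
static size_t mcache_size_from_env(const char *value)
{
    char *end;
    unsigned long n;

    if (value == NULL) return MCACHE_DEFAULT_SIZE;
    n = strtoul(value, &end, 10);
    if (end == value || *end != '\0' || n == 0) return MCACHE_DEFAULT_SIZE;
    return floor_pow2((size_t)n);
}

/* Direct-mapped slot: the low bits of the mixed key, selected by size - 1.
 * The mixing here is an assumption, not the VM's exact formula. */
static size_t mcache_slot(uintptr_t klass, uintptr_t mid, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0); /* power of two only */
    return (size_t)(((klass >> 3) ^ mid) & (size - 1));
}
```

Because the slot depends on `size - 1`, entries stored under the old size cannot be found under the new one, so growing the table at runtime means rehashing or dropping the existing entries.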

Added a couple of fixes to the patch:
1. Fix rb_mcache_resize: it didn't copy method_state and class_serial on resize, so the cache entries were invalidated on the next check.
2. Add "garbage collection" of "undefined" cache entries: do not copy them on resize, and shrink the cache if it is too sparse after the resize.
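
A rough sketch of what the two fixes above amount to, under assumed names and layout (`entry_t`, `mcache_t`, `mcache_resize`, linear probing, and the 1/4 sparseness threshold are all hypothetical, not taken from the patch): the resize copies each entry whole, so method_state and class_serial survive, skips entries whose method entry is NULL ("undefined"), and halves the requested size while the table would stay too sparse.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MCACHE_MIN_SIZE 8 /* hypothetical lower bound */

typedef struct {
    uintptr_t mid;          /* method id; 0 marks an empty slot */
    uint64_t method_state;  /* validity stamps checked on lookup */
    uint64_t class_serial;
    const void *me;         /* cached method entry; NULL means "undefined" */
} entry_t;

typedef struct {
    entry_t *entries;
    size_t size;            /* always a power of two */
    size_t used;
} mcache_t;

static int mcache_init(mcache_t *c, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    c->entries = calloc(size, sizeof(entry_t));
    c->size = size;
    c->used = 0;
    return c->entries ? 0 : -1;
}

/* Insert with linear probing; the whole entry is copied, stamps included. */
static void mcache_put(mcache_t *c, const entry_t *e)
{
    size_t i = e->mid & (c->size - 1);
    while (c->entries[i].mid != 0 && c->entries[i].mid != e->mid)
        i = (i + 1) & (c->size - 1);
    if (c->entries[i].mid == 0) {
        assert(c->used + 1 < c->size); /* keep one slot free for probing */
        c->used++;
    }
    c->entries[i] = *e;
}

static const entry_t *mcache_get(const mcache_t *c, uintptr_t mid)
{
    size_t i = mid & (c->size - 1);
    while (c->entries[i].mid != 0) {
        if (c->entries[i].mid == mid) return &c->entries[i];
        i = (i + 1) & (c->size - 1);
    }
    return NULL;
}

/* Rebuild the table: drop "undefined" entries, shrink if too sparse. */
static int mcache_resize(mcache_t *c, size_t new_size)
{
    mcache_t fresh;
    size_t i, live = 0;

    for (i = 0; i < c->size; i++)
        if (c->entries[i].mid != 0 && c->entries[i].me != NULL) live++;

    while (new_size <= live) new_size *= 2;
    while (new_size > MCACHE_MIN_SIZE && live * 4 < new_size) new_size /= 2;

    if (mcache_init(&fresh, new_size) != 0) return -1;
    for (i = 0; i < c->size; i++) {
        const entry_t *e = &c->entries[i];
        if (e->mid != 0 && e->me != NULL) mcache_put(&fresh, e);
    }
    free(c->entries);
    *c = fresh;
    return 0;
}
```

Copying the struct as a unit is what avoids the first bug: if only mid and me were copied, the zeroed stamps would fail the validity check on the next lookup.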

funny_falcon's approach to solving this problem is very impressive.
However, I'm not yet sure how much it increases the memory impact.

tmm1 measured the memory impact at https://bugs.ruby-lang.org/issues/9262#change-43840 . This survey is
very impressive. I think it would be better to measure other cases as well. For
example, if some classes call many methods at once, it will become a memory
issue. I think a simple limit (cap) combined with funny_falcon's
patch works fine.

##

During the New Year holiday (Japanese take holidays in New Year's week), I have
been thinking about this issue. Some ideas are available.

The above ideas are not yet implemented or verified, and a huge effort is
needed (because we need to change the method entry data structure).
Before trying, I need to know why and how the method cache is missed.

##

Basically, I'm not against introducing and backporting some of the proposed
patches (with measurements, if we can). In my opinion, the simple variable
global cache entry size approach will be fine for backport.

I will also try the above ideas for Ruby 2.2. The current patches are a good
starting point, I think.

It looks like the performance regressions w/o global method cache are
because rb_funcall and friends do not have call info, so they don't
hit the inline cache. So perhaps we should just add a call info-aware
version of rb_funcall-like functions so we can just use inline cache
everywhere.

I'm pretty sure ko1 already knows that, but I just discovered it :x
tmm1: what do you think?

charliesome has proposed a similar API (sorry, I forget the URL).
He uses only a static variable (not thread-local) and it seems to work well.
However, I think it may have pitfalls (recursive calls), so I can't decide
whether to introduce it.
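
For discussion, the static-variable idea can be sketched with a toy object model. Everything here (`call_cache_t`, `funcall_cached`, `global_method_state` as a plain counter, the `point_class` example) is hypothetical and is not the proposed API or the real rb_funcall. It only shows the shape of a per-call-site cache that is validated against the receiver's class and a global state stamp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct object object_t;
typedef int (*method_fn)(object_t *self);

typedef struct { const char *name; method_fn fn; } method_t;
typedef struct { const method_t *methods; size_t n_methods; } klass_t;
struct object { const klass_t *klass; int value; };

/* Bumped whenever a method is (re)defined; invalidates every cache. */
static unsigned long global_method_state = 1;
static unsigned long slow_lookups = 0; /* counts full method searches */

/* The per-call-site "call info": what was resolved, and for which state. */
typedef struct {
    const klass_t *klass;
    unsigned long method_state;
    method_fn fn;
} call_cache_t;

/* Slow path: search the class's method table by name. */
static method_fn search_method(const klass_t *k, const char *name)
{
    size_t i;
    slow_lookups++;
    for (i = 0; i < k->n_methods; i++)
        if (strcmp(k->methods[i].name, name) == 0) return k->methods[i].fn;
    return NULL;
}

/* Call `name` on recv, consulting the caller-supplied cache first. */
static int funcall_cached(call_cache_t *cc, object_t *recv, const char *name)
{
    if (cc->klass != recv->klass || cc->method_state != global_method_state) {
        cc->fn = search_method(recv->klass, name);
        cc->klass = recv->klass;
        cc->method_state = global_method_state;
    }
    assert(cc->fn != NULL); /* sketch only: no method_missing handling */
    return cc->fn(recv);
}

static int get_value(object_t *self) { return self->value; }
static const method_t point_methods[] = { { "value", get_value } };
static const klass_t point_class = { point_methods, 1 };

/* A call site that owns one static cache (zero-initialized, so the
 * first call always takes the slow path). */
static int call_value(object_t *recv)
{
    static call_cache_t cache;
    return funcall_cached(&cache, recv, "value");
}
```

Since `cache` is static and not thread-local, it is shared by every invocation of that call site, including recursive ones and calls from other threads.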