Thank you for pointing out this problem. I have updated
my patch accordingly and am reattaching it to this issue.

Here is my approach for solving this problem:
(Please correct me if I am wrong.)

Since Ruby 1.9 threads are native kernel threads, they
dynamically allocate and manage their own stacks. So
the ruby_bind_stack() GC marking restriction must only
be applied to the main Ruby thread---which isn't really
a thread at all; it runs on the native C program stack.

=begin
I'm embedding Ruby 1.9.1 in my app, and it would die with a segmentation fault. I was suspicious about the stack, as the app is multithreaded. I applied your patch and it looks fine so far. I'm using the following code (and assume that the stack grows upward):

/*
* Binds the stack of Ruby's main thread to the region of memory that spans
* inclusively from the given lower boundary to the given upper boundary:
*
* lower boundary <= stack pointer of Ruby's main thread <= upper boundary
*
* These boundaries do not protect Ruby's main thread against stack
* overflow and they do not apply to non-main Ruby threads (whose stacks
* are dynamically allocated and managed by the native Operating System).
*/
void ruby_bind_stack(void *lower_boundary, void *upper_boundary);
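To make the inclusive boundary rule in the comment above concrete, here is a minimal sketch of the check it describes. The struct and both helper names are hypothetical stand-ins that I made up for illustration; they are not part of the patch or of the Ruby C API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two boundaries that a call such as
 * ruby_bind_stack(lower, upper) would record. */
struct demo_stack_region {
    uintptr_t lower;
    uintptr_t upper;
};

/* Records the boundaries; the lower one must not exceed the upper one. */
static struct demo_stack_region
demo_make_region(void *lower_boundary, void *upper_boundary)
{
    struct demo_stack_region r;
    r.lower = (uintptr_t)lower_boundary;
    r.upper = (uintptr_t)upper_boundary;
    assert(r.lower <= r.upper);
    return r;
}

/* Inclusive test: lower boundary <= address <= upper boundary. */
static bool
demo_region_contains(struct demo_stack_region r, const void *address)
{
    uintptr_t a = (uintptr_t)address;
    return r.lower <= a && a <= r.upper;
}
```

An embedder could use a check like this as a sanity test: the address of a local variable taken in the embedded environment should fall inside the region that is about to be bound.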

According to Matz's suggestion in , I wrote
a detailed explanation of the problem this patch solves. I
hope this explanation is helpful. Please do not hesitate
to ask for clarifications or to correct any misunderstandings.

Thanks for your thoughtful consideration.

== Introduction

The patch adds a ruby_bind_stack() function to the Ruby C API.
This function allows the person who is embedding Ruby to
tell the Ruby GC about the stack boundaries of the embedded
environment:

void ruby_bind_stack(VALUE *lower_bound, VALUE *upper_bound);

In order to understand why this function is important, please
consider the following two modes of operation: normal & embedded.

== Normal operation: Ruby runs in a C program's main()

Initially, Ruby assumes that the stack of Ruby's main
thread exists in a high memory address range, like this:

(high memory address)

0xc1bff1f0 Ruby's stack upper boundary

0xbffff1f0 Ruby's stack lower boundary

(low memory address)

As Ruby runs, the lower boundary is adjusted (by the
SET_STACK_END macro) to reflect the machine stack pointer:

See the problem? Ruby's stack and the C coroutine stack
do not agree. They overlap!

This situation becomes worse (and causes a segfault) when
the Ruby GC runs: it marks VALUEs in the Ruby stack, which
currently contains all of the heap memory! Somewhere in
the vast heap memory, it finds and dereferences a NULL value
and BOOM! a segfault occurs. :-)
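To illustrate why the boundaries matter, here is a simplified sketch of a conservative scan over a region of memory. It is not Ruby's actual marking code; the function name and the "looks like a heap pointer" test are assumptions made only for this example. The point is that the scan visits every word between the two boundaries, so wrong boundaries make it walk over memory that was never part of the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conservative scan: walks every word in [lower, upper)
 * and counts those whose value falls inside [heap_lo, heap_hi), i.e.
 * words that might be references to heap objects. */
static size_t
demo_scan_region(const uintptr_t *lower, const uintptr_t *upper,
                 uintptr_t heap_lo, uintptr_t heap_hi)
{
    size_t hits = 0;
    assert(lower <= upper);
    for (const uintptr_t *p = lower; p < upper; p++) {
        if (*p >= heap_lo && *p < heap_hi) {
            hits++; /* a real collector would mark the object here */
        }
    }
    return hits;
}
```

The wider the gap between the two boundaries, the more unrelated words are treated as candidate references.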

To solve this problem, the ruby_bind_stack() function corrects
Ruby's stack to reflect the stack boundaries of the C coroutine:
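As a rough illustration of where the coroutine boundaries come from, the sketch below runs a function on a heap-allocated stack using the System V context functions and confirms that a local variable inside the coroutine lives within that block. All `demo_` names are hypothetical. No Ruby functions are called; the comment only marks the place where an embedder would presumably pass the two boundaries to ruby_bind_stack().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define DEMO_STACK_SIZE (256 * 1024)

static ucontext_t demo_main_ctx, demo_co_ctx;
static char *demo_stack;         /* heap-allocated coroutine stack */
static int demo_local_in_bounds; /* result recorded by the coroutine */

/* Runs on the heap-allocated stack. */
static void
demo_coroutine_body(void)
{
    volatile char marker = 0;
    uintptr_t sp = (uintptr_t)&marker;
    uintptr_t lower = (uintptr_t)demo_stack;
    uintptr_t upper = (uintptr_t)(demo_stack + DEMO_STACK_SIZE);

    /* Assumption: an embedder would hand these two boundaries to
     * ruby_bind_stack() at this point. */
    demo_local_in_bounds = (lower <= sp && sp <= upper);
}

/* Returns 1 if the coroutine's local variable was inside the
 * allocated stack, 0 if not, and -1 on setup failure. */
static int
demo_run_coroutine(void)
{
    demo_stack = malloc(DEMO_STACK_SIZE);
    if (demo_stack == NULL) return -1;

    if (getcontext(&demo_co_ctx) != 0) {
        free(demo_stack);
        return -1;
    }
    demo_co_ctx.uc_stack.ss_sp = demo_stack;
    demo_co_ctx.uc_stack.ss_size = DEMO_STACK_SIZE;
    demo_co_ctx.uc_link = &demo_main_ctx; /* resume here when the body returns */
    makecontext(&demo_co_ctx, demo_coroutine_body, 0);

    if (swapcontext(&demo_main_ctx, &demo_co_ctx) != 0) {
        free(demo_stack);
        return -1;
    }
    free(demo_stack);
    return demo_local_in_bounds;
}
```

Because the coroutine stack is an ordinary heap block, its address range has nothing to do with the range Ruby assumes for the main C stack, which is why the boundaries have to be supplied explicitly.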

I would like to say that without applying this patch, my ruby interpreter, embedded in a pthread, would cause a segmentation fault as soon as GC was invoked. I would like to see this applied to 1.9.1 as well as http://redmine.ruby-lang.org/issues/show/2279 . Without these, it's hardly possible to have ruby 1.9.1 embedded in a useful way.

Sorry to be impatient, but has there been any
further decision or consideration about this patch?

Sorry to be late.

The only feedback I've received so far is that:

An early version of this patch did not support
multi-threading (thanks to Mr. Nobu).

A later version of this patch worked for embedding
Ruby 1.9 inside a pthread (thanks to Mr. Roman).

Switching stacks using setcontext() can't work on all platforms.
For instance, on NetBSD and with older LinuxThreads, the stack address is
tightly bound to the thread and can't be changed. That is, your
strategy is not portable.

Hmm, but you still should not ignore the fact described in
. If the patch solves a serious problem under one
condition (SEGV in an embedding environment), you cannot reject it just
by saying 'not a good idea'.

|Switching stacks using setcontext() can't work on all platforms.
|For instance, on NetBSD and with older LinuxThreads, the stack address is
|tightly bound to the thread and can't be changed. That is, your
|strategy is not portable.

You are referring only to my System V context example, right? If so,
please note that I also provided a second example that uses libpcl[1]
which "can use either the ucontext.h functionalities... or the standard
longjmp()/setjmp()" and "is easily portable on almost every Unix system
and on Windows" [1].

I will create a third example that uses libpthread to demonstrate
how this patch lets you embed Ruby 1.9 inside a pthread. (Note that this
patch has already allowed Mr. Roman to embed Ruby 1.9 inside a pthread.)

|You are referring only to my System V context example, right? If so,
|please note that I also provided a second example that uses libpcl[1]
|which "can use either the ucontext.h functionalities... or the standard
|longjmp()/setjmp()" and "is easily portable on almost every Unix system
|and on Windows" [1].

As far as I understand, libpcl is under GPL, that cannot be used in
the core Ruby. Since Ruby is not covered by GPL only.

My patch simply adds a ruby_bind_stack() function to the Ruby C API.
It does not use ucontext, libpcl, pthreads, or any other coroutine
library. These libraries are used only in the example test cases
I provided to demonstrate how the ruby_bind_stack() function can be
used to embed Ruby inside a coroutine environment.

|> As far as I understand, libpcl is under GPL, that cannot be used in
|> the core Ruby. Since Ruby is not covered by GPL only.
|
|My patch simply adds a ruby_bind_stack() method to the Ruby C API:
|
| ruby_bind_stack_r25604.patch (attached to Feature #2294)

I am sorry about my misunderstanding. In that case, there should not
be any license issue. I'll wait for Nobu to express his opinion.

=begin
Encountered this problem while embedding Ruby in a pthread'ed plug-in. I applied the patch against r25604 but unfortunately it did not solve the problem. When pthread_main_np() == 0 it crashes. When pthread_main_np() != 0 it works.

I will be happy to help test this and provide more information if needed. Thanks.

Perhaps I wishfully believed your patch to be the needed solution, but what I'm seeing is definitely occurring when pthread_main_np() == 0.

With 1.9.1-p243 I was seeing random problems that seemed stack related. After searching through the reported issues, I decided to try r25604 and that's when the error became exactly, and consistently, what you described in #2258. That's how I found this patch.

After looking at your examples I don't think they mirror my situation and would need more than a modification to make them do so. Unfortunately I will not have the time for building a test case that mimics my situation accurately before another week or two.

I'm not sure this makes any difference really, but in my case Ruby is embedded inside a dynamically loadable plug-in. The target host program comes in two editions; a client that runs its plug-ins from the main thread, and a server that runs them in a child thread. The client works fine while the server crashes on the first call to require. To create an accurate test I think I need to reproduce this exact situation. I'm not sure though, I have to investigate this when time allows.

Please try r25842 or newer (with and without my patch) and see if it solves
your problem. That particular revision solves the "[BUG] object allocation
during garbage collection phase" error (reported in #2258) you encountered.

Hopefully, yours will turn out to be an unrelated issue so I can make
forward progress on getting this patch accepted (someday!!). :-)

Could we finally get this patch committed, please? It's not like it's a thousand-line behemoth, and it solves a very real problem: it's impossible to embed Ruby into a pthread without it. I really see no reason not to commit this.
=end

|Could we finally get this patch committed, please? It's not like it's a thousand-line behemoth, and it solves a very real problem: it's impossible to embed Ruby into a pthread without it. I really see no reason not to commit this.
|http://redmine.ruby-lang.org/issues/show/2294

As Nobu said at first, this patch does not take multi-threading
into account.
(Also, using global variables should not be accepted :) The patch is too
ad hoc a modification.)

Thanks for your feedback! I must confess that I did not really
understand how my patch did not support multi-threading, but after
reading your proposed API, I finally understand what Nobu was talking
about. :)

I agree with your feeling and I would like to follow your proposed API.

Could you give me a concrete example? (Execution flow)
(I'm sorry if I missed the example you already posted)

Hi,
While looking for a solution to an issue we have with embedding Ruby into a pthread, I found this thread.
We have been running Ruby embedded into a pthread for about 10 years now and have upgraded periodically. We recently moved from 1.8.7 to 1.9.3 and now experience a SEGV in mark_locations_array as soon as rb_gc is called. The symptoms resemble those described above, so we applied Suraj's patch, and it solves the issue. We only have one Ruby instance running and do not use Ruby threads within the Ruby code.

Could I respectfully ask what the status of this patch is, or what plans, if any, there are to solve this problem? Ruby has proven to be extremely powerful within our applications and we are keen to maintain currency and would prefer, for obvious reasons, not to have to patch the code.

Also - what changed between 1.8.7 and 1.9.3 to cause this issue to appear?

I can reliably reproduce the bug in my embedded Ruby project (https://github.com/kkaempf/cmpi-bindings) where Ruby (ruby-1.9.3-p194; same for git master) segfaults in mark_locations_array() accessing a memory location within ruby_stack_lower_bound and ruby_stack_upper_bound.

I can also verify that applying the ruby_bind_stack patch fixes the issue.