What causes a STKOVF - VMS

What causes a STKOVF

The following involves OpenVMS V8.3A with various ECOs applied. It
happens in a DECwindows app. I'd have to find out hardware & firmware
details, if necessary.

Okay, I've read the OpenVMS FAQ and the pertinent portions of Hoff's
Wizard stuff on stack overflow exceptions, but most of them simply say
"get a reproducer and talk with the support center."

First off, I don't know how to reproduce the particular stack overflow
we're seeing (it's in a large multiprocess system with external inputs
out the wazoo); I'm not even sure whether it was operator actions or
the external inputs that caused it. The code for just the one
process that got the SS$_STKOVF exception is huge.

What I need is an idea of when the RTL or lower layers detect a "stack
overflow" in a non-threaded situation (though it's a DECwindows app,
just in case it's multithreading for some reason), so I might get some
idea of where to look. (The place where the exception happened didn't
reveal too much.)

It sounds like the RTL actually pre-checks for a case where pushing n
bytes onto the stack would cause the stack to overflow the thread's
stated stack size (in a multithreaded app), and signals SS$_STKOVF
before it goes off the deep end. What does it do for a single-threaded
app?

Re: What causes a STKOVF

Joe,

Can you create a process dump? Use SET PROC/DUMP before running the
image, or RUN/DUMP for a detached process. Then you have the complete
process address space (including the stack) available for analysis
with ANAL/PROC.

Volker.

Re: What causes a STKOVF

On Dec 7, 10:36 am, Joe Sewell wrote:
> The following involves OpenVMS V8.3A with various ECOs applied. It
> happens in a DECwindows app. I'd have to find out hardware & firmware
> details, if necessary.
>
> Okay, I've read the OpenVMS FAQ and the pertinent portions of Hoff's
> Wizard stuff on stack overflow exceptions, but most of them simply say
> "get a reproducer and talk with the support center."
>
> First off, I don't know how to reproduce the particular stack overflow
> we're seeing (it's in a large multiprocess system with external inputs
> out the wazoo); I'm not even sure I know if it was operator actions
> that caused it or the external inputs. The code for just the one
> process that got the SS$_STKOVF exception is huge.
>
> What I need is an idea of when the RTL or lower layers detect a "stack
> overflow" in a non-threaded situation (though it's a DECwindows app,
> just in case it's multithreading for some reason), so I might get some
> idea of where to look. (The place where the exception happened didn't
> reveal too much.)
>
> It sounds like the RTL actually pre-checks for a case where pushing n
> bytes onto the stack would cause the stack to overflow the thread's
> stated stack size (in a multithreaded app), and signal SS$_STKOVF
> before it goes off the deep end. What does it do for a single threaded
> app?

Joe,

First, let me welcome you to posting in COMP.OS.VMS.

The actual details of the stack overflow handling are likely (I do not
have one of my copies handy) in the Internals and Data Structures
manual. The gross details of this have not changed in a VERY long
time.

I would also be concerned about the possibility that someone has
overwritten a saved stack pointer in a call frame, and as a result the
stack is effectively corrupt when the RETURN is executed. These can be
devilishly difficult to localize (been there, done that).

There are a variety of strategies that can be used to localize this
type of problem. Which is appropriate depends on many factors. The
most central question is: What (if any) tracking/debugging code is
already present in your application that can help reduce the size of
the search?

Re: What causes a STKOVF

In article <854de914-e03c-4f8b-bf33-dd0df67afae5@a39g2000pre.googlegroups.com>, Bob Gezelter writes:
>
> I would also be concerned about the possibility that someone has
> overwritten a saved stack pointer in a call frame, and as a result the
> stack is effectively corrupt when the RETURN is executed. These can be
> devilishly difficult to localize (been there, done that).

One of the first problems I had to debug on my first Alpha was a
return to 0. I'd never seen one on a VAX and it didn't occur to me
that a program running on VMS could do such a stupid thing until I
saw it. What I got first was a last-chance exception handler dump
of registers that didn't point anywhere useful. (At that point the
process seems to have no stack, so no traceback handler.)

A process dump in that case didn't tell me anything I didn't already
know; a return to 0 pretty much wiped out pointers to everything
useful.

I had to run with the debugger many times, doing a binary search for
the line of code that caused the error, and then study the machine
listing to figure out what was going on. (Reading through compiler
generated prolog code is such fun! Made me really miss CALLx/RET!)

Re: What causes a STKOVF

On Dec 7, 11:34 am, Volker Halle wrote:
> Joe,
>
> can you create a process dump ? SET PROC/DUMP before running the image
> or use RUN/DUMP for a detached process. Then you have the complete
> process address space (including the stack) available for analysis
> with ANAL/PROC.
>
> Volker.

In this case, we do have process dumps (an unusual occurrence). I
haven't seen anything that leaps out at me yet.

Re: What causes a STKOVF

On Dec 7, 3:03 pm, Bob Gezelter wrote:
> On Dec 7, 10:36 am, Joe Sewell wrote:
>
> > The following involves OpenVMS V8.3A with various ECOs applied. It
> > happens in a DECwindows app. I'd have to find out hardware & firmware
> > details, if necessary.
>
> > Okay, I've read the OpenVMS FAQ and the pertinent portions of Hoff's
> > Wizard stuff on stack overflow exceptions, but most of them simply say
> > "get a reproducer and talk with the support center."
>
> > First off, I don't know how to reproduce the particular stack overflow
> > we're seeing (it's in a large multiprocess system with external inputs
> > out the wazoo); I'm not even sure I know if it was operator actions
> > that caused it or the external inputs. The code for just the one
> > process that got the SS$_STKOVF exception is huge.
>
> > What I need is an idea of when the RTL or lower layers detect a "stack
> > overflow" in a non-threaded situation (though it's a DECwindows app,
> > just in case it's multithreading for some reason), so I might get some
> > idea of where to look. (The place where the exception happened didn't
> > reveal too much.)
>
> > It sounds like the RTL actually pre-checks for a case where pushing n
> > bytes onto the stack would cause the stack to overflow the thread's
> > stated stack size (in a multithreaded app), and signal SS$_STKOVF
> > before it goes off the deep end. What does it do for a single threaded
> > app?
>
> Joe,
>
> First, let me welcome you to posting in COMP.OS.VMS.
>
> The actual details of the stack overflow handling are likely (I do not
> have one of my copies handy) in the Internals and Data Structures
> manual. The gross details of this have not changed in a VERY long
> time.
>
> I would also be concerned about the possibility that someone has
> > overwritten a saved stack pointer in a call frame, and as a result the
> > stack is effectively corrupt when the RETURN is executed. These can be
> devilishly difficult to localize (been there, done that).
>
> There are a variety of strategies that can be used to localize this
> type of problem. Which is appropriate depends on many factors. The
> most central question is: What (if any) tracking/debugging code is
> already present in your application that can help reduce the size of
> the search.
>
> - Bob Gezelter,http://www.rlgsc.com

I've got a 5.5 version of that manual handy; wish I had thought of
that sooner. Thanks.

It's possible that something smashed the stack, but all the call
frames look correct otherwise, something that I've found to be rare
when the stack gets puked upon.

Re: What causes a STKOVF

On Dec 7, 4:43 pm, koeh...@eisner.nospam.encompasserve.org (Bob
Koehler) wrote:
> In article <854de914-e03c-4f8b-bf33-dd0df67af...@a39g2000pre.googlegroups.com>, Bob Gezelter writes:
>
> > I would also be concerned about the possibility that someone has
> > overwritten a saved stack pointer in a call frame, and as a result the
> > stack is effectively corrupt when the RETURN is executed. These can be
> > devilishly difficult to localize (been there, done that).
>
> One of the first problems I had to debug on my first Alpha was a
> return to 0. I'd never seen one on a VAX and it didn't occur to me
> that a program running on VMS could do such a stupid thing until I
> saw it. What I got first was a last-chance exception handler dump
> of registers that didn't point anywhere useful. (At that point the
> process seems to have no stack, so no traceback handler).
>
> A process dump in that case didn't tell me anything I didn't already
> know, return to 0 pretty much wiped out pointers to everything
> useful.
>
> I had to run with the debugger many times, doing a binary search for
> the line of code that caused the error, and then study the machine
> listing to figure out what was going on. (Reading through compiler
> generated prolog code is such fun! Made me really miss CALLx/RET!)

Been there, done that. The problem is we cannot seem to reproduce this
reliably; all I've got is the aforementioned process dump.

Re: What causes a STKOVF

Joe Sewell wrote:
> What I need is an idea of when the RTL or lower layers detect a "stack
> overflow" in a non-threaded situation (though it's a DECwindows app,
> just in case it's multithreading for some reason), so I might get some
> idea of where to look. (The place where the exception happened didn't
> reveal too much.)
>
> It sounds like the RTL actually pre-checks for a case where pushing n
> bytes onto the stack would cause the stack to overflow the thread's
> stated stack size (in a multithreaded app), and signal SS$_STKOVF
> before it goes off the deep end. What does it do for a single threaded
> app?

To answer your question: In a multi-threaded/multi-stacked application,
each stack is a fixed size (no automatic expansion) and there are
'yellow zones' to help DECthreads know when you are near the edge of the
stack. The Calling Standard has lots of details on how stack checking
is implemented by the compilers.

In a traditional single-stack application, there is no yellow zone,
since there is automatic stack expansion. The stack will expand and
expand until you run out of page file quota. You'll eventually end up
with an ACCVIO, I believe.

Re: What causes a STKOVF

Which routine/module is the first (top-most) in the call chain?
Is it always the same in all the dumps?
What does DBG> EXA/INS tell you?

Volker.

Re: What causes a STKOVF

Joe,

You'll get a STKOVF (instead of just an ACCVIO) if the process is
running a thread manager (like PTHREADs) or you're not running on the
process's initial kernel thread. Use SDA> SHOW PROC/IMA to see
whether PTHREAD$RTL is in the image list. I'll bet it is for a
DECwindows image.

Check the stack pointer SP with DBG> EX SP

then examine the stack addresses and limits

DBG> SDA
SDA> EXA ctl$aq_stack;20
SDA> EXA ctl$aq_stacklim;20

SDA will show one quadword for each stack (offset 0 = kernel, then
exec, super, user).

Try to figure out whether the current SP is near the limits of (or
outside) the stack.

Volker.
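
[Editor's sketch] The comparison described above (is SP inside the
stack, and how close is it to the limit?) can be written out as a small
C helper. This is only an illustration under stated assumptions: the
stack grows toward lower addresses, the value read from ctl$aq_stack is
taken to be the base (highest address), and the value read from
ctl$aq_stacklim is taken to be the limit (lowest address). The names
classify_sp, sp_state, and the margin parameter are hypothetical and
are not part of any OpenVMS interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes for this illustration only. */
typedef enum { SP_OUTSIDE, SP_NEAR_LIMIT, SP_OK } sp_state;

/*
 * Classify a stack pointer against one stack's bounds.
 *   sp     - value shown by EX SP
 *   base   - assumed highest address of the stack (from ctl$aq_stack)
 *   limit  - assumed lowest address of the stack (from ctl$aq_stacklim)
 *   margin - how many bytes above the limit still count as "near"
 * Assumes a stack that grows downward, from base toward limit.
 */
static sp_state classify_sp(uint64_t sp, uint64_t base, uint64_t limit,
                            uint64_t margin)
{
    assert(limit < base);           /* bounds must describe a real range */

    if (sp > base || sp < limit)
        return SP_OUTSIDE;          /* SP is not in this stack at all */
    if (sp - limit < margin)
        return SP_NEAR_LIMIT;       /* little room left before the limit */
    return SP_OK;
}
```

Plug in the quadwords for the mode you care about (user mode is the
last of the four). An SP_OUTSIDE result for every stack SDA reports
would suggest the code was running on some other stack, such as a
thread stack, or that SP itself is corrupt.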

Re: What causes a STKOVF

John Reagan writes:
>
>
>Joe Sewell wrote:
>
>> What I need is an idea of when the RTL or lower layers detect a "stack
>> overflow" in a non-threaded situation (though it's a DECwindows app,
>> just in case it's multithreading for some reason), so I might get some
>> idea of where to look. (The place where the exception happened didn't
>> reveal too much.)
>>
>> It sounds like the RTL actually pre-checks for a case where pushing n
>> bytes onto the stack would cause the stack to overflow the thread's
>> stated stack size (in a multithreaded app), and signal SS$_STKOVF
>> before it goes off the deep end. What does it do for a single threaded
>> app?
>
>To answer your question: In a multi-threaded/multi-stacked application,
>each stack is a fixed size (no automatic expansion) and there are
>'yellow zones' to help DECthreads know when you are near the edge of the
>stack. The Calling Standard has lots of details on how stack checking
>is implemented by the compilers.
>
>In a traditional single-stack application, there is no yellow zone since
>there is automatic stack expansion. The stack will expand and expand
>until you run out of page file quota. You'll eventually end up with an
>ACCVIO I believe.

I thought he said it wasn't threaded in the initial post. I came across
many of these when working on a DECthreaded application. I set up a file
of configuration parameters, and one was a stack size value to pass along
to pthread_attr_setstacksize().

Re: What causes a STKOVF

On Dec 10, 12:00 pm, Volker Halle wrote:
> Joe,
>
> you'll get a STKOVF (instead of just an ACCVIO), if the process is
> running a Thread Manager (like PTHREADs) or you're not running on the
> process's initial kernel thread. Use SDA> SHOW PROC/IMA to see,
> whether PTHREAD$RTL is in the image list. I'll bet it is for a
> DECwindows image.
>
> Check the stack pointer SP with DBG> EX SP
>
> then examine the stack addresses and limits
>
> DBG> SDA
> SDA> EXA ctl$aq_stack;20
> SDA> EXA ctl$aq_stacklim;20
>
> SDA will show 1 quadword for each stack (offset 0=kernel, then exec,
> super, user)
>
> Try to figure out, if the current SP is near the limits (or outside)
> the stack.
>
> Volker.

Thanks for the info; I'll do this.

You say that you wouldn't be surprised if a DECwindows image is
multithreaded. I cannot speak for what DECwindows itself is doing, but
*we* aren't multithreading it. On the other hand, I *do* see
PTHREAD$RTL high up in the call stack.

Assuming DECwindows is instigating multithreading (or perhaps the
X11R6 update -- that makes Xt "thread safe" -- does just enough to
kick this in), then much is explained.