Identify Main Function

Start with the short version of this proposal on the official issue list.

change name to "main_subprogram"

say it's the "source level subprogram where execution begins"

perhaps use the phrase "the compilation's runtime environment calls this entry point"

add examples for C, C++, Fortran, Windows (WinMain)

another example is a system that uses a preprocessor/wrapper to create lowered source from the user's application

state that it should never be mixed with 'artificial'

Most C compilers assume the main function to be "main". Other
languages might have different names, or even allow it to have
any name. If we have an entry that tells which function is the main function
in a program, it will let the debugger present this to users
in an intelligent way. Another debug-info format, stabs, can
support this with the N_MAIN stab type.

proposal text

(( Fortran has a PROGRAM statement which is used to declare a
user-supplied name for the main function in a program. C and
C++ have no way to rename the main function. ))

If a function has been declared as the main function in a program,
it may be marked with a DW_AT_main_function attribute.
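
As a rough illustration of how a consumer might use such an attribute, here is a minimal sketch in Python. The Die class is a hypothetical toy model, not a real DWARF reader API, and the attribute name is the one proposed above. The sketch assumes a consumer only needs to scan the subprograms of a compilation unit for the flag.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Die:
    """Toy stand-in for a debugging information entry (hypothetical model)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find_main_subprogram(cu: Die) -> Optional[Die]:
    """Return the first subprogram DIE flagged as the main function, if any."""
    for die in cu.children:
        if die.tag == "DW_TAG_subprogram" and die.attrs.get("DW_AT_main_function"):
            return die
    return None


# Hypothetical compilation unit for the Fortran "program prog" case.
cu = Die("DW_TAG_compile_unit", children=[
    Die("DW_TAG_subprogram", {"DW_AT_name": "helper"}),
    Die("DW_TAG_subprogram", {"DW_AT_name": "prog", "DW_AT_main_function": True}),
])
main_die = find_main_subprogram(cu)
```

With this data, main_die is the entry whose DW_AT_name is "prog", so the debugger gets the user-supplied name without any knowledge of compiler naming conventions.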

Background:

We should not extend DW_AT_entry_point because that applies to
Fortran functions with multiple entry points, and that's a
different concept altogether.

The most common reason to want the main function's name is to set
a breakpoint in it to begin debugging. But it's also very useful to know
what name was given to that function by the user, so that the name can
be used when printing messages and other output.

There are a variety of implementation-specific, mostly reliable ways
to figure this out from the ELF symbol table or other information.
But it's better to record the information in a portable
and reliable way in DWARF.

Sun Implementation of Fortran MAIN functions

In the phone discussion today, I explained the Sun implementation
of Fortran "program" statements. But I explained the implementation
incorrectly. The situation is more confused than I remembered. Here is an example:

The user creates a simple Fortran program:

program prog
integer array(1)
print *, array(1)
end

If the user compiles this source file into an object file, the following happens:

1) The compiler creates a wrapper function called "main" which has
startup / teardown junk (Fortran library calls). This function also
has a call to a function called MAIN_

2) The compiler creates a function named MAIN_ to represent the code in the main function.
The compiler records the name "prog" as a sort of "alias" for the function called "MAIN_".

3) Using the stabs format, there was an N_MAIN stab which pointed at the name "MAIN_",
not the user-defined name. This was a little better than building the magic name
"MAIN_" into the debugger, but it's unrelated to getting the user-assigned name.

This implementation allows any consumer to find the source code for the
main program by searching for a function called "MAIN_" anywhere in the program.
The ELF symbol table can be used to do this search, and then the debug info for
a single object file can be read (details will depend on the implementation).
But the consumer will not know the user-defined name of
the main function unless some additional debug info is read and understood.
Not knowing this name is not a fatal problem for usability. The source can
still be shown.

The Sun compilers have a general notion of "user name" versus "linker name"
for functions. This distinction is used heavily in C++ because of mangled names,
but it also crops up for C and Fortran in some cases. In this case it would make
sense for the Sun DWARF information to use "MAIN_" for the linker name of
this function, and "prog" for the user name of the function. If we did that,
then no specific extension would be needed. (Other than the existing heuristic of
looking for a function with the linker name of MAIN_/MAIN/main_/main.) But that's not
the way it's currently implemented.
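
The linker-name heuristic mentioned above could be sketched roughly as follows, in Python. This is only an illustration: the symbol table is a plain dictionary, the addresses are made up, and the user_names mapping is a hypothetical stand-in for whatever "user name" information the debug info would carry. The search order is an assumption taken from the order in which the names are listed above.

```python
from typing import Optional

# Candidate linker names, tried in the order listed in the text (assumed order).
CANDIDATE_LINKER_NAMES = ("MAIN_", "MAIN", "main_", "main")


def find_main_by_linker_name(symtab: dict) -> Optional[tuple]:
    """Return (linker_name, address) for the first candidate found in symtab."""
    for name in CANDIDATE_LINKER_NAMES:
        if name in symtab:
            return name, symtab[name]
    return None


# Hypothetical symbol table: linker name -> address.
symtab = {"main": 0x1000, "MAIN_": 0x1040, "helper_": 0x1100}
# Hypothetical linker-name -> user-name mapping taken from debug info.
user_names = {"MAIN_": "prog"}

found = find_main_by_linker_name(symtab)
# Fall back to the linker name when no user name is recorded.
display_name = user_names.get(found[0], found[0]) if found else None
```

Trying "MAIN_" before "main" matters in the Sun case, because "main" is the compiler-generated wrapper and "MAIN_" holds the user's code.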

Because of this untidiness in the Sun implementation, I can't really argue
that adding DW_AT_main will allow Sun to replace our existing extension with
a standard mechanism.

Intel Implementation of FORTRAN Main Functions

This is John Bishop writing.

Our debugger looks for known names generated by the compiler: in the Linux
case it's "MAIN__". We get the address of that symbol and then look for the
"closest preceding" routine entry symbol with the same address. That's
the name of the FORTRAN main routine.
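
Here is a minimal Python sketch of that lookup, under one possible reading of "closest preceding": the symbols are taken in symbol-table order, and we walk backwards from the marker to the nearest symbol at the same address. The list layout, the addresses, and the function name are assumptions for illustration; the real debugger's data structures are not described here.

```python
from typing import Optional


def fortran_main_name(symbols: list, marker: str = "MAIN__") -> Optional[str]:
    """Find the user-visible name of the main routine.

    symbols is a list of (name, address) pairs in symbol-table order
    (a hypothetical layout chosen for this sketch).
    """
    for index, (name, address) in enumerate(symbols):
        if name == marker:
            # Walk backwards from the marker to the nearest symbol
            # that has the same address.
            for prev_name, prev_address in reversed(symbols[:index]):
                if prev_address == address:
                    return prev_name
            return None
    return None


symbols = [("helper_", 0x2000), ("prog", 0x2040), ("MAIN__", 0x2040)]
name = fortran_main_name(symbols)
```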

As I verified at the end of the con-call on April 3, 2007, Intel Fortran won't
link a pair of FORTRAN .o files which both have main routines (i.e., a
routine specified with "PROGRAM" rather than "SUBROUTINE", etc.).

While this works, it's not beautiful; I therefore re-affirm my (mild) support
for a pair of attributes:

One for a routine which means "I believe I am a main routine"

One for a compilation unit which means "One of my routines believes it is a main routine"

The combination would let our debugger find the main routine on a traverse
of the compilation units, dipping only deeply into the one(s) which need to
be more closely looked at.
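
The two-level scan could look something like the following Python sketch. The keys "has_main_routine" and "is_main_routine" are placeholder names invented for this example (no attribute names have been chosen), and the compilation units are modelled as plain dictionaries.

```python
def find_main_routines(compilation_units: list) -> list:
    """Return (unit name, routine name) pairs for routines marked as main."""
    found = []
    for cu in compilation_units:
        # Cheap check at the compilation-unit level; skip units without the flag.
        if not cu.get("has_main_routine"):
            continue
        # Only flagged units are examined routine by routine.
        for routine in cu["routines"]:
            if routine.get("is_main_routine"):
                found.append((cu["name"], routine["name"]))
    return found


# Hypothetical program made of two compilation units.
units = [
    {"name": "util.f", "routines": [{"name": "helper"}]},
    {"name": "prog.f", "has_main_routine": True,
     "routines": [{"name": "setup"}, {"name": "prog", "is_main_routine": True}]},
]
mains = find_main_routines(units)
```

Returning a list rather than a single routine leaves room for the case where more than one unit claims a main routine, which the consumer can then report or resolve.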

At the moment, this optimization of the initial scan wouldn't make a difference,
as our debugger already scans all the symbols in a cursory fashion so that it
can build a symbol table with the "important" symbols in it. I checked, and
we don't use the .debug_pubnames section.

Description of OpenVMS tools

This was copied by Chris Quenelle from some email by Jeff Nelson.
It came from several distinct emails, and was glued together by Chris.
There were other questions interspersed, but I tried to capture the details
of the OpenVMS implementation.

What do we mean by "main entry point"? On OpenVMS, we sometimes have
two:

a) The first executable code address. This is where the operating
system transfers control after it has loaded the program into memory.

b) The logical entry. Some languages (C & C++ in particular) have an
RTL-supplied routine that initializes the language environment before
control is transferred to what the user thinks is "main". Sometimes the
initialization code invokes user-written code (e.g., constructors for
static objects).

The OpenVMS debugger gives the user the choice to begin debugging at (a)
or skipping to (b).

The compiler emits a "magic symbol" named TRANSFER$BREAK$GO. This symbol
must appear in the compilation unit of (a). The symbol value is (b), the
address of the logical entry point.

At startup, OpenVMS DEBUG determines the actual entry point, locates the
compilation unit containing the entry point address and reads the
symbols in the CU. If the debugger finds the magic symbol, the debugger
sets a breakpoint at (b) and tells the user "type GO to get to main
program". The magic symbol is kept hidden from the user. If the magic
symbol should appear in some other CU it is ignored (because the special
recognition code only appears in the debugger startup path).
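
The startup sequence just described might be sketched like this in Python. The dictionary layout, the address ranges, and the function name are assumptions made for illustration; only the symbol name and the overall logic come from the description above.

```python
from typing import Optional

MAGIC_SYMBOL = "TRANSFER$BREAK$GO"


def logical_entry_breakpoint(entry_address: int, compilation_units: list) -> Optional[int]:
    """Return the address of the logical entry point, or None if it is not marked."""
    for cu in compilation_units:
        if cu["low"] <= entry_address < cu["high"]:
            # Only the CU that contains the actual entry point is consulted;
            # the magic symbol's value is the logical entry address.
            return cu["symbols"].get(MAGIC_SYMBOL)
    return None


# Hypothetical image: the magic symbol in "other" is ignored because
# that CU does not contain the actual entry point.
units = [
    {"name": "startup", "low": 0x4000, "high": 0x4100,
     "symbols": {MAGIC_SYMBOL: 0x5000}},
    {"name": "other", "low": 0x5000, "high": 0x6000,
     "symbols": {MAGIC_SYMBOL: 0x5ABC}},
]
breakpoint_address = logical_entry_breakpoint(0x4000, units)
```

A result of None would correspond to the case where the debugger simply starts at the first executable code address, (a).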

The TRANSFER$BREAK$GO symbol is language-independent. Any compiler can
emit the symbol and we will honor it. Most don't use it. Those that do
include PL/1, Ada, C and C++.

We make no assumption about the name of 'main' or its signature. OpenVMS
DEBUG supports the symbolic debugging of 12 languages (BASIC, FORTRAN,
Pascal, C, C++, COBOL, PL/1, BLISS, Ada, Assembler, DIBOL, RPG) on as
many as three architectures (VAX, Alpha, Integrity) so we try to be as
language-independent as possible. As far as we are concerned, any
routine can be 'main'.