Deficiencies of GCC's optimizer

This page lists places where GCC's code generation is suboptimal.
Although the examples are small, the problems are usually quite deep.

Note: unless otherwise specified, all examples have been compiled
with the current CVS tree as of the date of the example, on x86, with
-O2 -fomit-frame-pointer -fschedule-insns. (The x86 back
end disables -fschedule-insns, which is something that
should be revisited, because it almost always gives better code when I
turn it back on.)

If a function has been placed in a special
section via attributes, we may want to put its static data and string
constants in a special section too. But which one? (Being able to
specify a section for string constants would be useful for the Linux
kernel.)

Perhaps we should have an un-cse step right after cse, which tries to
replace a reg with its value if the value can be substituted for the
reg everywhere, if that looks like an improvement. Which is if the
reg is used only a few times. Use rtx_cost to determine if the
change is really an improvement.

The scheme is that each value has just one hash entry. The
first_same_value and next_same_value chains are no longer needed.

For arithmetic, each hash table elt has the following slots:

Operation. This is an rtx code.

Mode.

Operands 0, 1 and 2. These point to other hash table elements.

So, if we want to enter (plus:SI (reg:SI 30) (const_int
104)), we first enter (const_int 104) and find the
entry that (reg:SI 30) now points to. Then we put these
elts into operands 0 and 1 of a new elt. We put PLUS and
SI into the new elt.

Registers and mem refs would never be entered into the table as
such. However, the values they contain would be entered. There would
be a table indexed by regno which points at the hash entry for the
value in that reg.

The hash entry index now plays the role of a qty number. We still
need qty_table and reg_eqv_table to record which regs share a
particular qty.

When a reg is used whose contents are unknown, we need to create a
hash table entry whose contents say "unknown", as a place holder for
whatever the reg contains. If that reg is added to something, then
the hash entry for the sum will refer to the "unknown" entry. Use
UNKNOWN for the rtx code in this entry. This replaces make_new_qty.

For a constant, a unique hash entry would be made based on the
value of the constant.

What about MEM? Each time a memory address is referenced, we need
a qty (a hash table elt) to represent what is in it. (Just as for a
register.) If this isn't known, create one, just as for a reg whose
contents are unknown.

We need a way to find all mem refs that still contain a certain
value. Do this with a chain of hash elts (for memory addresses) that
point to locations that hold the value. The hash elt for the value
itself should point to the start of the chain. It would be good for
the hash elt for an address to point to the hash elt for the contents
of that address (but this ptr can be null if the contents have never
been entered).

With this data structure, nothing need ever be invalidated except
the lists of which regs or mems hold a particular value. It is easy
to see if there is a reg or mem that is equiv to a particular value.
If the value is constant, it is always explicitly constant.

Strength reduction and iteration variable elimination could be
smarter. They should know how to decide which iteration variables are
not worth making explicit because they can be computed as part of an
address calculation. Based on this information, they should decide
when it is desirable to eliminate one iteration variable and create
another in its place.

It should be possible to compute what the value of an iteration
variable will be at the end of the loop, and eliminate the variable
within the loop by computing that value at the loop end.

Many operations could be simplified based on knowledge of the
minimum and maximum possible values of a register at any particular
time. These limits could come from the data types in the tree, via
rtl generation, or they can be deduced from operations that are
performed. For example, the result of an and operation
one of whose operands is 7 must be in the range 0 to 7. Compare
instructions also tell something about the possible values of the
operand, in the code beyond the test.

Value constraints can be used to determine the results of a further
comparison. They can also indicate that certain and
operations are redundant. Constraints might permit a decrement and
branch instruction that checks zeroness to be used when the user has
specified to exit if negative.

Sometimes a variable is declared as int, it is
assigned only once from a value of type char, and then it
is used only by comparison against constants. On many machines,
better code would result if the variable had type char.
If the compiler could detect this case, it could change the
declaration of the variable and change all the places that use it.

The C expression *(X + 4 * (Y + C)) compiles better on
certain machines if rewritten as *(X + 4*C + 4*Y) because
of known addressing modes. It may be tricky to determine when, and
for which machines, to use each alternative.

Although GCC implements numerous optimizations of the standard C
library's string, math and I/O functions, there are still plenty more
transformations that could be implemented.

glibc has inline assembler versions of various string functions; GCC
has some, but not necessarily the same ones on the same architectures.
Additional optab entries, like the ones for ffs
and strlen, could be provided for several more functions
including memset, strchr, strcpy
and strrchr.

GCC could optimize strcmp (and memcmp)
where one string is constant to compare successive bytes to known
constant ones inline. For lengths longer than about 4, the inline
comparisons could test the prefix before calling the library function
(or inline a suitable instruction) for the remainder of the strings.

glibc optimizes strncpy from a string constant, where
the maximum length is constant and greater than the length of the
string constant including the terminating null character, by using
an optimization not implemented in GCC. Its makes use of its own
__mempcpy function to copy and return the address after
the data copied, to pass into memset. This could be
implemented if GCC provided a __builtin_mempcpy function.

Similarly, GCC could transform a call to strcat into
a call to strchr followed by a strcpy or
memcpy.

glibc optimizes strcspn, strcspn and
strpbrk where the string of characters that need to be
matched is constant and of length not greater than three.

The GNU libc also currently contains macros to optimize calls to
some string functions with constant arguments and those that can be
implemented by processor specific instructions. These transformations
are better performed in GCC, both to reduce the overhead of macro
expansion and to take advantage of the functions attributes, for
example to avoid a second call to a pure function altogether. The
use of these macros tend to cause huge blowup in the size of preprocessed
source if nested; for example, each nested call to strcpy
expands the source 20-fold, with four nested calls having an expansion
ten megabytes in size. GCC then consumes a huge amount of memory
compiling such expressions. The remaining optimizations need to be
implemented in GCC and then disabled in glibc, with benefits to other
systems as well, and the potential to use information GCC has about
alignment.

All the string functions act as if they access individual
characters, so care may need to be taken that no
-fstrict-aliasing problems occur when internal uses of
other types are generated. Also, the arguments to the string function
must be evaluated exactly once each (if they have any side effects),
even though the call to the string function might be optimized away.

Care must be taken that any optimizations in GCC are
standards-conforming in terms of not possibly accessing beyond the
arrays involved (possibly within a function call created in the
optimization); whereas the glibc macros know the glibc implementation
and how much memory it might access, GCC optimizations can't.

There are some further optimizations in glibc not covered here,
that either optimize calls to non-ISO C functions or call glibc
internal functions in the expansion of the macro.

Many of these optimizations should not be applied if
-Os is specified.

Loads from memory can take many cycles if the loaded data is not in
a cache line. Cache misses can bring a CPU to a halt for several 100
cycles, so minimizing cache misses is one of the most important jobs
of a modern compiler. Data prefetch instructions are one tool the
compiler can use for this.

Data prefetch instructions allow a compiler to minimize cache-miss
latency by loading data into a cache before it it accessed. A separate
page describes
data prefetch support and optimizations that
are in development or already supported by GCC.

Almost all code transformations implemented in GCC are target
independent by design, but how well they work depends on how accurately
an architecture is modelled. For example, combine may fail to combine
some instructions if a machine description does not provide a pattern,
or if the pattern exists but the predicates are too strong.
Another example is the target cost function which is used to determine
if a transformation is profitable. And sometimes a transformation that
some specific target could benefit from is simply missing in the target
independent implementation.

Optimization deficiencies like these are listed per target in the
target specific pages below. An issue listed for one target may still
also be an issue for other targets, but problems in the target specific
pages are problems with, and ideas for the machine descriptions of some
target.

Note: If an issue listed in a target specific issues
page, but it clearly is a target indepentent issue, please move it to
a page discussing target indepentent issues

For questions related to the use of GCC,
please consult these web pages and the
GCC manuals. If that fails,
the gcc-help@gcc.gnu.org
mailing list might help.
Comments on these web pages and the development of GCC are welcome on our
developer list at gcc@gcc.gnu.org.
All of our lists
have public archives.

Copyright (C)
Free Software Foundation, Inc.
Verbatim copying and distribution of this entire article is
permitted in any medium, provided this notice is preserved.