Some programs have more than one source files. These non-lib modules
should not be compiled with -DMODULE_NAME=libc. This patch puts these
non-lib modules in $(others-extras) and adds $(others-extras) to
all-nonlib.

1. call *%gs:SYSINFO_OFFSET. This requires TLS initialization.
2. call *_dl_sysinfo. This requires relocation of _dl_sysinfo.
3. int $0x80. This is slower than #2 and #3, but works everywhere.

When an object file is compiled with PIC, #1 is prefered since it is
faster than #3 and doesn't require relocation of _dl_sysinfo. For
dynamic executables, ld.so initializes TLS. However, for static
executables, before TLS is initialized by __libc_setup_tls, #3 should
be used for syscalls.

This patch adds <startup.h> which defines _startup_fatal and defaults
it to __libc_fatal. It replaces __libc_fatal with _startup_fatal in
static executables where it is called before __libc_setup_tls is called.
This header file is included in all files containing functions which are
called before __libc_setup_tls is called. On Linux/i386, when PIE is
enabled by default, _startup_fatal is turned into ABORT_INSTRUCTION and
I386_USE_SYSENTER is defined to 0 so that "int $0x80" is used for system
calls before __libc_setup_tls is called.

since xgetbv takes much more cycles than single cycle operations like
vpord/vvpcmpeq/ptest. _dl_runtime_resolve_opt should be used only with
AVX512 where AVX512 instructions lead to lower CPU frequency on Skylake
server.

[BZ #21871]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.

since xgetbv takes much more cycles than single cycle operations like
vpord/vvpcmpeq/ptest. _dl_runtime_resolve_opt should be used only with
AVX512 where AVX512 instructions lead to lower CPU frequency on Skylake
server.

[BZ #21871]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.

since xgetbv takes much more cycles than single cycle operations like
vpord/vvpcmpeq/ptest. _dl_runtime_resolve_opt should be used only with
AVX512 where AVX512 instructions lead to lower CPU frequency on Skylake
server.

[BZ #21871]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.

tst-prelink.c checks for conflict with GLOB_DAT relocation against
stdout. On i386, there is no GLOB_DAT relocation against stdout with
PIE. Compile tst-prelink.c without PIE to generate GLOB_DAT relocation.

Acconding to POSIX glob with GLOB_NOCHECK should return a list consisting
of only of the input pattern in case of no match. However GLIBC does not
honor in case of '//<something'. This is due internally this is handled
and special case and prefix_array (responsable to prepend the directory
name) does not know if the input already contains a slash or not since
either '/<something>' or '//<something>' will be handle in same way.

This patch fix it by using a empty directory name for the latter (since
prefix_array already adds a slash as default for each entry).

Checked on x86_64-linux-gnu.

[BZ #10246]
* posix/glob.c (glob): Handle pattern that do not match and
start with '/' correctly.
* posix/globtest.sh: New tests for NOCHECK.

The LD_HWCAP_MASK environment variable may alter the selection of
function variants for some architectures. For AT_SECURE process it
means that if an outdated routine has a bug that would otherwise not
affect newer platforms by default, LD_HWCAP_MASK will allow that bug
to be exploited.

To be on the safe side, ignore and disable LD_HWCAP_MASK for setuid
binaries.

There are sometimes bugs in the compiler
(e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78460) that cause
it to consume all available memory. To limit the impact of this on
automated test robots, impose memory limits on all subprocesses in
build-many-glibcs.py. When the bug hits, the compiler will still run
out of memory and crash, but that should not affect any other
simultaneous task.

The limit can be configured with the --memory-limit command line switch.
The default is 1.5 gigabytes or physical RAM divided by the number of
jobs to run in parallel, whichever is larger. (Empirically, 1.5 gigs
per process is enough for everything but the files affected by GCCbug 78640, but 1 gig per process is insufficient for some of the math
tests and also for the "genautomata" step when building compilers for
powerpc64.)

Rather than continue to lengthen the argument list of the Context
constructor, it now takes the entire 'opts' object as its sole argument.

___tls_get_addr is the alternative x86-64 runtime interface to TLS with
aligned stack. If linker treats ___tls_get_addr just like __tls_get_addr
and the weakref assembly directive is extended to redirect __tls_get_addr
to ___tls_get_addr, we can replace reference to __tls_get_addr with
___tls_get_addr if GCC aligns stack for __tls_get_addr, but doesn't
support ___tls_get_addr.

This patch removes the cancellation mark from the auto-generation syscall
script. Now all the cancellable syscalls are done throught C code using
the SYSCALL_CANCEL macro. It simplifies the assembly required to each
architecture port, since the SYSCALL_CANCEL uses the already defined
INLINE_SYSCALL macros, and allows a more straigh fix on cancellation
machanism (since no more specific assembly fixes will be required).

This is an experimental patch which removes libio.h (and _G_config.h)
from the set of application-exposed headers. After this change, the
public stdio.h does not define any symbols whose names begin with _G_
nor _IO_, except that when optimizing, the guts of struct _IO_FILE and
three of the flag constants are visible (see bits/stdio.h and
bits/types/FILE_internals.h). There is a small amount of code
duplication in bits/stdio.h, of macro bodies from libio.h that are no
longer available. A number of internal .c files that were manually
doing PLT bypass for flockfile/funlockfile can now rely on
include/stdio.h to do it for them.

It passes the testsuite on x86_64-linux, but it needs a great deal of
additional testing; in particular I'm almost certain I broke the
support for old-format (GLIBC_2.0) struct _IO_FILE, which is
configured out on this target. Testing this properly would require
someone to get their hands on _really_ old binaries, compiled against
glibc 2.0, possibly statically-linked-but-using-NSS. Unfortunately,
libc.so cannot be expected to be binary identical.

However, this should be ready to feed into archive rebuilds to find
out what applications break.

Substantial clean-ups to the libio implementation are possible if this
sticks, but I haven't done 'em; this is intended to be minimal.

* libio/libio.h: Include features.h first thing, then error out if
either _LIBC or __USE_GNU is not defined, or if _ISOMAC is
defined. Inline all of _G_config.h except _G_HAVE_MREMAP here.
Get definitions of __mbstate_t, __fpos_t, __fpos64_t, struct
_IO_FILE, and the cookie-related types from the relevant
bits/types headers. Get definition of NULL from stddef.h.
Make all #ifdef _LIBC and #if __GNUC__ >= (2,3) blocks
unconditional. Remove all #if 0 and #ifdef __cplusplus blocks.
Change all uses of _G_va_list, _G_fpos_t, and _G_fpos64_t to
__gnuc_va_list, __fpos_t, __fpos64_t respectively. Provide
definitions of _STDIO_USES_IOSTREAM, __HAVE_COLUMN,
_IO_file_flags, __io_read_fn, __io_write_fn, __io_seek_fn,
__io_close_fn, _IO_cookie_io_functions_t for the sake of the
implementation. When _IO_USE_OLD_IO_FILE is defined, define
struct _IO_FILE_old.
* libio/libioP.h: When _IO_USE_OLD_IO_FILE is defined, define
struct _IO_FILE_old_plus. Only declare _IO_old_file_init_internal
when _IO_USE_OLD_IO_FILE is defined, and have it take an
argument of type struct _IO_FILE_old_plus.
* libio/oldfileops.c: Change all uses of _IO_FILE to _IO_FILE_old,
_IO_FILE_plus to _IO_FILE_old_plus, _IO_FILE_complete to _IO_FILE,
and _IO_FILE_complete_plus to _IO_FILE_plus. Then adjust types
to match caller/callee's expectations.
* libio/oldiofdopen.c, libio/oldiofopen.c, libio/oldiopopen.c
* libio/oldstdfiles.c: Likewise.
* sysdeps/generic/_G_config.h, sysdeps/unix/sysv/linux/_G_config.h:
Only provide definition or non-definition of _G_HAVE_MREMAP.

* include/stdio.h: Change all uses of _G_va_list to __gnuc_va_list,
_IO_ssize_t to __size_t, _IO_FILE to FILE, and _IO_fpos_t to __fpos_t.
When IS_IN (libc), redirect flockfile and funlockfile to
__flockfile and __funlockfile respectively.
When _IO_MTSAFE_IO and not _ISOMAC, include stdio-lock.h before
stdio.h proper.
* include/stdio_ext.h: Include bits/types/FILE_internals.h for the
sake of the inline definition of __fsetlocking.
* include/libio.h: Adjust #ifdef nest to activate multiple-include
optimization.
* include/bits/types/FILE_internals.h, include/bits/types/__fpos_t.h
* include/bits/types/cookie_io_functions_t.h: New trivial wrappers.
* include/bits/stdio.h: New wrapper; mark __uflow and __overflow
as hidden for intra-libc callers.

This patch adds the actual pretty-printer for errno. I could have
used Python's built-in errno module to get the symbolic names for the
constants, but it seemed better to do something entirely under our
control, so there's a .pysym file generated from errnos.texi, with a
hook that allows the Hurd to add additional constants. Then a .py
module is generated from that plus errno.h in the usual manner; many
thanks to the authors of the .pysym mechanism.

There is also a test which verifies that the .py file (not the .pysym
file) covers all of the constants defined in errno.h.

hurd-add-errno-constants.awk has been manually tested, but the
makefile logic that runs it has not been tested.

* stdlib/errno-printer.py: New pretty-printer.
* stdlib/test-errno-constants.py: New special test.
* stdlib/test-errno-printer.c, stdlib/test-errno-printer.py:
New pretty-printer test.
* stdlib/make-errno-constants.awk: New script to generate the
.pysym file needed by errno-printer.py.
* stdlib/Makefile: Install, run, and test all of the above, as
appropriate.

* sysdeps/mach/hurd/hurd-add-errno-constants.awk: New script to
add Mach/Hurd-specific errno constants to the .pysym file used by
stdlib/errno-printer.py.
* sysdeps/mach/hurd/Makefile: Hook hurd-add-errno-constants.awk
into the generation of that .pysym file.

Rename glibc.tune.ifunc to glibc.tune.hwcaps and move it to
sysdeps/x86/dl-tunables.list since it is x86 specicifc. Also
change type of data_cache_size, data_cache_size and
non_temporal_threshold to unsigned long int to match size_t.
Remove usage DEFAULT_STRLEN from cpu-tunables.c.

Since commit d957c4d3fa48d685ff2726c605c988127ef99395 (i386: Compile
rtld-*.os with -mno-sse -mno-mmx -mfpmath=387), vector intrinsics can
no longer be used in ld.so, even if the compiled code never makes it
into the final ld.so link. This commit adds the missing IS_IN (libc)
guard to the SSE 4.2 strcspn implementation, so that it can be used from
ld.so in the future.

All top-level files and directories are moved into a temporary storage
directory, REORG.TODO, except for files that will certainly still
exist in their current form at top level when we're done (COPYING,
COPYING.LIB, LICENSES, NEWS, README), all old ChangeLog files (which
are moved to the new directory OldChangeLogs, instead), and the
generated file INSTALL (which is just deleted; in the new order, there
will be no generated files checked into version control).

Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector
instructions. It is as fast as SSE2 version for small data sizes
and up to 1X faster for large data sizes on Haswell. Select AVX2
version on AVX2 machines where vzeroupper is preferred and AVX
unaligned load is fast.

Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector
instructions. It is as fast as SSE2 version for small data sizes
and up to 1X faster for large data sizes on Haswell. Select AVX2
version on AVX2 machines where vzeroupper is preferred and AVX
unaligned load is fast.

__x86_shared_non_temporal_threshold was set to 6 times of per-core
shared cache size, based on the large memcpy micro benchmark in glibc
on a 8-core processor. For a processor with more than 8 cores, the
threshold is too low. Set __x86_shared_non_temporal_threshold to the
3/4 of the total shared cache size so that it is unchanged on 8-core
processors. On processors with less than 8 cores, the threshold is
lower.

* sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
Set to the 3/4 of the total shared cache size.

This patch adds cache info to cpu_features to support tunables for both
cache info as well as CPU features in a single x86 namespace. Since
init_cacheinfo is in libc.so and cpu_features is in ld.so, cache info
and CPU features must be in a place for tunables.

This commit changes the way the list of SYS_* system call macros
is created on Linux. glibc now contains a list of all known system
calls, and the generated <bits/syscall.h> file defines the SYS_
macro only if the correspnding __NR_ macro is defined by the kernel
headers.

As a result, there glibc does not have to be rebuilt to pick up
system calls if the glibc sources already know about them. This
means that glibc can be built with older kernel headers, and if
the installed kernel headers are upgraded afterwards, additional
SYS_ macros become available as long as glibc has a record for
those system calls.

The test io/ftwtest-sh creates a directory that at some points during
the test does not have execute permission. To avoid leaving behind
such a directory that prevents the build directory from being removed
with a simple "rm -rf", it traps various signals to make the directory
executable and remove it before exit. However, this doesn't cover the
case where one of the tests simply fails (which happens with cross
testing if testing on a remote system where the path to the build
directory involves a symlink, or if that remote system fell over
during testing - I think the latter is the case where the directory is
left behind with bad permissions).

This patch makes that test also trap signal 0 (exit) so that the
directory gets properly removed in such failure cases as well.

Tested in both configurations where the test passes and where it fails
to verify that the result of the test is unchanged but the directory
is no longer left behind where it was previously left behind.

Simplify the Linux accept4 implementation based on the assumption
that it is available in some way. __ASSUME_ACCEPT4_SOCKETCALL was
previously unused, so remove it. Its functionality is implied
by the complex #if condition in accept4.c.

On x86, the usage of AT_HWCAP in glibc is obsolete since addition of
dl_x86_cpu_features. dl_hwcap, which was set from AT_HWCAP, is used by
dynamic linker to build an array of hardware capability names, which are
added to search path when loading shared object. dl_hwcap was unused on
x86-64 and only SSE2 was used on i386.

This patch sets dl_hwcap with new hardware capabilities from CPU
features. Currently, 2 capabilities, SSE2 and AVX2, are supported.
The maximum number of hardware capabilities is 64. Since x86-64
includes SSE2, SSE2 is skipped on x86-64. dl_x86_cap_flags is kepted
for i386 and is used by _dl_show_auxv. dl_x86_hwcap_flags is added
for new hardware capabilities.

In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers. It simplifies _dl_runtime_resolve and supports
different calling conventions. However, use xsave can be 10X slower
than saving and restoring vector and bound registers individually.

The ia64-specific clone2 call expects the base of the stack mapping and
the stack size as sep arguments, not an initial stack value as on other
stack-grows-down architectures. Reuse the stack-grows-up macro so we
pass in the right stack base.

The ia64-specific clone2 call expects the base of the stack mapping and
the stack size as sep arguments, not an initial stack value as on other
stack-grows-down architectures. Reuse the stack-grows-up macro so we
pass in the right stack base.

The problem is that this code relies on fused multiply-add, and relies
on the compiler contracting a * b + c to get a fused operation. But
sysdeps/ieee754/dbl-64/Makefile disables contraction for e_sqrt.c,
because the implementation in that directory relies on *not* having
contracted operations.

While it would be possible to arrange makefiles so that an earlier
sysdeps directory can disable the setting in
sysdeps/ieee754/dbl-64/Makefile, it seems a lot cleaner to make the
dependence on fused operations explicit in the .c file. GCC 4.6
introduced support for __builtin_fma on powerpc and other
architectures with such instructions, so we can rely on that; this
patch duly makes the code use __builtin_fma for all such fused
operations.

At present, libm tests for each function get built into a single
executable (for each floating point type, for each of normal / inline
/ finite-math-only functions, plus vector variants) and run together,
resulting in a single PASS or FAIL (for each of those nine variants
plus vector variants). Building this executable involves reading
about 40 MB of libm-test-*.c sources, which would grow to maybe 70 or
80 MB when the complex inverse trig and hyperbolic functions move to
the auto-libm-test-* mechanism (that move is practical now that tests
for all functions don't need regenerating for any change to
auto-libm-test-in but you can instead regenerate each
auto-libm-test-out-* file independently; auto-libm-test-out-casin and
auto-libm-test-out-casinh take about 38 minutes each to generate on my
system after such a move, auto-libm-test-out-cacos and
auto-libm-test-out-cacosh take about 80 minutes each).

This patch arranges for tests of each function to be run separately
from the makefiles instead. There are 121 functions being tested for
each (type, variant pair) (actually 126, but run as 121 from the
Makefile because each of the pairs (exp10, pow10), (isfinite, finite),
(lgamma, gamma), (remainder, drem), (scalbn, ldexp), shares a table of
test results and so is run together), so 1089 separate tests run from
the Makefile, plus 48 vector tests on x86_64 (six functions for eight
vector variants). Each test only involves a libm-test-<func>.c file
of no more than about 4 MB, rather than all such files taking about 40
MB. With tests run separately, test summaries will indicate which
functions actually have problems (of course, those problems may just
be out-of-date libm-test-ulps files if the file hasn't been updated
for the architecture in question recently).

All the .c files for the 1089+48 tests are generated automatically
from the Makefiles. Various checked-in boilerplate .c files are
removed as no longer needed. CFLAGS definitions for the different
kinds of tests are generated using makefile iterators to apply
target-specific variable settings. libm-have-vector-test.h is no
longer needed; the list of functions to test for each vector type is
now in the sysdeps Makefile.

This should remove the amount of boilerplate needed for float128
testing support; test-float128.h will still be needed, but not various
.c files or Makefile CFLAGS definitions. The logic for creating
dependencies on libm-test-support-*.o files should also render
<https://sourceware.org/ml/libc-alpha/2017-02/msg00279.html>
unnecessary.

Any comments? Especially regarding the use of iterators; there is
existing precedent (in elf/Makefile) for using o-iterator.mk as a
generic iterator with object-suffixes-left set to something other than
a list of object suffixes, but maybe there should be a differently
named iterator for such generic uses?

This patch consolidates the pthread_join and gnu extensions to avoid
simplify implementation and avoid code duplication. Both pthread_join
and pthread_tryjoin are now based on pthread_timedjoin_np.

It also fixes some inconsistencies on ESRCH, EINVAL, EDEADLK handling
(where each implementation differs from each other) and also on
clenup handler (which now always use a CAS). It also replace the
atomics operation with the C11 ones.

Linux commit b4b56f9ecab40f3b4ef53e130c9f6663be491894 introduced
a new HWCAP2 bit to indicate that the kernel now aborts a memory
transaction when a syscall is made. This patch adds that bit to
sysdeps/powerpc/bits/hwcap.h.

If C++ headers <cstdlib> or <cmath> are used, GCC 6 will include
/usr/include/stdlib.h or /usr/include/math.h from "#include_next"
(instead of stdlib/stdlib.h or math/math.h in the glibc source
directory), and this turns up as a make dependency. An implicit
rule will kick in and make will try to install stdlib/stdlib.h or
math/math.h as /usr/include/stdlib.h or /usr/include/math.h because
the target is out of date. We make a copy of <cstdlib> and <cmath>
in the glibc build directory so that stdlib/stdlib.h and math/math.h
will be used instead of /usr/include/stdlib.h and /usr/include/math.h.

tst-cleanupx4 is linked with tst-cleanupx4.o and tst-cleanup4aux.o.
Since tst-cleanupx4.o is compiled from tst-cleanup4.c with -fexceptions,
tst-cleanup4aux.c should also be compiled with -fexceptions.

If assembler doesn't support AVX512DQ, _dl_runtime_resolve_avx is used
to save the first 8 vector registers, which only saves the lower 256
bits of vector register, for lazy binding. When it is called on AVX512
platform, the upper 256 bits of ZMM registers are clobbered. Parameters
passed in ZMM registers will be wrong when the function is called the
first time. This patch requires binutils 2.24, whose assembler can store
and load ZMM registers, to build x86-64 glibc. Since mathvec library
needs assembler support for AVX512DQ, we disable mathvec if assembler
doesn't support AVX512DQ.

For Intel processors, when there are both L2 and L3 caches, SMT level
type should be ued to count number of available logical processors
sharing L2 cache. If there is only L2 cache, core level type should
be used to count number of available logical processors sharing L2
cache. Number of available logical processors sharing L2 cache should
be used for non-inclusive L2 and L3 caches.

For Intel processors, when there are both L2 and L3 caches, SMT level
type should be ued to count number of available logical processors
sharing L2 cache. If there is only L2 cache, core level type should
be used to count number of available logical processors sharing L2
cache. Number of available logical processors sharing L2 cache should
be used for non-inclusive L2 and L3 caches.

For Intel processors, when there are both L2 and L3 caches, SMT level
type should be ued to count number of available logical processors
sharing L2 cache. If there is only L2 cache, core level type should
be used to count number of available logical processors sharing L2
cache. Number of available logical processors sharing L2 cache should
be used for non-inclusive L2 and L3 caches.

This patch adds cache info to _dl_x86_cpu_features to allow a processor
to override cache info derived from CPUID.

Tested on x86 and x86-64.

* sysdeps/x86/cacheinfo.c: Skip if not in libc.
(init_cacheinfo): Use raw_data_size, raw_shared_size and
shared_non_temporal_threshold from _dl_x86_cpu_features if
not zero.
* sysdeps/x86/cpu-features.h (cache_info): New.
(cpu_features): Add cache.

(a) %rdx, the new requested size, is rounded up to a multiple of 16
(+734, %+738), and the result is stored in %rdx@738.

(b) %rdx@738 + 31 is rounded down to a multiple of 16, the result is
stored in rcx@750 (+742, +746, +750). So %rcx@750 == %rdx@738 + 16.

(c) %rcx@750 bytes are allocated on the stack (+754). %rsp is rounded
upwards to a multiple of 16, result is stored in %rcx@762 (+757, +762).
This does not change the value of %rsp because it already was a multiple
of 16.

(d) %rsi@766 == %rcx@762 + %rdx@738 is compared against %r15. But this
comparison is always false because we allocated 16 extra bytes on the
stack in (b), which were reserved for the alignment in (c), but in fact
unused. We are left with a gap in stack usage, and the comparison is
always false.

(@XXX refers to register values after executing the instruction at
offset +XXX.)

If the alignment gap was actually used because of different alignment
for %rsp, then the comparison failure would still occur because the gap
would not have been added after this reallocation, but before the
previous allocation.

As a result, extend_alloca is never able to merge allocations. It also
turns out that the interface is difficult to use, especially in
cojunction with alloca account (which is rarely optional).

Since commit 44d20bca52ace85850012b0ead37b360e3ecd96e (Implement
second fallback mode for DNS requests), there is a code path which
returns early, before *resplen2 is initialized. This happens if the
name server address is immediately recognized as invalid (because of
lack of protocol support, or if it is a broadcast address such
255.255.255.255, or another invalid address).

If this happens and *resplen2 was non-zero (which is the case if a
previous query resulted in a failure), __libc_res_nquery would reuse
an existing second answer buffer. This answer has been previously
identified as unusable (for example, it could be an NXDOMAIN
response). Due to the presence of a second answer, no name server
switching will occur. The result is a name resolution failure,
although a successful resolution would have been possible if name
servers have been switched and queries had proceeded along the search
path.

The above paragraph still simplifies the situation. Before glibc
2.23, if the second answer needed malloc, the stub resolver would
still attempt to reuse the second answer, but this is not possible
because __libc_res_nsearch has freed it, after the unsuccessful call
to __libc_res_nquerydomain, and set the buffer pointer to NULL. This
eventually leads to an assertion failure in __libc_res_nquery:

If assertions are disabled, the consequence is a NULL pointer
dereference on the next line.

Starting with glibc 2.23, as a result of commit
e9db92d3acfe1822d56d11abcea5bfc4c41cf6ca (CVE-2015-7547: getaddrinfo()
stack-based buffer overflow (Bug 18665)), the second answer is always
allocated with malloc. This means that the assertion failure happens
with small responses as well because there is no buffer to reuse, as
soon as there is a name resolution failure which triggers a search for
an answer along the search path.

This commit addresses the issue by ensuring that *resplen2 is
initialized before the send_dg function returns.

This commit also addresses a bug where an invalid second reply is
incorrectly returned as a valid to the caller.