These changes introduced some test failures on AIX and other platforms,
and rather than dig around for more failing platforms during the RCx
period, we will revert them and reapply them later, once they are better tested.

The output of gcc -print-search-dirs is subject to localisation, which means
that the literal text "libraries" will not be present if the user has a
non-English locale, and we won't determine the correct path for libraries
such as -lm, breaking the build. Problem diagnosed by Alexander Hartmaier.

I think the best we can do with respect to the f?pathconf tests is to
make sure that the perl call doesn't die, and that the system call
doesn't fail. And it's arguable we should only be testing the former.
But since we've been testing more than this anyway, it's probably safe
to test both.

With respect to the sysconf call, I think we shouldn't test more than
that perl doesn't die. Any further testing would require different
tests based on the argument being passed in. Before doing that, it's
probably worth considering the purpose of the tests. I don't think we
really want to test that POSIX has been implemented correctly, only that
our layer over it is correctly implemented.
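
At the C level, the difference between "the system call failed" and
"there is no limit to report" can be sketched as below. This is only an
illustration, assuming a POSIX system; the helper name pathconf_failed is
hypothetical and is not part of perl or its test suite.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 only if pathconf() reported an error.
 * POSIX allows pathconf() to return -1 with errno left unchanged when
 * the limit is simply indeterminate, so -1 alone is not a failure. */
static int pathconf_failed(const char *path, int name)
{
    errno = 0;
    long value = pathconf(path, name);
    return value == -1 && errno != 0;
}
```

A test written this way only asserts that the call did not fail; it says
nothing about whether the returned value is the right one for the platform.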

There appears to be a flaw in IO::Socket where some IO::Socket objects
are unable to properly report their socktype, sockdomain, or protocol
(they return undef, even when the underlying socket is sufficiently
initialized to have these properties).

PerlIO::scalar’s dup function (PerlIOScalar_dup) calls the base
implementation (PerlIOBase_dup), which pushes the scalar layer on to the
new file handle.

When the scalar layer is pushed, if the mode is ">" then
PerlIOScalar_pushed sets the scalar to the empty string. If it is
already a string, it does this simply by setting SvCUR to 0, without
touching the string buffer.

The upshot of this is that newly-cloned in-memory handles turn into
the empty string, as in this example:

What was happening before commit b6597275 was that two bugs were
cancelling each other out: $str would be "" when the new thread started,
but with a string buffer containing "a" beyond the end of the string
and $fh remembering 1 as its position. The bug fixed by b6597275 was
that writing past the end of a string through a filehandle was leaving
junk (whatever was in memory already) in the intervening space between
the old end of string and the beginning of what was being written to
the string. This allowed "" to turn magically into "ab" when "b" was
written one character past the end of the string. Commit b6597275
started zeroing out the intervening space in that case, causing the
cloning bug to rear its head.
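
The interaction between the two bugs can be modelled with a toy string
type. This is a rough sketch, not perl's internals: ToyStr, toy_truncate
and toy_write_at are made-up names, and the zero_fill flag merely stands
in for the behaviour before and after b6597275.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a string scalar: a buffer plus a logical length. */
typedef struct {
    char   buf[16];
    size_t cur;      /* logical length, playing the role of SvCUR */
} ToyStr;

/* Truncate by resetting the length only; the old bytes stay in buf. */
static void toy_truncate(ToyStr *s)
{
    s->cur = 0;
}

/* Write one byte at position pos.  Without zero_fill, a gap between the
 * old end of string and pos keeps whatever was already in the buffer. */
static void toy_write_at(ToyStr *s, size_t pos, char c, int zero_fill)
{
    if (pos >= sizeof s->buf)
        return;                      /* out of range for the toy buffer */
    if (pos > s->cur && zero_fill)
        memset(s->buf + s->cur, 0, pos - s->cur);
    s->buf[pos] = c;
    if (pos + 1 > s->cur)
        s->cur = pos + 1;
}
```

Starting from "a", truncating and then writing "b" at position 1 gives
"ab" without zero filling, and a NUL byte followed by "b" with it.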

This commit solves the problem by hiding the scalar temporarily
in PerlIOScalar_dup so that PerlIOScalar_pushed won’t be able to
modify it.
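
The "hide, push, restore" idea can be sketched in miniature as follows.
Everything here is a hypothetical stand-in (ToyLayer, toy_pushed and
toy_dup are not PerlIO names), and the real code deals with cloned
scalars and reference counts that this sketch leaves out.

```c
#include <stddef.h>

/* Toy layer: var_len points at the length of the string it writes to. */
typedef struct {
    size_t *var_len;
} ToyLayer;

/* Stand-in for the "pushed" hook: in write mode it empties the target. */
static void toy_pushed(ToyLayer *l)
{
    if (l->var_len)
        *l->var_len = 0;
}

/* Stand-in for dup: hide the real target, push, then restore it. */
static ToyLayer toy_dup(ToyLayer *orig)
{
    size_t *real  = orig->var_len;
    size_t  dummy = 0;
    ToyLayer clone;

    orig->var_len  = &dummy;         /* hide the real target */
    clone.var_len  = orig->var_len;  /* the push sees only the dummy ... */
    toy_pushed(&clone);              /* ... so only the dummy is emptied */

    orig->var_len  = real;           /* restore the original */
    clone.var_len  = real;           /* and point the clone at the real target */
    return clone;
}
```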

Should PerlIOScalar_pushed stop clobbering the string and should
PerlIOScalar_open do it instead? Perhaps. But that would be a bigger
change, and we are supposed to be in code freeze right now.

The (ANSI) C compiler fails to compile precompiled (.i) files when both
-g and -O (all +O1 and above) are given. When -g is requested, -O, +O,
and +Onolimit are removed from the optimize flags.

This failure does not occur with the newer aCC compiler B3910B, which is
also used on HP-UX on Itanium.

The check/modification has to be done as late as possible, as the other
options, like -Duse64bitall and -DDEBUGGING, will modify the variables
that need to be checked after hints/hpux.sh has been dealt with.

All code points whose UTF-8 representations start with a byte containing
either \xFE or \xFF are considered problematic because they are not
portable. There are many such code points that are too large to
represent on a 32 or even a 64 bit platform. Commit
eb83ed87110e41de6a4cd4463f75df60798a9243 failed to properly catch
overflow when the input flags to this function say to warn on, but
otherwise accept FE and FF sequences. Now overflow is checked for
unconditionally.

There are possible overlong sequences that this function blindly
accepts. Instead of developing the code to figure this out, turn this
function into a wrapper for utf8n_to_uvuni() which already has this
check.

The prior version had a number of issues, some of which have been taken
care of in previous commits.

The goal when presented with malformed input is to consume as few bytes
as possible, so as to position the input for the next try to the first
possible byte that could be the beginning of a character. We don't want
to consume too few bytes, so that the next call has us thinking that
what is the middle of a character is really the beginning; nor do we
want to consume too many, so as to skip valid input characters. (This
is forbidden by the Unicode standard because of security
considerations.) The previous code could do both of these under various
circumstances.

In some cases it took as a given that the first byte in a character is
correct, and skipped looking at the rest of the bytes in the sequence.
This is wrong when just that first byte is garbled. We have to look at
all bytes in the expected sequence to make sure it hasn't been
prematurely terminated from what we were led to expect by that first
byte.

Likewise when we get an overflow: we have to keep looking at each byte
in the sequence. It may be that the initial byte was garbled, so that
it appeared that there was going to be overflow, but in reality, the
input was supposed to be a shorter sequence that doesn't overflow. We
want to have an error on that shorter sequence, and advance the pointer
to just beyond it, which is the first position where a valid character
could start.
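
The "consume neither too few nor too many bytes" rule can be sketched as
a function that reports how far to advance. This is a simplified
illustration, not the code in this commit: it only knows standard 1 to 4
byte sequences, and the names expected_len and bytes_to_consume are
invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Length implied by a start byte, or 0 if b gives no usable length
 * (a continuation byte, or a start byte this sketch does not handle). */
static size_t expected_len(uint8_t b)
{
    if (b < 0x80) return 1;
    if (b < 0xC0) return 0;   /* continuation byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    return 0;
}

/* Number of bytes to skip for the (possibly malformed) sequence at s.
 * Every byte of the expected sequence is examined, and the scan stops
 * at the first byte that is not a continuation byte, because that byte
 * could be the start of the next character. */
static size_t bytes_to_consume(const uint8_t *s, size_t len)
{
    if (len == 0)
        return 0;

    size_t want = expected_len(s[0]);
    if (want == 0)
        want = len;           /* no usable length: rely on the scan alone */

    size_t n = 1;
    while (n < want && n < len && (s[n] & 0xC0) == 0x80)
        n++;
    return n;
}
```

For the truncated input E2 82 41 this returns 2, leaving the pointer on
0x41 ("A"), which is the first byte that could begin a character.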

This fixes a long-standing TODO from an externally supplied utf8 decode
test suite.

And, the old algorithm for finding overflow failed to detect it on some
inputs. This was spotted by Hugo van der Sanden, who suggested the new
algorithm that this commit uses, and which should work in all instances.
For example, on a 32-bit machine, any string beginning with "\xFE" and
having the next byte be either "\x86" or "\x87" overflows, but this was
missed by the old algorithm.
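
To see how an overflow test can miss such input, the sketch below
contrasts two ways of checking while accumulating six bits per
continuation byte into a 32-bit value. Both functions are illustrative
assumptions: the first compares the value before and after the shift,
the second looks at the bits about to be shifted out. Neither is claimed
to be the exact old or new code in this commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift in six payload bits, then compare old and new values.
 * Returns 1 if the new value is smaller, i.e. wraparound was noticed. */
static int accumulate_compare(uint32_t *uv, uint8_t cont)
{
    uint32_t next = (uint32_t)(*uv << 6) | (cont & 0x3Fu);
    int overflowed = next < *uv;
    *uv = next;
    return overflowed;
}

/* Same step, but first test whether any of the top six bits are set,
 * since those are the bits the shift is about to discard. */
static int accumulate_precheck(uint32_t *uv, uint8_t cont)
{
    int overflowed = *uv > (UINT32_MAX >> 6);
    *uv = (uint32_t)(*uv << 6) | (cont & 0x3Fu);
    return overflowed;
}

/* Apply one of the steps to the continuation bytes that follow a 0xFE
 * start byte (which itself contributes no payload bits). */
static int overflows(const uint8_t *cont, size_t n,
                     int (*step)(uint32_t *, uint8_t))
{
    uint32_t uv = 0;
    int seen = 0;
    for (size_t i = 0; i < n; i++)
        seen |= step(&uv, cont[i]);
    return seen;
}
```

With the continuation bytes 86 80 80 80 80 80, the final shift discards
a set bit but leaves bit 31 set, so the result is still larger than the
previous value and the comparison sees nothing wrong; the pre-shift test
reports the overflow.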

Another bug was that the code was careless about what happens when a
malformation occurs that the input flags allow. For example, a sequence
should not start with a continuation byte. If that malformation is
allowed, the code pretended it was a start byte and extracted the "length"
of the sequence from it. But pretending it is a start byte is not the
same thing as it actually being a start byte, and so there is no
extractable length in it, so the number that this code thought was
"length" was bogus.

Yet another bug fixed is that if only the warning subcategories of the
utf8 category were turned on, and not the entire utf8 category itself,
warnings were not raised that should have been.

And yet another change is that given malformed input with warnings
turned off, this function used to return whatever it had computed so
far, which is incomplete or erroneous garbage. This commit changes to
return the REPLACEMENT CHARACTER instead.

Thanks to Hugo van der Sanden for reviewing and finding problems with an
earlier version of these commits.

There are two existing macros that do the job that this longish sequence
does. One, UTF8SKIP(), does an array lookup and is very likely to be in
the machine's cache as it is used ubiquitously when processing UTF-8.
The other is a simple test and shift. These simplify the code and
should speed things up as well.
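
The array-lookup approach can be sketched as follows. This is a toy
table built by counting the leading 1 bits of each possible start byte;
skip_table, init_skip_table and TOY_SKIP are made-up names, and the
values are not guaranteed to match perl's own table for the largest
start bytes.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t skip_table[256];

/* Fill the table once: entry b holds the sequence length implied by
 * start byte b.  Bytes below 0xC0 (ASCII and continuation bytes) get 1
 * so that a scan using the table always advances. */
static void init_skip_table(void)
{
    for (int b = 0; b < 256; b++) {
        size_t len = 1;
        if (b >= 0xC0) {
            len = 0;
            for (int bit = 0x80; bit & b; bit >>= 1)
                len++;               /* count leading 1 bits */
        }
        skip_table[b] = (uint8_t)len;
    }
}

/* One array lookup per character, keyed on its first byte. */
#define TOY_SKIP(s) (skip_table[*(const uint8_t *)(s)])
```

After init_skip_table() has run, TOY_SKIP on a pointer to the bytes
E2 82 AC yields 3, so a loop can step from character to character
without decoding them.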

A recent change exposed a faulty test in t/uni/labels.t.
Previously, a downgraded label passed to eval under 'use utf8;'
would've been erroneously considered UTF-8 and the tests
would pass. Now it's correctly reported as illegal UTF-8
unless unicode_eval is in effect.