I don't know what the general consensus on this is at present, but doesn't it
begin to undermine Unicode as a credible standard?

As I understand it, this could cause any number of conversion issues,
particularly in, say, client/server systems with both Win32 and Java clients,
each expecting a UTF-8 stream with its own version of "correctness".

Will we be in a position where we'll need something like a special set of
Unicode control characters just to signal which of several UTF-8 variants a
stream is using?

As I understand it, Java's UTF-8 also differs from standard UTF-8 in that
surrogate pairs are not encoded using 4 bytes, but rather are encoded using
6 bytes (one group of 3 bytes for each element of the pair), i.e. Java's UTF-8
treats each of the two elements of a surrogate pair just as it treats any
other character whose code is greater than U+07FF.