> Well double rubbish...yeah, you are right, but I'm thinking of pipes in
> a different sense -- not unix pipes as they are now, but as they were
> originally created in unix

Unix pipes have always been binary-safe.

> Also -- off hand, I don't know of any linux pipes that handle a 9 bit
> data type...

Yeah, Linux doesn't run on any 9-bit-byte platforms. That's fairly irrelevant; such platforms are now just an historical curiosity.

> This is the whole point -- expecting your STDIO/STDOUT to be binary
> safe is not logical -- it's not done.

The file descriptors stdin and stdout are binary safe; if they refer to pipes then the pipes themselves are binary safe; but what's ultimately on the other side may well not be. However, if your program's purpose is to process binary data on stdin and stdout then it *is* logical, and normal, to assume that whatever's on the other side of stdin and stdout is binary-safe. It's the user's fault if he runs a binary-emitting program with stdout pointing at a terminal.

> Perl wasn't written as a binary processor.

It *was* written to be capable of handling binary data, among other things. It adopted the Unix model, where file handles (and in particular stdin and stdout) can carry textual or binary data as the program wishes.

> We are talking usage of perl to process material on STDIN/STDOUT
> at a terminal in an environment with a standard locale set.

There's that "at a terminal" again. It's possible to test whether stdin or stdout point at an actual terminal, with Perl's -t operator, but you're trying to include as "at a terminal" situations where stdin/stdout refer to pipes to text-processing programs. Those don't satisfy -t, and can't be distinguished from pipes to binary-processing programs. It's impossible for Perl to distinguish between the text-processing environment that you imagine and the rather common situation of some binary data being involved.
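The distinction drawn above can be sketched with Perl's file-test operators. This is only an illustration: `describe_handle` is a hypothetical helper name, and the pipe is created locally so the example runs anywhere `pipe` is available.

```perl
use strict;
use warnings;

# Classify a filehandle by what it is attached to.
# describe_handle is a hypothetical helper, not part of Perl itself.
sub describe_handle {
    my ($fh) = @_;
    return 'terminal' if -t $fh;   # true only for an actual tty
    return 'pipe'     if -p $fh;   # a pipe; the far end is unknowable
    return 'file'     if -f $fh;   # a regular file
    return 'other';
}

# A pipe looks identical whether a text-processing or a
# binary-processing program sits on the other side of it.
pipe(my $reader, my $writer) or die "pipe failed: $!";
my $kind = describe_handle($reader);
print "reader end is a $kind\n";
```

Calling `describe_handle(\*STDIN)` in a script run as `prog < file`, `other | prog`, or interactively gives three different answers, but nothing in the result says whether the bytes arriving are text.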

Alexander Hartmaier wrote:
> use open ':std' => ':locale';
> Interesting! Looks like a good default for cli scripts/apps (that
> never take binary data as input).
>
> Doesn't that solve your problem, Linda?

It might... but that perl isn't bright enough to follow standards and use it automatically on STDIO to a char device is not a user-friendly/program friendly default, IMO.

But I realize that user-friendly is one of the last things in mind when it comes to perl.

On 26 April 2012 19:32, Johan Vromans <jvromans [at] squirrel> wrote:
> Jesse Luehrs <doy [at] tozt> writes:
>
>> Why are you assuming that text is the only thing that people ever pipe
>> to a program? Interpreting STDIN as UTF-8 would break something along
>> the lines of a perl implementation of gzip, for instance.
>
> I expect a Perl program that processes binary information from STDIN to
> use an explicit binmode.

On 26 April 2012 20:15, Tom Christiansen <tchrist [at] perl> wrote:
> Jesse Luehrs <doy [at] tozt> wrote
> on Thu, 26 Apr 2012 09:32:59 CDT:
>
>> Why are you assuming that text is the only thing that people ever pipe
>> to a program? Interpreting STDIN as UTF-8 would break something along
>> the lines of a perl implementation of gzip, for instance. This may not
>> be a bad thing for a default assuming it can be overridden, but it would
>> certainly not be backwards compatible.
>
> Playing the devil's advocate for a moment, any program that assumes an
> unmarked stream to be in binary not text is inherently broken on all
> Microsoft-encumbered platforms, as those assume the contrary condition.

>> Playing the devil's advocate for a moment, any program that assumes an
>> unmarked stream to be in binary not text is inherently broken on all
>> Microsoft-encumbered platforms, as those assume the contrary condition.
>
> And what gives you that idea?

I assumed that STD{IN,OUT,ERR} were O_TEXT on Microsoft, not O_BINARY.

On 3 May 2012 17:42, Tom Christiansen <tchrist [at] perl> wrote:
>>> Playing the devil's advocate for a moment, any program that assumes an
>>> unmarked stream to be in binary not text is inherently broken on all
>>> Microsoft-encumbered platforms, as those assume the contrary condition.
>
>> And what gives you that idea?
>
> I assumed that STD{IN,OUT,ERR} were O_TEXT on Microsoft, not O_BINARY.
>
> Is this not so?

Not that I ever noticed. Maybe someone abstracted that away...

The only related thing that comes to mind is that Windows traditionally puts a BOM in Unicode files, which then causes problems when you assume that *NIX-style piping is safe. IOW,

cat x y z > xyz

will end up with three BOMs in it, when the author probably didn't even know there were BOMs there in the first place.
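A small sketch of that effect, assuming the files carry a UTF-8 BOM (the bytes EF BB BF); the file names and contents are made up for the demonstration, and the concatenation is done byte for byte the way cat does it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $BOM = "\xEF\xBB\xBF";   # the UTF-8 encoding of U+FEFF

# Write three small files that each begin with a BOM.
# The names x, y and z mirror the cat command; the contents are invented.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(x y z)) {
    open my $out, '>:raw', "$dir/$name" or die "open $name: $!";
    print {$out} $BOM, "contents of $name\n";
    close $out or die "close $name: $!";
}

# Concatenate the raw bytes, with no knowledge of encodings.
my $joined = '';
for my $name (qw(x y z)) {
    open my $in, '<:raw', "$dir/$name" or die "open $name: $!";
    local $/;               # slurp the whole file in one read
    $joined .= <$in>;
    close $in;
}

# Count how many BOM sequences survive in the result.
my $count = () = $joined =~ /\Q$BOM\E/g;
print "BOMs in concatenated output: $count\n";
```

Only the first BOM sits at the start of the stream; the other two end up in the middle of the data, where a consumer is likely to treat them as ordinary (if invisible) characters.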

On 3 May 2012 17:48, demerphq <demerphq [at] gmail> wrote:
> On 3 May 2012 17:42, Tom Christiansen <tchrist [at] perl> wrote:
>>>> Playing the devil's advocate for a moment, any program that assumes an
>>>> unmarked stream to be in binary not text is inherently broken on all
>>>> Microsoft-encumbered platforms, as those assume the contrary condition.
>>
>>> And what gives you that idea?
>>
>> I assumed that STD{IN,OUT,ERR} were O_TEXT on Microsoft, not O_BINARY.
>>
>> Is this not so?
>
> Not that I ever noticed. Maybe someone abstracted that away...
>
> The only related thing that comes to mind is that Windows
> traditionally puts a BOM in Unicode files, which then causes problems
> when you assume that *NIX-style piping is safe. IOW,
>
> cat x y z > xyz
>
> will end up with three BOMs in it, when the author probably didn't
> even know there were BOMs there in the first place.

After I wrote this I went away and had a coffee and it all came rushing back in a horrible flashback (I am a recovering Windows user), and indeed you are correct. Sorry. I had somehow managed to blot it all out. :-)
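For readers who have also blotted it out: the effect of a text-mode stream on binary data can be imitated on any platform with PerlIO's `:crlf` layer. The sketch below writes the same four bytes through `:crlf` and through `:raw` into in-memory buffers; the sample bytes are arbitrary, and using in-memory handles instead of real standard streams is an assumption made so the example is self-contained.

```perl
use strict;
use warnings;

# Four bytes of "binary" data that happen to include 0x0A twice.
my $data = "\x00\x0A\xFF\x0A";

# Write through a :crlf layer, imitating a text-mode stream.
my $text_mode = '';
open my $txt, '>:crlf', \$text_mode or die "open: $!";
print {$txt} $data;
close $txt or die "close: $!";

# Write through :raw, which is the state binmode() puts a handle in.
my $binary_mode = '';
open my $bin, '>:raw', \$binary_mode or die "open: $!";
print {$bin} $data;
close $bin or die "close: $!";

printf "text mode wrote   %d bytes\n", length $text_mode;
printf "binary mode wrote %d bytes\n", length $binary_mode;
```

The text-mode copy should come out longer than the input, because every 0x0A byte gains a 0x0D in front of it. That is harmless for lines of text and fatal for something like a gzip stream.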

On Thursday May 3 2012 5:38:40 PM demerphq wrote:
> On 26 April 2012 19:32, Johan Vromans <jvromans [at] squirrel> wrote:
> > Jesse Luehrs <doy [at] tozt> writes:
> >> Why are you assuming that text is the only thing that people ever pipe
> >> to a program? Interpreting STDIN as UTF-8 would break something along
> >> the lines of a perl implementation of gzip, for instance.
> >
> > I expect a Perl program that processes binary information from STDIN to
> > use an explicit binmode.
>
> Based on what documentation?

perldoc -f binmode

On some systems (in general, DOS and Windows-based systems) binmode() is necessary when you're not working with a text file. For the sake of portability it is a good idea to always use it when appropriate, and to never use it when it isn't appropriate. Also, people can set their I/O to be by default UTF-8 encoded Unicode, not bytes.
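As a rough illustration of that advice, here is a byte-for-byte copy routine that calls binmode on both handles before touching any data. `copy_bytes` is a hypothetical helper name, and the in-memory handles stand in for STDIN and STDOUT so the sketch can be run without piping anything into it.

```perl
use strict;
use warnings;

# Copy one handle to another with no translation of any kind.
# copy_bytes is a hypothetical helper, not a core function.
sub copy_bytes {
    my ($in, $out) = @_;
    binmode $in;    # no CRLF translation, no decoding
    binmode $out;
    my $total = 0;
    while (my $n = read $in, my $buf, 65536) {
        print {$out} $buf or die "write failed: $!";
        $total += $n;
    }
    return $total;
}

# Demonstration: every possible byte value, copied via in-memory handles.
my $source = pack 'C*', 0 .. 255;
my $sink   = '';
open my $in,  '<', \$source or die "open source: $!";
open my $out, '>', \$sink   or die "open sink: $!";
my $copied = copy_bytes($in, $out);
close $out or die "close: $!";

print "copied $copied bytes, ",
      ($sink eq $source ? "identical" : "damaged"), "\n";
```

In a real filter the call would be `copy_bytes(\*STDIN, \*STDOUT)`. On Unix the binmode calls change nothing in the default configuration, which is why they are easy to forget; on DOS and Windows they are what keeps the data intact.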

On 3 May 2012 18:40, Darin McBride <dmcbride [at] cpan> wrote:
> On Thursday May 3 2012 5:38:40 PM demerphq wrote:
>> On 26 April 2012 19:32, Johan Vromans <jvromans [at] squirrel> wrote:
>> > Jesse Luehrs <doy [at] tozt> writes:
>> >> Why are you assuming that text is the only thing that people ever pipe
>> >> to a program? Interpreting STDIN as UTF-8 would break something along
>> >> the lines of a perl implementation of gzip, for instance.
>> >
>> > I expect a Perl program that processes binary information from STDIN to
>> > use an explicit binmode.
>>
>> Based on what documentation?
>
> perldoc -f binmode
>
> On some systems (in general, DOS and Windows-based systems)
> binmode() is necessary when you're not working with a text
> file. For the sake of portability it is a good idea to always
> use it when appropriate, and to never use it when it isn't
> appropriate. Also, people can set their I/O to be by default
> UTF-8 encoded Unicode, not bytes.
>
> binmode - necessary when working with non-text.

Mea culpa. I knew all this stuff, and had managed to forget it when I migrated to *nix. :-)

[Quoting demerphq, on May 3 2012, 17:38, in "Re: unicode question"]
> > I expect a Perl program that processes binary information from
> > STDIN to use an explicit binmode.
>
> Based on what documentation?

E.g., from Camel IV, p. 906:

If you're running on a system that distinguishes between text and binary files, you may need to put your filehandle into binary mode -- or forgo doing so, as the case may be -- to avoid mutilating your files. On such systems, if you use text mode on a binary file, or binary mode on a text file, you probably won't like the results.

demerphq wrote:
> On 26 April 2012 19:32, Johan Vromans <jvromans [at] squirrel> wrote:
>
>> Jesse Luehrs <doy [at] tozt> writes:
>>
>>> Why are you assuming that text is the only thing that people ever pipe
>>> to a program? Interpreting STDIN as UTF-8 would break something along
>>> the lines of a perl implementation of gzip, for instance.
>>
>> I expect a Perl program that processes binary information from STDIN to
>> use an explicit binmode.
>
> Based on what documentation?

---
Based on the standard usage of STDIO as coming from a user terminal.

It can come from a file. But STDIO was often presumed to come from a user's terminal or text that they had typed in.

More reliably, modern programs at least look to see if STDIO is connected to a char-device, and use a switch to override defaults (i.e. ls --color=always when you want to get color through 'less'), as 'ls' defaults to color off when it sees a pipe.

But in perl -- if it is asked to parse 'newlines' as in while (<>) {...}

Then I submit that expecting that stream to be in binary is lunacy.

It should be treated as text as it is being processed as textual lines.

On Fri, May 4, 2012 at 12:36 AM, Linda W <perl-diddler [at] tlinx> wrote:
> Based on the standard usage of STDIO as coming from a user terminal.
>
> It can come from a file. But STDIO was often presumed to come from
> a user's terminal or text that they had typed in.
>
> More reliably, modern programs at least look to see if STDIO is
> connected to a char-device, and use a switch to override defaults
> (i.e. ls --color=always when you want to get color through 'less'),
> as 'ls' defaults to color off when it sees a pipe.
>
> But in perl -- if it is asked to parse 'newlines' as in
> while (<>) {...}
>
> Then I submit that expecting that stream to be in binary is lunacy.
>
> It should be treated as text as it is being processed as textual lines.

No matter what defaults we choose, it will be the wrong one a significant amount of the time. I don't think we can solve this using different defaults. I certainly don't think guessing makes the odds better. In the end, you should always state what you want: text (and which encoding) or binary. Don't ask a computer to mindread, that's asking for trouble.
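To make "state what you want" concrete, here is a minimal sketch that reads the same bytes twice, once declared as binary and once declared as UTF-8 text. The sample string and the in-memory handles are assumptions made for the demonstration. For the standard handles the equivalent declarations would be `binmode(STDIN)` for bytes and `binmode(STDIN, ':encoding(UTF-8)')` for text.

```perl
use strict;
use warnings;

# The word "naive" with a diaeresis on the i, encoded as UTF-8,
# followed by a newline. The accented letter occupies two bytes.
my $bytes = "na\xC3\xAFve\n";

# Declared as binary: the program receives the bytes untouched.
open my $bin, '<:raw', \$bytes or die "open: $!";
my $as_bytes = <$bin>;
close $bin;

# Declared as UTF-8 text: the program receives decoded characters.
open my $txt, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $as_text = <$txt>;
close $txt;

printf "read as binary: %d units\n", length $as_bytes;
printf "read as text:   %d units\n", length $as_text;
```

Both reads use the same line-oriented `<...>` operator and both succeed; they differ only in what `length`, regexes and string comparisons see afterwards. Neither result is wrong, which is why the choice has to be stated rather than guessed.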