I can't reproduce this at the moment, but I've seen it a number of
times.
What happens is, I'll end up catting a binary file into a *shell* buffer
by mistake (e.g., by invoking wget with -O- and forgetting to pipe it
somewhere) and when I kill that process, the characters printed in the
shell buffer have gone into Crazy Mode.
Like, I'll type "pwd", and instead of it looking like
<jwz@grendel:/tmp/> pwd
/tmp
<jwz@grendel:/tmp/>
It will look like
@#!#$^%^%%(@!!$$%%@ pwd
,<$^
@#!#$^%^%%(@!!$$%%@
(but not those exact characters.) In other words, the shell is running
programs properly, but everything it prints out is getting garbled.
Running /usr/bin/reset does not fix matters; all I can do is kill that
shell buffer.
I'm guessing that what is going on is that some particular sequence of
bytes is putting the shell into a Be-Crazy-Unicode mode or something,
when clearly, no shell output should be interpreted with such semantics.
I've definitely seen this with 21.1.14 several times. I have not yet
seen it with 21.4.8, but I've only been using that one for a few days.
--
Jamie Zawinski
jwz(a)jwz.org http://www.jwz.org/
jwz(a)dnalounge.com http://www.dnalounge.com/

Jamie> I'm guessing that what is going on is that some particular
Jamie> sequence of bytes is putting the shell into a
Jamie> Be-Crazy-Unicode mode or something,
Do you (or your vendor) build with Mule? [A report with M-x
report-emacs-bug (with a reasonably fresh net-utils package) tends to
help here, bandwidth be damned.]
I've seen similar lossage with *terms (I forget which flavor), but not
in XEmacs yet.
It would be helpful to see exactly what output is being spewed,
especially if you can provide a translation.
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
My nostalgia for Icon makes me forget about any of the bad things. I don't
have much nostalgia for Perl, so its faults I remember. Scott Gilbert c.l.py

I'd like to note that if Jamie doesn't have xemacs sources on
at least one of his machines, I don't want to know. : )
Azun-"A girl needs to keep her illusions"-dris
PS My skin is still k00ler than yours : )
--
http://www.azundris.com/

Jamie> This is clearly a bug: no matter what binary characters got
Jamie> dumped into my shell buffer, they should damned well not be
Jamie> being displayed as Hebrew, even if I *did* have those fonts
Jamie> installed.
There's no such thing as a binary character. All octets are binary;
the character interpretation is up to the user. Eg, suppose you were
catting a text file in Tel Aviv. Think about it; this bug can't be
fixed for all the people all the time.
You might try yelling at your vendor to provide non-mule packages.
Debian does. I think Mandrake and SuSE do, too.
As a workaround, try hanging
  (lambda ()
    (set-process-coding-system (get-buffer-process (current-buffer))
                               'binary 'binary))
on shell-mode-hook or comint-mode-hook. I'll figure out how to make
sure there's a sane default for 21.4.11 (alas, I just shipped
21.4.10).
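
A minimal sketch of how that workaround might be wired into an init
file, assuming a Mule build where `set-process-coding-system' takes the
process followed by the decoding and encoding systems, as in the call
above. The function name `my-shell-force-binary' is hypothetical and
not part of XEmacs; the nil check is an added precaution for buffers
that have no process when the hook runs.

```elisp
;; Sketch only: treat subprocess I/O in new shell buffers as raw bytes.
;; `my-shell-force-binary' is a made-up name for illustration.
(defun my-shell-force-binary ()
  "Read and write the current buffer's process without conversion."
  (let ((proc (get-buffer-process (current-buffer))))
    ;; Skip quietly if the buffer has no process, or if this build
    ;; does not provide the function at all (e.g. a non-Mule build).
    (if (and proc (fboundp 'set-process-coding-system))
        (set-process-coding-system proc 'binary 'binary))))

;; Run it for every shell buffer; comint-mode-hook is the broader
;; alternative mentioned above.
(add-hook 'shell-mode-hook 'my-shell-force-binary)
```

The trade-off is that every byte from the shell is then shown as its
Latin-1 character, so genuinely multibyte output will no longer be
decoded in those buffers.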
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.

Your pedantry is charming, but I just want my computer to work like it
did in 1985. I don't speak Hebrew, and I don't plan on ever learning to
speak Hebrew, and so I would like it to get the hell out of my face.

Yeah right, that'll work.
When are you guys going to have a non-alpha Cocoa/Aqua port of XEmacs so
I can finally stop using Unix and X?
--
Jamie Zawinski
jwz(a)jwz.org http://www.jwz.org/
jwz(a)dnalounge.com http://www.dnalounge.com/

>>>>> "Jamie" == Jamie Zawinski <jwz(a)jwz.org> writes:
Jamie> This is clearly a bug: no matter what binary characters got
Jamie> dumped into my shell buffer, they should damned well not be
Jamie> being displayed as Hebrew, even if I *did* have those fonts
Jamie> installed.

There's no such thing as a binary character. All octets are binary;
the character interpretation is up to the user.

Except, last time I checked, XEmacs/Mule completely ignores the user's
hints as to how he'd prefer to handle the "binary octets". I bet
Jamie isn't running in a Hebrew locale. Or in an "ISO 2022 locale"
(if there's such a thing), or anything like it.
There would be far fewer complaints about Mule if it accepted the
user's locale settings out of the box. Despite the general brokenness
of the entire locale model, it is a better default to accept it than
to completely ignore it.

Hrvoje> Except, last time I checked, XEmacs/Mule completely
Hrvoje> ignores the user's hints as to how he'd prefer to handle
Hrvoje> the "binary octets".
It still does. That's one of several reasons why --with-mule _still
defaults to_ *no*. Jamie chose to go with a vendor which doesn't offer
a non-mule build (maybe you have some pull with them ;-). He chooses
not to build XEmacs himself.
Jamie, I sympathize with you, but I can't help you very much under
those circumstances. I have neither the time nor the knowledge to
rewrite Mule for 21.4.11, and even if I did, I'd have to veto the
patch---too destabilizing for the people who do need Mule. You don't,
in fact you'd clearly be happier with the no-mule build.
As for the Cocoa/Aqua port, there are rumors of interest. I have no
idea how serious those developers are, I encouraged them to get in
touch, but haven't heard back. Oh well. (No, I'm not giving up, but
that's not a good sign.)
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.

>>>>> "Hrvoje" == Hrvoje Niksic <hniksic(a)xemacs.org> writes:
Hrvoje> Except, last time I checked, XEmacs/Mule completely
Hrvoje> ignores the user's hints as to how he'd prefer to handle
Hrvoje> the "binary octets".
It still does. That's one of several reasons why --with-mule _still
defaults to_ *no*.

OK. But it shouldn't be that hard to change this? Or am I missing
something?
The "locale support" might do several things:
a) Check the appropriate variables and set the language environment
   accordingly.
b) Set things up so that the LC_* settings really are respected in
   most reasonable circumstances. For example, in a UTF-8
   environment, Mule would treat unknown input as UTF-8. This
   includes file input, shell buffers, etc. Needless to say, ISO 2022
   autodetection should be turned off by default.
c) Make especially sure that in single-byte language environments
   (e.g. the "C" locale and iso-8859-* locales, but not e.g. UTF-8)
   the conversions from external to internal format and vice versa are
   reversible. That is, if you run M-! cat binary-file, you should be
   able to save the resulting output buffer into another file without
   loss of data. The same should work in a shell buffer, modulo
   perhaps the newline detection.
d) Get rid of ISO 2022... use UTF-8 as the internal representation...
   Oops.

Jamie chose to go with a vendor which doesn't offer a non-mule build
(maybe you have some pull with them ;-). He chooses not to build
XEmacs himself.

I understand that -- I merely wanted to correct the "this can't work
for everyone" part and its Tel Aviv corollary. It can't work
perfectly for everyone, but it could work much more reasonably by
default for most people.

Jamie, I sympathize with you, but I can't help you very much under
those circumstances. I have neither the time nor the knowledge to
rewrite Mule for 21.4.11

I don't think what I proposed above really requires rewriting Mule,
except for the part about getting rid of ISO 2022. But remember that
since Jamie is probably running in a Latin 1 locale, and the
`iso-8859-1' encoding is already free of the ISO 2022 lossage, he'd be
ok.

You're saying, "the reason you are suffering is that the binary you are
running was built with --be-stupid=yes. Go recompile."
That's a rather un-emacs-like attitude. A better solution would be for
you to provide a global runtime (setq be-stupid nil) for turning off
stupidity even when stupidity-support is a compile-time option.
--
Jamie Zawinski
jwz(a)jwz.org http://www.jwz.org/
jwz(a)dnalounge.com http://www.dnalounge.com/

Jamie> That's a rather un-emacs-like attitude. A better solution
Jamie> would be for you to provide a global runtime (setq
Jamie> be-stupid nil) for turning off stupidity even when
Jamie> stupidity-support is a compile-time option.
Yup. GNU Emacs 20 did this. They were still getting core bug
reversions (the "\201 WTF?" bug) in 20.6. Today, they still see
occasional bug reports related to errors in `be-stupid' handling in
21.x.
Ben has some support for it in 21.5, but that is at least 6 months
from gamma release. I think he plans to have "pure binary" vs "Mule"
switchable at release.
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.

Hrvoje> OK. But it shouldn't be that hard to change this? Or am
Hrvoje> I missing something?
Probably. :-)
Hrvoje> The "locale support" might do several things:
Hrvoje> a) Check the appropriate variables and set the language
Hrvoje> environment accordingly.
It does. I guess the next thing to try is
(set-coding-priority-list '(no-conversion))
by default. This means that Mule elisp files will not be displayed
correctly by default in the C locale. However, AFAIK all the Mule
elisp is maintained by Japanese, me, and Ben, so they'll have a
Japanese language environment by default or know what to do.
OK, done.
Hrvoje> b) Set things up so that the LC_* settings really are
Hrvoje> respected in most reasonable circumstances.
We already go out of our way to do exactly the opposite. The
consistent reports of Dired bugs in European locales are due to the
fact that we don't break the legs of some of the LC_* settings.
The POSIX locale is just plain broken inside of Mule. Emacs does too
much parsing of non-interactive subprocess output, assuming it's going
to get C locale output. Ditto for lots of internal functions and
system call wrappers. OTOH, in interactive process buffers, we want
what the user expects. So the bottom line is that this is not going
to get fixed for a while, because we're basically limited to
heuristics about "what the user expects." I doubt the non/interactive
distinction will be sufficient.
Hrvoje> For example, in a UTF-8 environment, Mule would treat
Hrvoje> unknown input as UTF-8.
Bu-wha-ha-ha! That means depending on Mule-UCS, which _we don't know
how to maintain_ and Himi doesn't seem to be maintaining. Postpone
that specific case to 22.0.
Hrvoje> Needless to say, ISO 2022 autodetection should be turned
Hrvoje> off by default.
If that's needless to say, then you're definitely missing something.
First of all, it is ISO 2022 autodetection that detects all ISO 8859
coding systems. I know that what you want is to turn off escape
sequences, but unfortunately that's not the way it works. The closest
thing is the set-coding-priority-list hack. But that means that
ISO-8859-X is also shut off, because the only no-conversion coding
system is iso-8859-1-unix and its aliases.
Second, if (in an 8-bit locale) you don't need the escape sequences,
then you don't need Mule.
Hrvoje> c) Make especially sure that in single-byte language
Hrvoje> environments (e.g. the "C" locale and iso-8859-* locales,
Hrvoje> but not e.g. UTF-8) the conversions from external to
Hrvoje> internal format and vice versa are reversible.
This is already the case for binary == ISO-8859-1. I'm not sure
whether it can be done trivially for ISO-8859-X, X != 1. (It _is_
also true for "true" ISO-8859-X files; as you know, the problem is the
escape processing.) I've been postponing doing that in 21.5 until I
can put together a regression test for it. Now that I understand
test-harness.el, this is high on my list.
Hrvoje> d) Get rid of ISO 2022... use UTF-8 as the internal
Hrvoje> representation... Oops.
I hope to have that in 22.0 (experimental). It's not that hard, and
Ben says we'll have Unicode font support, so even a naive
implementation can be efficient for most purposes.
However, this is completely irrelevant to your concerns. The internal
representation really is internal (modulo autosave files, but those
aren't in internal representation either).
Hrvoje> It can't work perfectly for everyone, but it could work
Hrvoje> much more reasonably by default for most people.
Sure, but for these purposes I'm Japanese, remember. In my own apps,
I'm happy with the defaults. God be thanked for you and Jamie, I have
well-described bugs to work with. But the only Mule programmers I
have available are the Japanese, and me. Getting the defaults right
in a way that doesn't just cause more breakage isn't easy.
Hrvoje> I don't think what I proposed above really requires
Hrvoje> rewriting Mule, except for the part about getting rid of
Hrvoje> ISO 2022.
I was exaggerating. But remember who is going to be doing that work,
at least in 21.4. It's not Ben. A lot of this is pretty major at my
current level of skills.
Hrvoje> But remember that since Jamie is probably running in a
Hrvoje> Latin 1 locale, and the `iso-8859-1' encoding is already
Hrvoje> free of the ISO 2022 lossage, he'd be ok.
But he's not, and nothing you've suggested will change things for
Latin-1 locales.
Never-wrote-any-Code-That-Doesn't-Suck-ly y'rs
Steve
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.

sjt> I guess the next thing to try is
sjt> (set-coding-priority-list '(no-conversion))
sjt> by default.
Damn. This blows chunks if you're compiling a Mule Lisp library which
depends on character read syntax. That's not acceptable. This is
going to be harder than I'd hoped.
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.

by default. This means that Mule elisp files will not be displayed
correctly by default in the C locale. However, AFAIK all the Mule
elisp is maintained by Japanese, me, and Ben, so they'll have a
Japanese language environment by default or know what to do.

Or, emacs-lisp mode could simply recognize the Mule elisp files and do
the right thing. Catting them in a shell buffer wouldn't display
Japanese contents, sure, but that's a feature, not a bug.

Hrvoje> For example, in a UTF-8 environment, Mule would treat
Hrvoje> unknown input as UTF-8.
Bu-wha-ha-ha! That means depending on Mule-UCS, which _we don't know
how to maintain_ and Himi doesn't seem to be maintaining.

That was only an example, although a pretty important one, given that
Red Hat 8 defaults to UTF-8 European locales.
Doesn't Ben have a workspace with working Unicode? Or is it too
alpha?

Hrvoje> Needless to say, ISO 2022 autodetection should be turned
Hrvoje> off by default.
If that's needless to say, then you're definitely missing something.
First of all, it is ISO 2022 autodetection that detects all ISO 8859
coding systems.

I'm not sure what you mean by this. Why do "all ISO 8859 coding
systems" need special autodetection? I'm arguing for all such
autodetection to be turned off. Getting rid of ISO 2022 is orthogonal
to that effort, but is not contrary to it.

Second, if (in an 8-bit locale) you don't need the escape sequences,
then you don't need Mule.

That's not quite true. I don't need escape sequences *by default*.
But for example, I want Latin 1 mail and news messages to be rendered
as Latin 1, and ditto for Japanese, etc. Mail (MIME) is only one
example of a format that carries charset information with the message
stream; there are others.

Hrvoje> c) Make especially sure that in single-byte language
Hrvoje> environments (e.g. the "C" locale and iso-8859-* locales,
Hrvoje> but not e.g. UTF-8) the conversions from external to
Hrvoje> internal format and vice versa are reversible.
This is already the case for binary == ISO-8859-1.

Hrvoje> It can't work perfectly for everyone, but it could work
Hrvoje> much more reasonably by default for most people.
Sure, but for these purposes I'm Japanese, remember.

True. However, we needn't stick to the POSIX locale like crazy,
either. In Japanese locale, we could enable auto-detection. I have
no problem with "doing whatever the users expect, unless they expect
consistency." Most users don't change their locales every day, after
all.

God be thanked for you and Jamie, I have well-described bugs to work
with.

Yeah. If you need a *very* specific one for the test harness, here's
one from me:
(Assert (> (length
            (decode-coding-string
             (encode-coding-string (string (make-char 'japanese-jisx0208 56 108))
                                   'iso-2022-jp)
             'iso-8859-2))
           1))
In other words, decoding an ISO 2022 string as "iso-8859-2" should not
produce a Japanese char.

Hrvoje> Or, emacs-lisp mode could simply recognize the Mule elisp
Hrvoje> files and do the right thing.
Er, it does that by recognizing ISO 2022 escape sequences, etc....
Hrvoje> Catting them in a shell buffer wouldn't display Japanese
Hrvoje> contents, sure, but that's a feature, not a bug.
Huh? If you're catting them on purpose, that's a bug. Get real. For
example, consider about.el. You want both Skyttä and Nikšić to
display correctly in both Latin-1 and Latin-2 locales, don't you?
Even though you know that most of the rest of us can't pronounce
either name correctly, you don't want them displaying as octal escapes
(or worse, characters from the wrong set).
This is _not_ just a problem of files that are gobbledygook to most
current users. It is a problem of files that are quite readable, too.
Hrvoje> Doesn't Ben have a workspace with working Unicode? Or is
Hrvoje> it too alpha?
Much too alpha. Minimum six months to gamma, and I doubt it will be
working any better than current Mule from your point of view until we
beat on it for a year. Ben is smart, I could be wrong. But (until
the whole world speaks Unicode and nothing but Unicode) this is not a
problem that yields to Ben-smart. Ben's workspace is known to have
many "_nobody_ wants it this way" default configuration bugs. Ie,
it's a problem that has to be dealt with via Handa-persistent. (Granted
I wish Ben or Erik had designed Mule! even now he's constrained. :( )

> But that means that ISO-8859-X is also shut off, because the
> only no-conversion coding system is iso-8859-1-unix and its
> aliases.

Hrvoje> In an iso-8859-2 locale, input bytes between 160 and 255
Hrvoje> should be considered Latin 2.
Mule doesn't work that way. Not even in Ben's workspace. Sorry. You
can have binary.
Sure, it could work that way, and I think Ben's 8-bit work will
support "proclaiming" the 8-bit stream to be ISO-8859-2 (or KOI8-U,
for that matter) by release time. But we are definitely talking very
alpha here.
Hrvoje> True. However, we needn't stick to the POSIX locale like
Hrvoje> crazy, either. In Japanese locale, we could enable
Hrvoje> auto-detection.
I don't work in a Japanese locale; for my purposes POSIX is too broken
to work in any locale but "C". Mule works fine in that context, as
designed.
Again, if we are talking about "most users", you and Jamie are in the
minority. Most people use Emacs as a human-readable-text editor, and
want it to go to some effort to make texts handed to it readable, to
the author if not to the reader. They don't expect their text editor
to be reliable in the face of binary data without special effort.

as possible, we should be working to make it more reliable.
We still need a be-stupid switch, as Jamie puts it. But it should be
t, not nil, by default.
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.

Hrvoje> Catting them in a shell buffer wouldn't display Japanese
Hrvoje> contents, sure, but that's a feature, not a bug.
Huh? If you're catting them on purpose, that's a bug. Get real.

But, you see, I *really* don't want auto-detection of Japanese, and
neither does anyone I know. *Auto*-detection is the keyword here. If
the context calls for charset detection (as in a MIME message), or if
I instruct the computer to watch out for Japanese, then that's fine.
If, on the other hand, I cat random binary in a shell buffer, I want
to get that binary content in my shell buffer. This bug started by
Jamie's Mule thinking that his binary content was Hebrew or whatever.

>> But that means that ISO-8859-X is also shut off, because the
>> only no-conversion coding system is iso-8859-1-unix and its
>> aliases.
Hrvoje> In an iso-8859-2 locale, input bytes between 160 and 255
Hrvoje> should be considered Latin 2.
Mule doesn't work that way.

I know, but I'm saying it should. That's what people from those
locales expect, and the fact that it works totally differently is the
reason they don't use Mule.

Again, if we are talking about "most users", you and Jamie are in
the minority. Most people use Emacs as a human-readable-text editor

I don't agree with the "minority" assessment. In my experience,
people who use Emacs are very advanced and use it for programming and
all kinds of random editing. People who use shell mode are even more
"advanced".

They don't expect their text editor to be reliable in the face of
binary data without special effort.

Maybe -- but they might expect their editor to at least respect the
locale. People running in a Latin 2 locale are simply not used to
their editors randomly garbling binary data.
True, you need to edit binary data very rarely, but when you do, you
*really* do. Emacs's ability to edit binary has saved my ass a number
of times, and I'm not the only one.

>>>>> "Hrvoje" == Hrvoje Niksic <hniksic(a)xemacs.org> writes:
Hrvoje> Except, last time I checked, XEmacs/Mule completely
Hrvoje> ignores the user's hints as to how he'd prefer to handle
Hrvoje> the "binary octets".
It still does. That's one of several reasons why --with-mule _still
defaults to_ *no*. Jamie chose to go with a vendor which doesn't offer
a non-mule build (maybe you have some pull with them ;-). He chooses
not to build XEmacs himself.

Does it have anything to do with the fact that RedHat 8 uses UTF-8 by
default? For example, all my LC_* variables had en_UTF-8. I switched them
all using LC_ALL=C because:
1. Most old Motif based applications like Acroread get confused.
2. man under XEmacs and on the command-line looked broken.
3. I could care less about non iso8859-1.
Maybe setting LC_ALL=C might help for some applications. Since I broke down
and compiled my own version of XEmacs with non-mule, I don't know if it
will help in Mule builds.
-kitty.

Krishnakumar> Does it have anything to do with the fact that
Krishnakumar> RedHat 8 uses UTF-8 by default?
I don't think so. That might help confuse XEmacs, but Mule was
designed by people who (at the time) had religious opposition to
Unicode, and (with a fair amount of justification) didn't trust in
POSIX locales to do the right thing.
Mule is supposed to DTRT in any given buffer regardless of other
buffers or the global POSIX environment.
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.

>>>>> "Jamie" == Jamie Zawinski <jwz(a)jwz.org> writes:
Jamie> I'm guessing that what is going on is that some particular
Jamie> sequence of bytes is putting the shell into a
Jamie> Be-Crazy-Unicode mode or something,
Do you (or your vendor) build with Mule? [A report with M-x
report-emacs-bug (with a reasonably fresh net-utils package) tends to
help here, bandwidth be damned.]
I've seen similar lossage with *terms (I forget which flavor), but not
in XEmacs yet.

Xterm and RXVT derivatives both show this, with RXVT derivative seeming
to be easier to trigger it in from my highly unscientific memories.
RXVT probably shows it too, but I never used that.
Daniel
--
I never watch television because it's an ugly piece of furniture, gives off a
hideous light, and, besides, I'm against free entertainment.
-- John Waters

On Mon, 28 Oct 2002, Stephen J. Turnbull wrote:
>>>>>> "Jamie" == Jamie Zawinski <jwz(a)jwz.org> writes:
>
> Jamie> I'm guessing that what is going on is that some particular
> Jamie> sequence of bytes is putting the shell into a
> Jamie> Be-Crazy-Unicode mode or something,
>
> Do you (or your vendor) build with Mule? [A report with M-x
> report-emacs-bug (with a reasonably fresh net-utils package) tends to
> help here, bandwidth be damned.]
>
> I've seen similar lossage with *terms (I forget which flavor), but not
> in XEmacs yet.
Xterm and RXVT derivatives both show this, with RXVT derivative seeming
to be easier to trigger it in from my highly unscientific memories.
RXVT probably shows it too, but I never used that.

I solve it by:
$ echo [ESC]c
This seems to reset the terminal. The [ESC] needs to be a literal
escape. If in "vi" mode, that's:
$ echo ^V[ESC]c
--
albert chin (china(a)thewrittenword.com)