Re: Hyphens in man pages are no longer hyphens

On Thu, Dec 31, 2009 at 12:50:05 -0700, Sverre Froyen wrote:
> For a while now I have been using "LC_CTYPE=en_US.UTF-8" in my .profile. I
> recently noticed that I can no longer copy and paste commands containing
> hyphens from a man page because the hyphens get formatted as "342 200 220"
> (from od -c /usr/share/man/cat1/send-pr.0). Unsetting LC_CTYPE results in man
> pages containing regular hyphens. The strange thing, however, is that this
> used to work with LC_CTYPE set. Formatted man pages from a catman on 22 Nov.
> contain normal hyphens whereas those current as of a week ago produce the UTF-8
> specific hyphens. Does anyone know what changed? Is there a way to restore
> the old behavior?
In utf-8 mode (or in PS output) groff translates:

* unescaped - to \u2010 HYPHEN (or the /hyphen glyph)
  You get this in command names like "send-pr", where the unescaped '-'
  becomes a hyphen.
* escaped \- to \u2212 MINUS SIGN (or the /minus glyph)
  You get this in options, because the mdoc Fl macro uses \-, and before
  that it was customary practice to write options with an escaped \-X.
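
To tie this back to the od output quoted above, here is a small Python
sketch (my own illustration, nothing groff ships) that prints the UTF-8
bytes of the two characters in octal, next to plain ASCII '-'. The
helper name octal_utf8 is made up for this example.

```python
import unicodedata


def octal_utf8(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated octal, as od -c shows them."""
    return " ".join(f"{b:03o}" for b in ch.encode("utf-8"))


# U+2010 and U+2212 are the code points named in the list above;
# ASCII '-' (U+002D) is included for comparison.
for ch in ("\u2010", "\u2212", "-"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  {octal_utf8(ch)}")
```

The U+2010 line shows 342 200 220, which is exactly the byte sequence
from the original report.
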

src/gnu/dist/groff/font/devutf8/NOTES says:

  Character 0x002D has not been given a name because its Unicode name
  HYPHEN-MINUS is so ambiguous that it is unusable for serious typographic
  use.

so you cannot even refer to ASCII '-' in utf-8 mode unless you modify
the font files.

I don't have a PDF distiller handy to test what happens if you
convert groff -Tps output to PDF and then try to copy-paste a command
example from the PDF document, but you'll probably get the same
problem with the /minus used for options (the Adobe Glyph List says that
/hyphen is ASCII '-' \u002D).
Of course in copy-pastable command line examples we don't want
"serious typographic use", we want ASCII '-' for its literal character
value :), but there's a catch. Let's say you want to copy-paste

  eval `ssh-agent -s`

but in the roff source the first '-' is plain (hyphen) and the second
is escaped (minus).
The only way to solve this properly as far as I can tell is to use
some special font for examples that are intended to be copy-pastable
in which both hyphen (-) and minus (\-) look the same and both are
represented by something that will get you ASCII '-' when copied. For
PS output that could be a special alias for Courier that uses /hyphen
for both - and \-. For utf-8 it would use ASCII '-' for both (instead
of fancy unicode chars).
PS: The back-tick has the same problem, as it ends up as \u2018
LEFT SINGLE QUOTATION MARK :)
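
Until something like the font approach exists, one stopgap is to clean
up the text after copying it. The Python sketch below is only that, a
workaround and not the proper fix described above. The function asciify
and the table TO_ASCII are hypothetical names of mine, and the table
covers only the three substitutions mentioned in this message.

```python
# Hypothetical helper, not part of groff or man: undo the utf-8 device's
# substitutions in text copied from a formatted man page.
TO_ASCII = str.maketrans({
    "\u2010": "-",  # HYPHEN, produced from an unescaped -
    "\u2212": "-",  # MINUS SIGN, produced from an escaped \-
    "\u2018": "`",  # LEFT SINGLE QUOTATION MARK, produced from a back-tick
})


def asciify(text: str) -> str:
    """Map the typographic characters back to their ASCII originals."""
    return text.translate(TO_ASCII)


# The ssh-agent example as it would come out of a utf-8 formatted page.
copied = "eval \u2018ssh\u2010agent \u2212s\u2018"
print(asciify(copied))
```

This blindly maps every U+2018 back to a back-tick, so it is only
suitable for command examples, not for running text.
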
SY, Uwe
--
uwe%stderr.spb.ru@localhost | Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/ | Ist zu Grunde gehen