When programmatically generating LaTeX from other sources, how do you
deal with some of LaTeX's distinctive features, such as distinguishing
inter-word from inter-sentence spacing, or choosing between a
hyphen-minus, an en dash, and an em dash?

For example, LaTeX by default puts a larger space after a sentence-ending
period than between words, so we mark a period that does not end a
sentence by writing
"Fruits, vegetables, etc.\ are on sale today."
Conversely, a period after a capital letter (as in an initial) is
assumed not to end a sentence, so when it does, we write
"He was staying in apartment B\@."
Similarly, plain text commonly uses a hyphen-minus for a variety of
things that have distinct representations in good formal typography.
Ranges of numbers, for instance, are properly written with an en dash,
so in LaTeX we write "1982--2013" rather than "1982-2013".

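A minimal sketch of what a spacing heuristic might look like, in Python. It assumes a small hand-maintained abbreviation list (`ABBREVIATIONS` and `mark_sentence_spacing` are hypothetical names, not from any library) and handles only the two unambiguous cases: a known abbreviation followed by a lowercase word, and a capital letter plus period at the very end of the paragraph. Something like "apartment B. Then" is deliberately left alone, because it cannot be told apart from an initial as in "J. Smith".

```python
import re

# Hypothetical, deliberately tiny abbreviation list; a real generator
# would need a curated, language-specific one.
ABBREVIATIONS = {"etc.", "e.g.", "i.e.", "cf.", "vs."}

def mark_sentence_spacing(text: str) -> str:
    """Heuristically add LaTeX spacing hints to a plain-text paragraph."""

    def interword(m: re.Match) -> str:
        word = m.group(1)
        if word.lower() in ABBREVIATIONS:
            # Known abbreviation before a lowercase word: not a sentence
            # end, so replace the following space with a control space.
            return word + "\\ "
        return m.group(0)

    # A word ending in "." followed by a space and (lookahead, not
    # consumed) a lowercase letter.
    text = re.sub(r"(\S+\.) (?=[a-z])", interword, text)

    # A capital letter + "." at the very end of the paragraph must end a
    # sentence, but TeX assumes it does not, so insert \@ before the period.
    # Mid-paragraph cases ("B. Then" vs. "J. Smith") stay ambiguous.
    text = re.sub(r"([A-Z])\.$", r"\1\\@.", text)
    return text
```

For instance, `mark_sentence_spacing("Fruits, vegetables, etc. are on sale today.")` yields `Fruits, vegetables, etc.\ are on sale today.`. An abbreviation that really does end a sentence ("..., etc. The next...") is followed by a capital letter, so it is left unmarked and correctly gets sentence spacing.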
For some of these, the appropriate representation seems hard to
determine programmatically (which is probably why LaTeX requires
explicit markup for some things while handling others, such as
typographical ligatures, automatically). For instance, a hyphen-minus
between two numbers is not necessarily an en dash: the parts of a
telephone number are separated by a hyphen, never an en dash. Depending
on context, 457-1024 could be a phone number, a range of numbers, or
something else.

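The phone-number problem suggests deciding from the structure of the source rather than from the text alone. A hedged Python sketch: `mark_ranges` and its `is_range_field` flag are hypothetical, with the flag standing in for whatever the source format already knows (a "years" column, a page-range field). Without that knowledge it falls back to a narrow guess, converting only pairs of four-digit, year-like numbers, so 457-1024 keeps its hyphen. The guess can still be wrong (a number such as 2012-1234 would be converted).

```python
import re

# Assumption: two four-digit numbers in 1000-2999 joined by a hyphen are a
# year range. This is a guess, not a typographic rule.
YEAR_RANGE = re.compile(r"\b([12]\d{3})-([12]\d{3})\b")

def mark_ranges(text: str, is_range_field: bool = False) -> str:
    """Turn hyphens that denote numeric ranges into LaTeX en dashes (--)."""
    if is_range_field:
        # The source's structure says this value is a range, so every
        # digit-hyphen-digit sequence in it becomes an en dash.
        return re.sub(r"(?<=\d)-(?=\d)", "--", text)
    # No structural information: only convert year-like ranges and leave
    # everything else (phone numbers, part numbers, ...) as a hyphen.
    return YEAR_RANGE.sub(r"\1--\2", text)
```

With this sketch, `mark_ranges("1982-2013")` gives `1982--2013`, `mark_ranges("Call 457-1024")` is unchanged, and `mark_ranges("457-1024", is_range_field=True)` gives `457--1024`.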
It seems to me that many of these problems could be solved by using
Unicode, with the proper characters for the various dashes, accented
letters, etc., in whatever source we are processing, and then compiling
with XeTeX. However, this still would not solve some things, such as the
difference between inter-word and inter-sentence spacing.

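One hedged way to combine the Unicode/XeTeX idea with the spacing problem: pass Unicode characters through unchanged, escape only the characters that are special in LaTeX input, and put `\frenchspacing` in the preamble. `\frenchspacing` is a standard TeX command that makes the space after a sentence-ending period the same as an inter-word space, so no `\ ` or `\@` hints are needed; note that it removes the distinction rather than solving it. The function names and the minimal preamble below are illustrative assumptions.

```python
# Single-pass map of characters that are special in LaTeX input.
# Doing it in one pass avoids re-escaping the backslashes we emit.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

def escape_latex(text: str) -> str:
    """Escape LaTeX specials; Unicode dashes, accents, etc. pass through."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)

def make_document(body: str) -> str:
    """Wrap plain text in a minimal document (assumes XeLaTeX or LuaLaTeX)."""
    return "\n".join([
        r"\documentclass{article}",
        r"\usepackage{fontspec}",  # Unicode text input under XeLaTeX
        r"\frenchspacing",         # uniform spacing after periods
        r"\begin{document}",
        escape_latex(body),
        r"\end{document}",
        "",
    ])
```

The resulting file would be compiled with `xelatex`. The trade-off is that the output uses uniform spacing everywhere instead of LaTeX's default wider sentence spacing.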
What are the usual or recommended ways of handling this? I'm sure it has
been addressed before, since I have often heard of LaTeX being generated
programmatically, for example to produce PDF reports, or by Emacs
calendar and org-mode.