1) You say, "the Stacked Vertical Orientation (short name svo) property is intended to be used for vertical lines in those parts of the world where characters are mostly upright"

Which kind of language do you mean by "the world where characters are mostly upright"? More specifically, may Japanese vertical writing be classified as "the world where characters are mostly upright"?

2) You say, "this default determination is based on the most common use of a character, …"

What is the actual meaning of "the most common use"? How do you determine it? In other words, is it frequency in actual publications, or compatibility with legacy applications?

1) You say, "the Stacked Vertical Orientation (short name svo) property is intended to be used for vertical lines in those parts of the world where characters are mostly upright"

Which kind of language do you mean by "the world where characters are mostly upright"? More specifically, may Japanese vertical writing be classified as "the world where characters are mostly upright"?

From the perspective of CSS Writing Modes, SVO is used for putting "everything" upright, where "everything" is all characters for which it wouldn't be blatantly wrong. So I hope that UTR50 will be suitable for this purpose.

I think the more difficult question is the scope of MVO. I like Murakami-san's definition:

MurakamiShinyu wrote:

On the other hand, the Mixed Vertical Orientation should be defined based on the context where Western letters and digits are sideways. UTR#50 MVO cannot be compatible with the legacy Shift JIS era's vertical orientation. Greek and Cyrillic letters are now sideways. If we define some ambiguous characters as U and some others as R, it will cause confusion: it is difficult to know which characters are U or R. Why is ‰ upright? Why is μ sideways? Why is § upright? etc. I want a simpler policy: ambiguous characters which are often used with MVO=R letters or digits are also MVO=R. It is then easy to understand when they are set sideways by default and whether specifying upright orientation is needed.

I like Murakami-san's definition too, except for one point which contradicts Eric's definition.

Eric says

emuller wrote:

What a user will want most often. Existing publications are a good indication of that. Compatibility with legacy applications is a second order factor.

That makes all ambiguous characters U. Setting them to R makes multi-lingual people happy, but it's not what "a user will want most often" and it's not "compatible with legacy applications." The majority of East Asians don't read multi-lingual documents in vertical flow.

Multi-lingual documents in vertical flow are hard to read in general, and that's why Japan established horizontal-flow typography for Japanese a hundred years ago or so. There were some such documents before that, and there are some even today, so I understand that some people who work on multi-lingual typography, like Murakami-san, strongly wish for it, but the number of such documents is decreasing, and I expect it to decrease further in the future.

So, I think Koji does not agree, but here are my thoughts on how to categorize:

For SVO

If it's at all sensible to put as U, then the codepoint should be U.

If it's only sensible to put as R, then the codepoint should be R.

For most characters, it's sensible to make them U. For some characters (mainly dashes and enclosing punctuation like brackets) it never makes sense to make them U, so they should be R.
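As a rough illustration of the SVO rule above, here is a minimal Python sketch. It assumes that "dashes and enclosing punctuation" can be approximated by the Unicode General_Category values Pd, Ps, and Pe; that approximation and the name `svo_sketch` are illustrative assumptions, not part of UTR#50 or of the proposal itself.

```python
import unicodedata

# Assumption: "dashes and enclosing punctuation" is approximated by the
# General_Category values Pd (dash), Ps (open) and Pe (close).
SIDEWAYS_CATEGORIES = {"Pd", "Ps", "Pe"}


def svo_sketch(ch: str) -> str:
    """Return "R" for characters that only make sense rotated, else "U"."""
    return "R" if unicodedata.category(ch) in SIDEWAYS_CATEGORIES else "U"


if __name__ == "__main__":
    # Latin letter, Han character, parenthesis, em dash
    for ch in ["A", "\u6f22", "(", "\u2014"]:
        print(f"U+{ord(ch):04X}", svo_sketch(ch))
```

A real property table would need per-character review; the category test only captures the "mainly" part of the rule.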

For MVO

If the codepoint has definite East Asian usage and not much (if any) non-East Asian usage, it should follow East Asian usage patterns.

If the codepoint is rarely used in East Asian usage, but often used in non-East Asian contexts, it should follow non-East Asian patterns.

If the codepoint is conflicted between U and R, its orientation should follow from consistency with similar characters and/or characters with which it is most often used.

If the codepoint is conflicted, and consistency does not resolve, the MVO should be chosen to minimize brokenness.

If the codepoint is conflicted and no arguments can be made that bias one way or another, then it should follow East Asian patterns.

If the codepoint's usage is unknown, it should be R until we have further information.
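The MVO rules above read as an ordered cascade, which can be sketched in Python. The `Evidence` fields and the function name `resolve_mvo` are hypothetical; the inputs would have to be assigned by hand per character, and nothing here comes from the UTR#50 data files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hand-assigned facts about one code point (all fields hypothetical)."""
    ea_common: bool = False               # definite East Asian usage
    non_ea_common: bool = False           # frequent non-East Asian usage
    ea_orientation: str = "U"             # orientation in East Asian usage
    non_ea_orientation: str = "R"         # orientation in non-East Asian usage
    consistency: Optional[str] = None     # orientation of similar/companion characters
    least_broken: Optional[str] = None    # orientation that minimizes brokenness


def resolve_mvo(e: Evidence) -> str:
    # Usage unknown: R until there is further information.
    if not e.ea_common and not e.non_ea_common:
        return "R"
    # Clear East Asian or clear non-East Asian usage.
    if e.ea_common and not e.non_ea_common:
        return e.ea_orientation
    if e.non_ea_common and not e.ea_common:
        return e.non_ea_orientation
    # Both usages exist; if they agree there is no conflict.
    if e.ea_orientation == e.non_ea_orientation:
        return e.ea_orientation
    # Conflicted: try consistency, then brokenness, then East Asian patterns.
    if e.consistency is not None:
        return e.consistency
    if e.least_broken is not None:
        return e.least_broken
    return e.ea_orientation
```

For example, a conflicted symbol with no consistency or brokenness argument, `Evidence(ea_common=True, non_ea_common=True)`, falls through to the East Asian pattern, while adding `consistency="R"` resolves it to sideways.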

Consistency is important to me, because it minimizes confusion for the reader, increases predictability for the author, and reduces the chances of blatantly inconsistent typesetting.

----

So, for example, IMO all halfwidth and fullwidth characters and CJK punctuation should follow EA usage patterns: even though prime quotation marks can be used in non EA languages, they are used frequently in CJK and it makes sense to follow their conventions.

So, for example, I don't mind that INTERROBANG is set upright, but I'm disturbed that INVERTED INTERROBANG is not consistent with it.

Math is an interesting and complicated consistency argument. EA usage puts equals signs sideways, and these are commonly used in vertical text, along with multiplication and addition signs, which are symmetrical. Digits and variable names (Latin and Greek letters) are also sideways by default. So are arrows, which are frequently used as (and in some cases combined with) relational operators. So from all directions, consistency pushes the math symbols to be sideways. Some math symbols are more often used upright in EA running text, but because mathematicians like to have various rotations of the same glyph mean different things, it's impossible to draw a clean line. Thus they all have to remain consistent. Since the most common ones are sideways, and since they are commonly used together with other MVO=R characters, they resolve to sideways.

From the consistency of math, digits, and letters follows the consistency of the mathematical letters (e.g. MATHEMATICAL SCRIPT SMALL L) as sideways. This spills over into the Letterlike Symbols block, where additional mathematical letters and symbols are also encoded. All of these, I feel strongly, should be consistent with each other: a subset of them shouldn't be pulled out and treated differently just because e.g. they happen to be encoded in JIS. That would make their orientation arbitrary and unpredictable. So even though EA might have a bias to U, for these conflicted characters, consistency resolves to sideways.

For symbols like ¶ and §, both sideways and upright would make sense, so they are conflicted. But I see no consistency arguments to make, so I would choose upright.

For the currency symbols, ‰, and ‱, they are used in similar ways, so should probably be consistent. And they are used with digits, so IMO should likewise be sideways.

For single curly quotes, they aren't very common in EA, and are much less common in vertical EA. But they are used also as apostrophes in Latin, where setting them upright alongside sideways letters would be severely broken. And although common Chinese fonts have vert glyphs, many Japanese fonts don't, meaning in many cases in practice, upright would not be a good option. So IMO they should be (as a set, for consistency), set sideways. This minimizes brokenness.

Meanwhile, double curly quotes are used often in EA, even though they are uncommon in vertical EA. Because they aren't used as apostrophes, the brokenness argument is not as strong. Consistency with single quotes argues for sideways. Consistency with prime quotes allows for upright. EA usage argues for upright. Font problems argue for sideways. There's no clear answer IMO, so I'm okay with either U or R, and therefore leaning towards U.

So I agree that "ambiguous" should be set to U :) but I check usage patterns, consistency, and brokenness first to see if the character can be resolved before deciding it is ambiguous and therefore U.

I disagree with the idea that the IPR symbols and symbols such as "care of", etc., which mean purely English abbreviations, should be UPRIGHT. They do not have any stably established "East Asian" typographical usages. The reality is just that they CAN be set UPRIGHT, and people have tried to use them in the UPRIGHT posture with Japanese words, but there has not been any good established way of composing and arranging those characters.

The situation where such characters are set UPRIGHT in vertical lines is similar to the situation where roman capitals are set vertically in the UPRIGHT posture for book spines, display boards, displays attached to buildings, etc., especially when the word is short enough.

I don't deny that IPR symbols and symbols such as "care of" can be used in the UPRIGHT posture, and there may exist many such cases. But that usage is not the most typical "East Asian" usage. We don't yet know what the best and typical "East Asian" or Japanese usage of these characters really is. In other words, our experience is not sufficient to use them constantly in pure vertical Japanese contexts.

So, IPR symbols and "care of" etc. should be rotated in vertical lines.

Interestingly, the "care of" character demonstrates the situation very well. No one knows when this character can be used in the UPRIGHT posture in vertical lines, except in the rarest self-referential context to explain the meaning of the abbreviated symbol in writing an English address.

Going on principles leads to a dead end. You start with "=", you extend to "a + b", and then you extend to all math, and then extend to symbols that are mostly used in math and then you extend to all symbols, and then everything but kanji and kana is sideways. Clearly not what we want.

IMHO, the "extend to all math" step is the one that should be dropped. There are far fewer uses of "all math" in vertical than uses of symbols, as symbols, which ought to be upright.

---

As for the "multi-lingual" documents: I suppose that everybody agrees that kana and kanji should be upright, and that we can sort out the East-Asian punctuation. Then, there are two batches of characters found with some frequency: A-Za-z0-9, and symbols.

If we had only one set of code points for A-Za-z, then I would definitely prefer U for those characters. The majority of their occurrences are in acronyms, and set U, fullwidth. Fortunately, we don't really have to choose, because we have the fullwidth code points and the majority of the U occurrences are fullwidth. This allows us to set non-fullwidth latin sideways, and takes care of the occasional English phrase or Western name. Same goes for 0-9, with the addition of TCY.

For the symbols, which are overwhelmingly used outside of math, I think their most common uses are U (I am willing to handle a few symbols differently, on a case by case basis). The boundary between "symbols used with some frequency in Japanese" and "other symbols" is of course very fuzzy, and changing with time. IMHO, to be safe, we should a priori treat all symbols as U.

Once we have that in place, we have essentially served the "every day" Japanese user, with very little need for override, and the rest does not matter much.

Math in vertical is such a rare occurrence that it's not worth worrying about, and an override will be perfectly adequate.

Though it is difficult to estimate, and to correctly identify the code point from a rendered glyph.

Current DTP operators are working to turn ASCII digits and Latin letters upright, using the functions of DTP software such as InDesign. They no longer depend on the fullwidth variants.

"Existing publications are a good indication of that" and my estimate lead to the conclusion that the direction of ASCII digits and upper case Latin letters shall be upright (MVO=U). Is there any misunderstanding?

Regards,

Tokushige Kobayashi

Last edited by TKobayashi on Sat Jun 16, 2012 1:11 am, edited 1 time in total.

If we had only one set of code points for A-Za-z, then I would definitely prefer U for those characters. The majority of their occurrences are in acronyms, and set U, fullwidth.

I agree.

emuller wrote:

Fortunately, we don't really have to choose, because we have the fullwidth code points and the majority of the U occurrences are fullwidth. This allows us to set non-fullwidth latin sideways, and takes care of the occasional English phrase or Western name.

Many people may not be satisfied, because a text string that mixes ASCII Latin characters and their fullwidth variants will be quite confusing.

emuller wrote:

Same goes for 0-9, with the addition of TCY.

If you depend on fullwidth variant of 0-9, you must express YYYY/MM/DD in vertical text as follows:

Comment added on 17/06/2012: I do not recommend this notation. This is a bad example, but you will be forced to use this notation by MVO=R for ASCII digit on future Web etc. (End of addition on 17/06/2012)

Another example is the case of a numbered list in vertical writing: you must use the fullwidth variants for 1-9, and ASCII with tcy for 10 and above.

I hear that many printing companies and DTP operators first normalize the numeric characters in the original text, and then start the formatting job.
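To make the numbered-list consequence concrete, here is a small Python sketch of the encoding an author would be pushed toward if ASCII digits default to sideways: a fullwidth digit for 1-9, and ASCII digits flagged for tcy from 10 up. The function name `list_marker` and the returned flag are hypothetical, and this illustrates the objection rather than a recommended practice.

```python
from typing import Tuple


def list_marker(n: int) -> Tuple[str, bool]:
    """Return (marker text, needs_tcy) for list item number n."""
    if n < 1:
        raise ValueError("list numbers start at 1")
    if n <= 9:
        # FULLWIDTH DIGIT ZERO is U+FF10, so U+FF10 + n is the fullwidth digit n.
        return chr(0xFF10 + n), False
    # Two or more digits: ASCII digits that must be marked up as tate-chu-yoko.
    return str(n), True


if __name__ == "__main__":
    for n in (1, 9, 10, 12):
        text, tcy = list_marker(n)
        print(n, [f"U+{ord(c):04X}" for c in text], "tcy" if tcy else "upright")
```

The same logical sequence of numbers ends up with two different encodings, which is the inconsistency being pointed out.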

Regards,

Tokushige Kobayashi

Last edited by TKobayashi on Sat Jun 16, 2012 5:01 pm, edited 1 time in total.

I like Murakami-san's definition too, except for one point which contradicts Eric's definition. ... So, if I replace all ambiguous with U, I agree.

emuller wrote:

If we had only one set of code points for A-Za-z, then I would definitely prefer U for those characters. The majority of their occurrences are in acronyms, and set U, fullwidth. Fortunately, we don't really have to choose, because we have the fullwidth code points and the majority of the U occurrences are fullwidth. This allows us to set non-fullwidth latin sideways, and takes care of the occasional English phrase or Western name. Same goes for 0-9, with the addition of TCY.

Now I understand more clearly what Eric and Koji want. That is the same as Stacked Vertical Orientation except for the ASCII range. You would prefer U for Greek letters because there is only one set of code points. The current "mixed" draft (R for Greek etc.) seems to be the result of a compromise between you and other people who have a different idea. The result will be a confused spec.

To avoid such confusion, I think the purposes of SVO and MVO should be redefined as follows:

SVO ("everything" upright) - for Japanese usual vertical publications, English vertical marquee, etc.

- a tailoring of SVO, sideways for ASCII and halfwidth kana range, for people like you who want Shift-JIS era's fullwidth/halfwidth treatment.

MVO - defined based on the context where Western letters and digits are sideways.

No, as far as I know about vertical commercial printing jobs, creators first normalize all ASCII digits and their full-width variants into ASCII digits. ASCII digits display sideways on the DTP screen. Then creators rotate all digits upright, make TCY pairs, and generate the PDF. The rotation job is easy, because major DTP software offers the function.
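The normalization step described here can be sketched in Python. This is only an illustration of the idea, not any DTP product's actual behavior: the names `normalize_digits` and `tcy_candidates` are hypothetical, and treating only runs of exactly two digits as TCY candidates is an assumption.

```python
import re

# Map FULLWIDTH DIGIT ZERO..NINE (U+FF10..U+FF19) to ASCII 0..9.
FULLWIDTH_TO_ASCII = {0xFF10 + i: ord("0") + i for i in range(10)}

# Assumption: a TCY candidate is a run of exactly two ASCII digits.
TCY_RUN = re.compile(r"(?<![0-9])[0-9]{2}(?![0-9])")


def normalize_digits(text: str) -> str:
    """Replace fullwidth digits with their ASCII counterparts."""
    return text.translate(FULLWIDTH_TO_ASCII)


def tcy_candidates(text: str) -> list:
    """List the two-digit runs that an operator might set as TCY."""
    return [m.group() for m in TCY_RUN.finditer(text)]


if __name__ == "__main__":
    raw = "\uff12\uff10\uff11\uff12/06/16"   # fullwidth "2012" followed by "/06/16"
    clean = normalize_digits(raw)
    print(clean, tcy_candidates(clean))
```

After normalization every digit has one encoding, so the upright rotation and the TCY pairing can be applied uniformly.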

emuller wrote:

And in horizontal as well, minus the tcy?

Normally, we use ASCII digits in horizontal writing for office work and printed books.

In case a writer mixes ASCII digits and their full-width variants, the editor or creator will change all full-width variants into ASCII digits.

Going on principles leads to a dead end. You start with "=", you extend to "a + b", and then you extend to all math, and then extend to symbols that are mostly used in math and then you extend to all symbols, and then everything but kanji and kana is sideways. Clearly not what we want.

IMHO, the "extend to all math" step is the one that should be dropped. There are far fewer uses of "all math" in vertical than uses of symbols, as symbols, which ought to be upright.

+1

emuller wrote:

For the symbols, which are overwhelmingly used outside of math, I think their most common uses are U (I am willing to handle a few symbols differently, on a case by case basis). The boundary between "symbols used with some frequency in Japanese" and "other symbols" is of course very fuzzy, and changing with time. IMHO, to be safe, we should a priori treat all symbols as U.

Once we have that in place, we have essentially served the "every day" Japanese user, with very little need for override, and the rest does not matter much.

Can't agree more.

Maybe we could add another property, one for users who want Japanese snippets within Latin. It's quite easy to develop using EAW, and I suppose that it satisfies almost every request we're seeing. We can then focus MVO on regular Japanese users.

This may not be as crazy an idea as I originally thought. CSS wants everything not to be broken, and that led "upright" to be smart and led to SVO. Why then isn't "sideways" smart? Rotating Han characters looks as broken as setting parentheses upright.
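A minimal Python sketch of what an EAW-based "smart sideways" could look like, using the standard library's East_Asian_Width lookup. Treating only Wide (W) and Fullwidth (F) as upright, and everything else (including Ambiguous) as sideways, is an assumption made for illustration; the name `eaw_orientation` is hypothetical.

```python
import unicodedata


def eaw_orientation(ch: str) -> str:
    """Upright for Wide/Fullwidth characters, sideways for everything else."""
    # Assumption: Ambiguous ("A"), Narrow ("Na"), Halfwidth ("H") and
    # Neutral ("N") characters all follow the surrounding Latin and rotate.
    return "U" if unicodedata.east_asian_width(ch) in {"W", "F"} else "R"


if __name__ == "__main__":
    # Han, hiragana, ASCII letter, fullwidth letter, halfwidth katakana
    for ch in ["\u6f22", "\u3042", "A", "\uff21", "\uff71"]:
        print(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch), eaw_orientation(ch))
```

Under this rule Han and kana stay upright inside otherwise rotated Latin text, which is the "not broken" behavior being asked for.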

Maybe we could add another property, one for users who want Japanese snippets within Latin. It's quite easy to develop using EAW, and I suppose that it satisfies almost every request we're seeing. We can then focus MVO on regular Japanese users.

This may not be as crazy an idea as I originally thought. CSS wants everything not to be broken, and that led "upright" to be smart and led to SVO. Why then isn't "sideways" smart? Rotating Han characters looks as broken as setting parentheses upright.

I agree that we may need another vertical orientation property. I propose "CVO" between SVO and (new) MVO.

Isn't it what people do today? And in horizontal as well, minus the tcy?

No. There are no serious Japanese typographers who set multiple Arabic number digits with their full-width glyphs, in whatever method the shapes and widths are implemented. Only junk, third-rate typographers incorrectly, casually, happen to make this error, due to their ignorance. Or, I have to admit that our users may make the same mistake, and that it may be due to the imperfection of our user education.

It is possible that a stand-alone single-digit full-width number may be used, if (1) used as a symbolic stand-alone number for each item in an itemized list, or (2) used purely as a single digit number to indicate a month or date, etc. that can be expressed in one digit only, and when two digits are used, ONLY WHEN the precisely "half-width" number glyphs, with which its two-digits can precisely fit the EM body, are available. The mixed style, full-width Arabic numbers and precisely half-width numbers, is possible only when the "precisely half-width" Arabic number glyphs are available, through GSUB or whatever, so that no number glyphs will exceed the EM body boundaries.

About (2), this is a widely accepted method. But I don't think this is necessarily a good way of composing Arabic numbers in Japanese, because the two glyph styles (proportional and full-width) have to be mixed. It is clear that this is an inconsistency, and always produces aesthetically bad effects. Still, I cannot deny that this "mixed-widths-number" method has been widely accepted in the "ordinary" class of jobbing typography, mainly because it can increase the probability of each line's length fitting the EM body-based grid, resulting in the least amount of necessary line compaction or expansion, if not always possible.

But no serious typographers compose multiple-digits numbers using full-width Arabic numbers, with only one exception in the world.

The only exception may be when one wants to achieve the above-mentioned "full-width and purely half-width" numbers in InDesign. :-)

You first use the full-width numbers (encoded as full-width numbers), and only to the affected multiple-digits numbers, you apply the GSUB feature to replace the glyphs with the corresponding "precisely half-width glyphs". In this case, the full width coded characters are used only as an entry point for the glyph substitution to realize the "precisely half-width" number glyphs. As mentioned above, pure full-width Arabic number characters may be used only for one-digit numbers.

Still, it is clear that mixing the full-width and proportional number glyph styles is NOT elegant at all. So, most serious typographers don't want to use the easiest solution, and prefer using genuinely proportional numbers only. It is obvious this needs more careful spacing between characters before and after the number(s).

Anyway, what I can recommend to you, at least, is, in Japanese typographic contexts, not to compose multiple digits with full-width Arabic number glyphs. Also, in vertical lines, most numbers should be rewritten or trans-coded into their Chinese equivalent characters, with few exceptions such as cases where TCY is applied. But the TCY usage for one or two-digit Arabic numbers is also just one of many expedient tricks, and not a stably established Japanese typographic convention.

In most serious cases, the first principle of setting numbers in Japanese vertical lines is that they should be set in Chinese number characters, and when a unit name is appended, it should be written in katakana characters.

We should not ape bad, mediocre manners of typography, however infectious they are today.
