What character set do you usually reach for as a default when you start a new font?

With Unicode and the demand for greater multilingual support, in general, a lot of character sets have been rendered obsolete (MacRoman or ANSI) or, perhaps, insufficient like ISO 8859-1 and Windows 1252 - to name just two "Western" sets.

As with any software, defaults matter a lot. For example: what would you recommend a newbie type designer load up in their font editor when making their first font?

(I'm asking because I've been writing HTML web font test pages, and some of them test for compliance with the major character sets like WGL4 and Adobe Latin 4, to name two. These pages also take a different approach to testing than anything else currently available. Among other benefits, they are easier for type designers to insert into their workflow because they sit in your local hard drive's file system. No web server necessary. They will be offered as open source and free of charge. So the smarter you make me, the smarter I can make tools for you.)

Comments

For a newbie, I would not expect huge language coverage right away. First fonts might only be their native language and similar. I have by now made my own character sets which consist of all Latin except Vietnamese. I sometimes add Cyrillic and Greek but that is just me.

Creating your own encoding is the best solution, but first you must investigate the signs that you are designing and try them in context. Also, it's always good to ask people who use that language if what you are doing is OK. This is a good website for that: http://diacritics.typo.cz/

@Richard Fink One of my biggest problems with designing type is getting done. I feel Vietnamese is the straw that breaks my camel's back. Also, vertical metrics can be a problem with stacked diacritics.

The OpenType spec has a codepage bitfield in the OS/2 table for type designers to indicate which languages the current font supports. The bitfield is 128 bits wide, and only 90-ish bits are assigned at the moment. Perhaps an equivalent question is to ask which bits you want to switch on, among them:

Why leave out Vietnamese? Because in AL-3, including Vietnamese is 90 characters. That is without any small caps or alternate forms or anything. The horn accent is a tricky one to get right, and it is attached rather than floating. So the cost/benefit for Vietnamese is tough.

The OpenType spec has a codepage bitfield in the OS/2 table for type designers to indicate which languages the current font supports.

Not quite. The codepage bitfield is for registering what legacy 8-bit codepages the font supports, which is then used by some software, notably RichEdit clients on Windows, to make guesses about font fallback situations. The legacy nature of these bits means that they're not useful for fonts that support scripts and languages that never had 8-bit codepages, and the heavy-handed nature of some software relying on the bits means that some fonts lie about what codepages they support: for instance, people making fonts to support the Arabic language may claim Windows CP 1256 support even if their font does not support the ASCII subset or Farsi and Urdu characters, simply because this is the only way to get the font to work in some software and not fall back to an Arabic system font.

The section of the spec you link to is the Unicode range bitfield. I can't remember at what version of Unicode this has been stuck for many years. There's semi-regular conversation about rev'ing the spec to define bits for blocks that have been added to Unicode since then, but it doesn't seem to be a high priority, and there's equally regular conversation about ignoring these bits. Apart from the large number of blocks not included, the Unicode range bits have no standard requirement for complete or minimal coverage of a block. Hence, it is left to font developers to decide whether two characters from the Greek block used as symbols in a Latin-only font constitutes support of the Greek block for Unicode range bit purposes. Font editing software tends to err on the side of inclusion, so even if only one character from a block is present the bit may be automatically set.
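To make the "err on the side of inclusion" point concrete, here is a minimal Python sketch. It assumes a small, partial table of Unicode range bits (only six blocks are listed, and the block sizes count code points rather than assigned characters), and the `min_chars` threshold is a hypothetical policy, not anything the spec defines.

```python
# Partial table of OS/2 Unicode range bits: bit -> (block name, first, last).
# Only a handful of blocks are listed; the spec defines many more, and bit 9
# also covers the Cyrillic supplement and extension blocks, omitted here.
UNICODE_RANGE_BITS = {
    0: ("Basic Latin", 0x0000, 0x007F),
    1: ("Latin-1 Supplement", 0x0080, 0x00FF),
    2: ("Latin Extended-A", 0x0100, 0x017F),
    3: ("Latin Extended-B", 0x0180, 0x024F),
    7: ("Greek and Coptic", 0x0370, 0x03FF),
    9: ("Cyrillic", 0x0400, 0x04FF),
}


def block_counts(codepoints):
    """Count how many of the font's code points fall inside each listed block."""
    counts = {}
    for bit, (_name, first, last) in UNICODE_RANGE_BITS.items():
        counts[bit] = sum(1 for cp in codepoints if first <= cp <= last)
    return counts


def unicode_range_bits(codepoints, min_chars=1):
    """Return an integer with one bit set per block holding >= min_chars."""
    value = 0
    for bit, count in block_counts(codepoints).items():
        if count >= min_chars:
            value |= 1 << bit
    return value


# Hypothetical Latin-only font: printable ASCII plus Omega and pi as symbols.
latin_font = set(range(0x20, 0x7F)) | {0x03A9, 0x03C0}

inclusive = unicode_range_bits(latin_font)                # any character sets the bit
stricter = unicode_range_bits(latin_font, min_chars=10)   # hypothetical threshold

print(f"inclusive: {inclusive:#x}")  # 0x81: Basic Latin and Greek both claimed
print(f"stricter:  {stricter:#x}")   # 0x1: Basic Latin only
```

With the inclusive policy, two Greek symbols are enough to claim the Greek block; any stricter threshold is a choice the font developer has to make for themselves.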

In the case of the codepage bits, in theory one should only claim support if a font supports the entire 8-bit codepage, although some fonts lie for the reasons discussed above. In the case of the Unicode range bits, anything between a single character and full block coverage may be indicated by the bit setting, and only a total absence of characters from a claimed block constitutes a clear error.

For font developers, codepage support may be a reasonable starting place, with the caveats that a) even the simplest Latin fonts these days will tend to support multiple codepages, and b) not everything that a user might want for a given language exists in codepages. So, for example, there are a few Cyrillic characters needed for European languages that were not included in the old 8-bit Windows Cyrillic codepage; they are included in WGL4, however. At least, though, codepage support is a solid technical target.
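If you want to treat a codepage as a technical target, a quick check is easy to script. The sketch below is only an illustration: it assumes Python's built-in `cp1252` and `cp1251` codecs are an acceptable stand-in for the Windows codepage definitions, and the function names are my own.

```python
import unicodedata


def codepage_repertoire(codec_name):
    """Return the set of code points an 8-bit codec maps to, minus controls."""
    repertoire = set()
    for byte in range(0x20, 0x100):
        try:
            ch = bytes([byte]).decode(codec_name)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the codepage
        if unicodedata.category(ch) != "Cc":  # skip control characters like DEL
            repertoire.add(ord(ch))
    return repertoire


def missing_from_codepage(font_codepoints, codec_name):
    """List the characters a font would need to add to cover the codepage."""
    gap = codepage_repertoire(codec_name) - set(font_codepoints)
    return [chr(cp) for cp in sorted(gap)]


# Hypothetical font covering printable ASCII and the upper half of Latin-1.
font = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

gap = missing_from_codepage(font, "cp1252")
print(len(gap), "characters missing for CP 1252, for example:", gap[:5])
```

A font should only claim the codepage when the returned list is empty; the same function works for any 8-bit codec Python knows about.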

For font developers, supporting entire Unicode blocks can involve a huge amount of work creating often obscure and graphically complicated characters that are unsuitable for inclusion in a particular design and unlikely ever to be used. Unicode includes a large number of characters of historical and specialist use only. I've had to make quite a lot of fonts containing such characters because of the nature of my clients — software companies who need to be able to display any Unicode character that may ever occur in text, and specialist publishers who actually produce texts involving such characters. Unless you have such clients, it's simply a waste of your life to spend time creating glyphs for such characters.

These days: Some extra Latin, Greek, Extended Cyrillic, Vietnamese & Pan-Nigerian languages. I skip deprecated characters, historical characters and characters only used in Esperanto and Pinyin. I include combining accents except for those only used in IPA. I have nothing against IPA but my fonts tend to be display fonts rather than something that would be used in a textbook. I'd have to include a full, current IPA set for them to be of much use.

I used to fret about vertical metrics for Vietnamese but now I just stack 'em up. You can let those accents go past the emsquare rather than trying to squeeze them in. I can't imagine people use default leading when setting Vietnamese type anyway.

My Extended Cyrillic excludes historical characters.

U+0243 Ƀ is obscure but I include it because some people use it as a Bitcoin symbol.

I imagine many designers shy away from Vietnamese because of lower retail demand.

I used to think that way too. The presumption is that the users of the target language are the ones buying fonts. But I think it's all about making fonts for effective localization. Popular Android and iOS apps tend to be localized. If you're an app developer, you don't want to have to license a separate font for Greek or Vietnamese. You're going to look for a font that covers as much as possible.

@John Hudson I follow what you're saying but I'm confused about the remedy. Should the font lie about codepages or not lie about codepages? I usually run a font through DTL OTMaster's utility for accurately reporting what the font's got. Is that a good thing or a bad thing as far as backward compatibility?

Everybody who's responded so far is an old hand at font making and everybody's got their own formula. Me too. But what I'd like to understand is why did you make the decisions you made? Ok. I understand that Vietnamese can be seen as a burden with next to no payoff for the effort. But is that evidence-based? Ray's extended Cyrillic includes historical characters. Now, I haven't yet worked through the Adobe Cyrillic set but does that set include historical characters that make their way into Thomas Phinney's fonts? (Don't know, but I'm guessing not.)

Michael Jarboe, what character set is "Extended Latin"? Who defines that one? Not any major industry players that I know of. Did you put it together for yourself?

Web fonts - fonts packed up to travel over the network - are my main interest, as some of you know. And traveling light is good. The smaller the file size, the better. So I'm trying to put some of these characters on trial for their lives, so to speak. And also divide the characters into those that are more suitable for a print (desktop) font than for the web. I think Ray makes a great point about giving developers maximum coverage in a single font. But I'm wondering about the file sizes to achieve that. Do apps install the fonts they use upon installation? Or do they pull from the network when you open the app? I don't know. (If somebody could clue me in on that, I won't complain.)

I follow what you're saying but I'm confused about the remedy. Should the font lie about codepages or not lie about codepages?

Generally not, but, well, sometimes you gotta do what you gotta do.

For a long time, I told clients that if we're going to claim codepage support, then we have to actually support the codepage, and they were mostly okay with that because they understood there may be software dependencies. These days, where Unicode is the norm in most places and 8-bit codepage mapping not as critical as it used to be, some of the 'younger' clients — by which I mean companies like Google — seem perfectly happy to make non-Latin fonts without an ASCII subset. I've not checked the OS/2 tables in such fonts to see what codepage bits are set, but I wouldn't be surprised to find they're lying, and nor would I blame them.

Microsoft still ask us for at least one complete Windows legacy codepage in each font, even in fonts for scripts that are 'pure Unicode', i.e. that had no Windows 8-bit support. Typically, this means that every font we make for Microsoft supports at least CP 1252 (Win ANSI, Western Europe), which has led to a number of one-off Latin designs to accompany Ethiopic or Javanese or Arabic.

Vietnam’s per-capita income is $5,700 and it’s tied with Zimbabwe for first place in software piracy rates. I consider that ample evidence Vietnam is not an economy I should develop for unless somebody else pays up front.

There are about 25 million smartphone users in Vietnam. For an app developer, that could make localizing for Vietnamese worthwhile. Even if the Vietnamese don't buy a lot of apps, that doesn't matter since apps are supported by advertising these days.

There are around 15 million Nigerian smartphone users and their many languages only require a few extra glyphs to support, some of which are included in the Vietnamese range.

Do apps install the fonts they use upon installation?

iOS and Android apps can use embedded TTF font data without installing. App developers can use a mix of embedded fonts and OS fonts for localization. For example, a developer might embed a Latin/Greek/Cyrillic font and fall back on OS fonts for Chinese and Japanese.

@Richard Fink I have seen somewhere (here? elsewhere?) that you are in favor of type designers developing fonts for the web first, and the graphic arts market second. I believe you wrote that a primary focus on the “graphic arts” market was outdated.

I can see merit in that statement, and also in what @Ray Larabie is writing about smartphone app developers as a target audience. But the majority of professional type designers and font developers in “the west” – as well as users on this site – do primarily service the graphic design industry. While this industry is potentially smaller than the web or smartphone apps, it is a real industry with a decades-long tradition of licensing fonts. It spends enough money to keep many of us gainfully employed. It is also the industry that a lot of us came out of (even though @John Hudson has often written IIRC, for example, that he is not a graphic designer, and prefers to design for typographers who are not graphic designers). In other words, our focus should not be a surprise.

One reason that several font foundries do not include Vietnamese in their off-the-shelf fonts is that the potential addition of Vietnamese characters later on down the line represents a potential future revenue stream. Several years ago, I worked at one of the big old foundries, and we occasionally got orders to create custom fonts that supported Vietnamese.

Even in the libre font market, many designers hope that corporations will hire them to extend the fonts they have already published. Certainly I would add Vietnamese support to libre fonts I have designed, or even libre fonts other designers have produced, if a customer were willing to pay me the appropriate fee.

I suspect, also, that every designer does their own cost/benefit analysis. We are willing to work on a specific family of fonts for a certain number of months or years, but eventually, one needs to release the products. Fonts are never finished, no matter who publishes them. Both commercial font makers and libre designers update their fonts over time, usually based on market demands that grow over time.

@Richard Fink I have seen somewhere (here? elsewhere?) that you are in favor of type designers developing fonts for the web first, and the graphic arts market second. I believe you wrote that a primary focus on the “graphic arts” market was outdated.

Making fonts for the graphic arts is outdated in that people spend more time reading from screens than from paper. The growth of mobile phones - in developed countries, in undeveloped countries, everywhere - is so phenomenal that not acknowledging it and trying to tap into it is downright unbusinesslike. So, it's a matter of who you are trying to make happy with your font. Hopefully both end-uses get their due. Plus, there are still PNGs and JPGs to be made and the graphic arts aren't going extinct anytime soon. But printing is not a green industry, as cool as some of the products are, and that alone will cut into graphic arts applications as screens become cheaper and cheaper and more and more essential for succeeding in the world, and completely ubiquitous. A browser-first, Photoshop/Illustrator-second strategy might become the norm, it might not. I don't know. In the meantime I watch from the sidelines and take notes. And I prod because there's a lot of tunnel vision goin' on.

James Montalbano - think you left out the word "easy" from your post, yes?

Font developers serve whoever pays, and the graphic arts market, even here in the future, still pays. The graphic arts market is, for the most part, also the people who create online media. Readers are not the font market. Readers don't pay us anything, at least directly.

Font developers serve whoever pays, and the graphic arts market, even here in the future, still pays. The graphic arts market is, for the most part, also the people who create online media. Readers are not the font market. Readers don't pay us anything, at least directly.

True enough, Mark. But please forgive me if I veer away because we (and I) went off topic. The topic is default character sets. Do you have one?

The consensus among those who've responded is that each has his or her own set worked out that's done the job and you add and subtract characters from that personal default as needed. As far as choice of characters - well, certainly language coverage plays a big part. As does client request or some other motivating factor.

I remain open to any input.... what I've gotten from this thread has helped me understand, so I thank those who have responded so far.

My default character set has grown in a haphazard way. It started out years ago with what Fontographer showed by default, which IIRC was a superset of the MacRoman and Windows Western Latin. Later I added what I saw Adobe including. And so on. It keeps growing. Currently I include all of Latin Extended A and a few things from Latin Extended B. When I update any of my back library fonts, I add any new Latin to more or less keep it all in sync. In some cases, I have added Vietnamese, Greek, and/or Cyrillic due to customer requests. Except for these, almost anything requested by a customer becomes part of my default set.

My own default char. set is SIAS-Lat-Eu-2 or SIAS-Lat-Eu-3. The latter also embraces Azeri and Vietnamese.

I have wondered for a long time now why there is no industry standard established for general char. sets, a matter so crucial. And I also wonder whether any of the relevant academic institutions has ever been touched, even in the foggiest possible way, by the thought of doing work on that matter.

Sorry Richard, that term isn't official. I have my own interpretation of what 'Extended Latin' is, and has become to me, and it's similar to that of many independent foundries. It's really no different from H&Co's Latin-X, or Commercial Type's, or even Underware's default character set.

Like Mark's example, it includes all of Latin Extended A and a few things from Latin Extended B. It is then adjusted depending on the typeface: with the inclusion or exclusion of lining figures, oldstyle figures, small caps, standard and discretionary ligatures, numerators, denominators, superiors, inferiors, case-sensitive forms, symbols, etc., it can change quite a lot. That is why I always create a custom encoding: there are so many variables, and I can order all the glyphs in an organized way that makes sense to me.

I have wondered for a long time now why there is no industry standard established for general char. sets, a matter so crucial. And I also wonder whether any of the relevant academic institutions has ever been touched, even in the foggiest possible way, by the thought of doing work on that matter.

Well, until Unicode and, more importantly, near universal support for it in software, we had a Tower of Babel situation with competing "standards" being offered by different businesses and institutions. As history, they still exist. As practical guides, they are obsolete. First, any set today has to specify Unicode code points. Second, the main criterion for choosing a character set must be the languages it supports. What else does anybody care about? (I'm not talking here about including small caps and alternates, etc. Just the basics needed to express yourself in a given language.) Beyond that, you get into symbols and emoji and the question there becomes: how often do people need those symbols? And in web browsers at least, remember there's a luxury that you don't have in the graphic arts: a stack of fallback fonts can be specified in the web page's style sheet that will, most probably, have the symbol you've omitted from your font. It might not match your font stylistically. The metrics from your font compared to the fallback font probably won't match, but the symbol will display.

I'm certainly not concerned about what characters folks like James, Mark or Ray or you, Andreas, put in a font. You know what you're aiming for. However, at least some of the folks who license those fonts are concerned about what languages are supported. On the web, especially, web fonts have really put "The World" into the "World Wide Web" and language support is an important thing to think about.

What I would like to see, beginning with Latin-based languages, are just a few character sets defined, based on Unicode, that progressively support a greater and greater number of languages. Based on the number of speakers and any other relevant factors. The Adobe Latin sets do that, to some degree, but Adobe does not consider them normative and thus "a standard" of any kind. (There's more to be said about the Adobe sets, but I'm not going into it here.)

Also - consider this - there actually are character sets defined in both the HTML4 recommendation from the W3C and also the HTML5 recommendation. The char list in HTML5 is big and has a lot of symbols. But it IS part of a recognized industry "standard". And, in fact, all current browsers support that standard, which maps the Unicode code points to "human friendly" names like &rarr; (for 'right pointing arrow') or &nbsp; (for non-breaking space).
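For anyone who wants to poke at those lists, both tables happen to ship in Python's standard library as `html.entities.name2codepoint` (the HTML 4 entities) and `html.entities.html5` (the HTML5 named character references). The sketch below, with function names of my own choosing, turns them into sets of code points and reports which ones a hypothetical font lacks. Treat it as a rough starting point, not a finished test.

```python
import html
from html.entities import html5, name2codepoint


def entity_codepoints(table):
    """Collect every code point used by a name -> string entity table."""
    codepoints = set()
    for value in table.values():
        for ch in value:  # some HTML5 references expand to two characters
            codepoints.add(ord(ch))
    return codepoints


HTML4_SET = set(name2codepoint.values())
HTML5_SET = entity_codepoints(html5)


def missing_entities(font_codepoints, wanted=HTML4_SET):
    """Return the sorted code points in `wanted` that the font does not cover."""
    return sorted(wanted - set(font_codepoints))


# The friendly names resolve to the code points mentioned above.
print(f"rarr -> U+{ord(html.unescape('&rarr;')):04X}")
print(f"nbsp -> U+{ord(html.unescape('&nbsp;')):04X}")

# Hypothetical font with printable ASCII only.
ascii_font = range(0x20, 0x7F)
print(len(missing_entities(ascii_font)), "HTML 4 entity characters not covered")
```

Passing `wanted=HTML5_SET` runs the same check against the much larger HTML5 list.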

Well, I'm on the job, Andreas. It will probably take me till the middle of this year to nail it all down. But at least for myself and my HTML test pages, I'll have character sets defined that make more sense than what we've got now. And they'll be on GitHub and you can use them or not use them as you please.

Microsoft still ask us for at least one complete Windows legacy codepage in each font, even in fonts for scripts that are 'pure Unicode', i.e. that had no Windows 8-bit support. Typically, this means that every font we make for Microsoft supports at least CP 1252 (Win ANSI, Western Europe), which has led to a number of one-off Latin designs to accompany Ethiopic or Javanese or Arabic.

I think that's a good idea for any font. English is the closest thing to a world language that we have. A lot of people don't know this, but all airline pilots speak English. English is the international language of commercial air travel, spoken by air traffic controllers and commercial airline pilots the world over. (Think about it, can you imagine if each airport spoke its own local language?)

Also, I think Microsoft's model for supporting languages is a good model to follow for anybody. They got there first. I'm thinking of writing up a little study of how Microsoft goes about handling language support for Windows and Office, for its different constituencies. Just sayin'.

BTW, on the subject of how to indicate what scripts/languages a font explicitly supports, in Windows 10 Microsoft has adopted Apple's 'meta' font table with 'dlng' and 'slng' tags for 'design language(s)' (i.e. those for which the font is primarily intended) and 'supported languages'. This is exposed with a new API in DirectWrite, with heuristic fallback to OS/2 bit arrays if the 'meta' table is absent. Microsoft's intention is to formalise this as an update to the OpenType specification.
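For the curious, here is a rough Python sketch of reading those two tags out of raw 'meta' table bytes. It assumes the layout as I understand it from the published table description: a 16-byte header of four big-endian 32-bit fields (version, flags, reserved, data map count), followed by 12-byte records of tag, offset from the start of the table, and length, with the 'dlng' and 'slng' data stored as comma-separated text. The helper names are hypothetical, and `build_meta` exists only to produce sample bytes; for real fonts a font library would be the safer route.

```python
import struct

HEADER_SIZE = 16   # version, flags, reserved, dataMapsCount (4 x uint32)
RECORD_SIZE = 12   # tag (4 bytes), dataOffset (uint32), dataLength (uint32)


def parse_meta_languages(table):
    """Return {'dlng': [...], 'slng': [...]} for whichever tags are present."""
    _version, _flags, _reserved, count = struct.unpack_from(">4L", table, 0)
    result = {}
    for i in range(count):
        tag, offset, length = struct.unpack_from(
            ">4sLL", table, HEADER_SIZE + RECORD_SIZE * i
        )
        if tag in (b"dlng", b"slng"):
            text = table[offset:offset + length].decode("utf-8")
            # Values are comma-separated, with optional surrounding spaces.
            result[tag.decode("ascii")] = [
                item.strip() for item in text.split(",") if item.strip()
            ]
    return result


def build_meta(entries):
    """Build sample 'meta' table bytes from {tag: text}; for illustration only."""
    offset = HEADER_SIZE + RECORD_SIZE * len(entries)
    records, blobs = b"", b""
    for tag, text in entries.items():
        data = text.encode("utf-8")
        records += struct.pack(">4sLL", tag.encode("ascii"), offset, len(data))
        blobs += data
        offset += len(data)
    return struct.pack(">4L", 1, 0, 0, len(entries)) + records + blobs


sample = build_meta({"dlng": "Latn", "slng": "Latn, Grek, Cyrl"})
print(parse_meta_languages(sample))
```

The split between 'dlng' and 'slng' is the useful part: a font can say it was designed for Latin while still declaring that Greek and Cyrillic are supported.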