OpenType fonts - Glyph coverage strategies

piccic's picture

I don't know if it's an intelligent question, but I am trying to understand.
When you have to generate an OpenType font (OTF or TTF), is it better to consider the glyph coverage in terms of "Unicode Ranges" or "Codepages"?

charles ellertson's picture

Sounds like an intelligent question to me, but the answer will likely depend on your notion about the eventual use of your font(s). If they are for text, I'd say the thing to pay attention to is *languages* -- for example, there are many languages that use the Latin alphabet, or are commonly transliterated using the Latin alphabet. Slavishly following either a *codepage* or *Unicode range* formula will not give you the best coverage of languages.

For Latin-based languages, see

http://blogs.adobe.com/typblography/2008/08/extended_latin.html

Then there are the non-Latin languages . . .

dezcom's picture

Ciao Claudio,

You might also want to look at this link from Thomas Phinney:

http://blogs.adobe.com/typblography/2008/08/character_set_terms.html

ChrisL

piccic's picture

Many thanks for your considerations.
Yes, I was asking mostly about non-Latin, as I see the Latin-based languages are more or less fully covered by the Latin Extended-A Unicode range. In fact, having done the Extended-A set for an unreleased typeface, I see it covers most of the "Adobe Latin" ranges described by Thomas Phinney (a great resource, thanks to him!)

I wish to consider different typefaces in specific ways. I do not intend to draw (at first) a large number of glyphs for a typeface which is not meant for big, multilingual editorial purposes; conversely, if I decide to include a language, I wish to make the face complete enough in its script as well (e.g. Greek with polytonic).

I guess it's hard to make glyph choices economical, but weighing all these considerations helps a lot.

piccic's picture

Just found it while you were posting, Chris… :=) Many thanks.

In fact, while looking, I still haven't properly understood the "Names mode" in FontLab.
Apparently it may include any kind of encoding (defined by the text files with the .enc suffix), so you may use it to create handy sets for your own production, but Phinney says "a code page is an encoding", so are there encodings listed in the "Codepage" mode which are the same as the ones in "Names mode"?
This thing is not clear to me… :=(

Mark Simonson's picture

Names mode relates to the older Type 1 format, in which each glyph was given a name (but no code). The encoding was supplied by a separate look-up table in the font. When you're making OpenType or TrueType fonts, the Names mode doesn't have any effect on how the font is generated, but can still be a useful way to view a font during development, especially since you can make your own encodings.
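To make that concrete: in Type 1 terms a font is a set of outlines keyed by glyph name, plus a separate 256-slot encoding table mapping codes to names. Here is a toy Python sketch of that arrangement (illustrative only, not any real font format):

    # Toy model of the Type 1 arrangement: outlines are keyed by glyph
    # name, and a separate 256-slot encoding array maps byte codes to names.
    glyphs = {"space": "<outline>", "A": "<outline>", "Aacute": "<outline>"}

    encoding = [".notdef"] * 256
    encoding[0x20] = "space"
    encoding[0x41] = "A"
    encoding[0xC1] = "Aacute"   # ISO Latin-1 position for Á

    # Re-encoding the font just swaps this table; the named outlines carry
    # no codes of their own, which is what Names mode reflects.
    print(encoding[0x41], "->", glyphs[encoding[0x41]])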

Code pages and Unicode ranges are just different ways of slicing up the Unicode gamut. Code pages relate to different keyboard layouts for different regions and operating systems and often support several languages, while Unicode Ranges are subsets of Unicode specific to various languages and purposes.

The encodings you can select while in Names Mode are similar to code pages, but come from the various encodings which have been used for making Type 1 fonts.
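A quick way to feel the difference between the two: a code page gives byte values meanings, while a Unicode range is just a span of codepoints. A minimal Python sketch (the codec names are Python's, not FontLab's):

    # The same byte means different characters under different code pages:
    b = bytes([0xE9])
    print(b.decode("cp1252"))     # Windows Western  -> 'é'
    print(b.decode("mac_roman"))  # Mac Roman        -> 'È'
    print(b.decode("cp1251"))     # Windows Cyrillic -> 'й'

    # A Unicode range is a contiguous block of codepoints, independent
    # of any byte encoding; Latin Extended-A is U+0100..U+017F:
    latin_ext_a = [chr(cp) for cp in range(0x0100, 0x0180)]
    print("".join(latin_ext_a[:16]))  # ĀāĂăĄąĆćĈĉĊċČčĎď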

Mark Simonson's picture

By the way, a lot of font designers like to start with the Macintosh Roman encoding in Names mode because it contains all the glyphs needed for the basic Latin Mac and Windows code pages.

piccic's picture

Enlightening, Mark, many thanks! I suspected something similar but it was not so clear to me.
Generally, while designing, I switch from Macintosh Roman, to custom encodings I create divided by area (numerals, for example), to the Index mode.

My question comes from indecision about whether to think more in terms of code pages or in terms of Unicode ranges.
As charles_e suggested, the main concern is the languages covered (at least for me), so the keyboard layouts should come as a consequence.

Mark, may I ask if you work more with Unicode ranges? And, if you don't mind, could you tell me how much you covered with your new release of Metallophile? I would have asked about Proxima Nova, but I prefer to consider a more "self-contained" family.

Mark Simonson's picture

I use pretty much the same Unicode coverage as Adobe does in its "Pro" fonts (but not Cyrillic or Greek as they do with a few). Basically, most Western and Eastern European Latin-based languages. I haven't published anything yet outside of Latin-based scripts.

You can see the entire character set of Metallophile in this PDF:

http://www.ms-studio.com/FontSales/pdf/MetallophileSp8.pdf

And Proxima Nova's here:

http://www.marksimonson.com/fontspecimens/ProximaNovaOverview.pdf

charles ellertson's picture

I set type rather than design typefaces. Over the years, we've made a fair bit of money when publishers needed a Latin-based language set where no font had the needed characters. Adobe helps out; their EULA allows their fonts to be modified as needed.

The only non-Latin alphabets I worry about -- the only ones where it would be nice to have both the Latin & non-Latin in the same typeface -- are Greek and Cyrillic. By all means include polytonic Greek, all the while remembering that the classical Greek period lasted over a thousand years, & not all is covered in Unicode.

But you delude yourself if you think that "most" of Latin is covered by Latin Extended-A. One of the vowels with an ogonek is in Latin Extended-B, as are the characters needed for Yoruba & Pan-Nigerian, and probably some other African languages. Thomas Phinney has pointed out that Tagalog has a wide audience. Few of the Native American languages are covered with just Extended-A, including Apache, Navajo, Kiowa, and Lakota. Some can't be covered with any of A, B, and Extended Additional, but can with either a *mark* or *ccmp* feature. (I believe *mark* isn't currently supported by Adobe, but it is coming.)
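If you want to check which block a particular letter actually falls into, a few lines of Python are enough (a sketch, with block boundaries copied from the Unicode code charts; the sample characters are ones commonly cited for Hausa and Yoruba):

    # Latin block boundaries, copied from the Unicode code charts.
    BLOCKS = [
        ("Basic Latin",               0x0000, 0x007F),
        ("Latin-1 Supplement",        0x0080, 0x00FF),
        ("Latin Extended-A",          0x0100, 0x017F),
        ("Latin Extended-B",          0x0180, 0x024F),
        ("Latin Extended Additional", 0x1E00, 0x1EFF),
    ]

    def block_of(ch):
        cp = ord(ch)
        for name, lo, hi in BLOCKS:
            if lo <= cp <= hi:
                return name
        return "other"

    # Ǫ (O with ogonek) and Ɗ (used for Hausa) sit in Extended-B;
    # Ṣ (used for Yoruba) is in Extended Additional -- none of them
    # is covered by Extended-A alone.
    for ch in "ǪƊṢ":
        print(ch, f"U+{ord(ch):04X}", block_of(ch))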

Anyway, that's why I think it more important to cover languages than somewhat artificial code pages or Unicode ranges.

piccic's picture

Many thanks, Mark. I think Metallophile (and maybe even Proxima) does not include most of Latin Extended-B or the additional glyphs mentioned by Charles. That's what I was thinking about, since – in the end – I have no specific language audience in mind (for Latin), so I am quite torn between keeping the face self-contained and giving it more thorough Latin coverage.
It would be important to see which languages (of wider use) are covered by Latin Extended-B. Might there be such a resource somewhere?

I spoke of codepages merely to rationalize my design approach, since I am a mess and always tend to lose myself in details (e.g. spending hours designing an accent), so I have to find the right balance between language coverage and the typeface's artistic framework.

Jens Kutilek's picture

As Charles said, one should base the decision about which characters to include in a font on language support rather than on Unicode ranges.

There are some resources about languages (or rather orthographies) and their required characters:

http://www.eki.ee/letter/
http://www.evertype.com/alphabets/#1.2

But there's also a definition problem, because often it's not precisely clear which characters are really needed.

Take, for example, German: ask a German school kid and he'll probably tell you the alphabet consists of 26 letters. But what about ä, ö, ü and ß? That makes it 29, which is widely agreed to be sufficient for German. The eki.ee site adds à and é, so then we can write »Café« and »à la carte«, as well as André and René, which are common given names in Germany. But still Mrs Hoëcker and Mr Ruëtz aren't happy because they cannot write their surnames, which are proper German names, not foreign names, btw. For historic texts you might want to add ſ (long s).

Not to forget foreign names which might occur frequently depending on the kind of text that you are setting. Schools in Berlin would definitely need fonts for Turkish, Polish, Serbian, French, Croatian, Vietnamese (to name the biggest groups) to properly write the names of their pupils (non-Latin scripts would be transliterated) in German texts (actually they probably don't care).
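If you want to check a built font against a list like that, a few lines with the fontTools library will do it. A minimal sketch, assuming fontTools is installed; "MyFont.otf" and the character list are placeholders:

    from fontTools.ttLib import TTFont

    # Characters beyond A-Z/a-z that the discussion above suggests for
    # German: umlauts and eszett, plus à and é for loanwords and names.
    GERMAN_EXTRAS = "äöüÄÖÜßàé"

    font = TTFont("MyFont.otf")   # placeholder file name
    cmap = font.getBestCmap()     # dict: codepoint -> glyph name

    missing = [ch for ch in GERMAN_EXTRAS if ord(ch) not in cmap]
    print("missing for German:", missing or "nothing")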

piccic's picture

Many thanks, Jens, pages of great interest.
Every one of your observations is precious material…

For historic texts you might want to add ſ (long s).
This always comes out as an "automatic" design when I take the trouble to design the eszett/German double s, which is one of my favorite glyphs.

dezcom's picture

Jens,

GREAT LINKS!!! Thanks!

ChrisL

k.l.'s picture

Jens -- really good considerations!

One remark about www.eki.ee: It is a brilliant collection, yet I remember that their database contains PUA codepoints which Adobe has used in some of their earlier fonts, like this one for M/m with circumflex. The site does acknowledge this (see "not an UCS character!" on the page I refer to) and one should take the warning seriously and think twice before using such a codepoint. (And compare e.g. with Adobe's current practice.)

piccic's picture

Now that we have Karsten I feel even more confident… :=)
What could happen if I use a Unicode value which is not standard?

This reminds me I have to post another question to see if I can try to set my own 'conventions' to assign Unicode values to non-Unicode glyphs (like the Small Capitals)…

Mark Simonson's picture

What could happen if I use a Unicode value which is not standard?

The results will be unpredictable if the user changes the text set in your font to a different font. (By unpredictable results, I mean the glyph that appears in place of your non-standard coded glyph will be unpredictable, not that the computer might blow up or suddenly transform into a bowl of petunias.)

cuttlefish's picture

This reminds me I have to post another question to see if I can try to set my own ’conventions’ to assign Unicode values to non-Unicode glyphs (like the Small Capitals)…

Short answer is: Of course you can. That's what the Private Use Area ranges are for.

That said, there are some organizations that have laid (unofficial, non-binding) claim to parts of the PUA for special purposes (MUFI and ConScript being but a couple), as well as corporate assignments, like uniF8FF being used for the Apple logo.

It's been said many times on this site and elsewhere that you're not supposed to give Unicode values to glyphs like small caps or ligatures, and should address them solely through OpenType features, but some programs won't recognize those off-list characters without a Unicode value.
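For reference, the PUA ranges themselves are fixed in the standard, so it's easy to sanity-check your own assignments. A small Python sketch (the small-caps convention shown is purely hypothetical):

    # The three Private Use Area ranges defined by Unicode.
    PUA_RANGES = [
        (0xE000, 0xF8FF),       # BMP Private Use Area
        (0xF0000, 0xFFFFD),     # Supplementary PUA-A (plane 15)
        (0x100000, 0x10FFFD),   # Supplementary PUA-B (plane 16)
    ]

    def is_pua(cp):
        return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

    # A purely hypothetical in-house convention: park the small caps
    # A.sc..Z.sc at U+E000..U+E019.
    smallcaps = {0xE000 + i: chr(ord("A") + i) + ".sc" for i in range(26)}

    print(is_pua(0xF8FF))   # True  -- the Apple logo codepoint mentioned above
    print(is_pua(0x01EA))   # False -- Ǫ is a regular encoded character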

guiyong's picture

I am new here with little experience. Could someone please explain this subject to me in simple terms? I use Macintosh System 7 for graphics work.

Thanks in advance.
Gui Yong

Mark Simonson's picture

I don't know if there is a simple way to explain it, but here is a link to a discussion in which that was attempted:

http://typophile.com/node/39726

System 7.0 (1991) predates the first release of the Unicode standard, but System 7.5 (1994) and later support Unicode via QuickDraw GX. However, not many applications supported QuickDraw GX (and therefore Unicode).

piccic's picture

Hi Gui, if your question becomes clearer (and I'd be curious to hear it) we may start a separate discussion, beside mine…

Roger S. Nelsson's picture

I did A LOT of research before deciding which glyphs to include in the reworked fonts on my site.
The result (based on, among other things, both eki and wiki information ;) is presented in this list of languages:

http://www.cheapprofonts.com/Languages.php

The character set I ended up with is made up of:
- the Mac + Windows standard set
+ Latin Extended-A
+ 8 glyphs from Latin Extended-B
+ 12 Unicoded glyphs outside of those ranges
+ 8 glyphs without Unicode values
(A simple rundown can be found in the downloadable PDF Catalog ;)
= 65 Latin-based languages...

Enough for my concept, but your mileage may vary ;)
So, yes, I think a list of supported languages makes more sense for an end user than codepage/Unicode block information...

piccic's picture

Many thanks Roger, much appreciated.
Besides, your initiative looks commendable. I would not have thought of it…

Thomas Phinney's picture

Roger, would you care to comment on *why* you chose to support those particular languages?

Like many such charts, it seems to focus on European languages, even quite obscure ones, over equally easy-to-support languages from the third world, even when the latter are spoken by 100x as many people. For example, Tagalog (22M as a first language, 90M total), or Yoruba (25M people).

Of course, it's easy to imagine one might sell more copies in Luxembourg than in the Philippines.

Regards,

T

Roger S. Nelsson's picture

Well, Thomas - my main concern was not to stray too far from my knowledge area ;)
My goal is to make very usable multilingual display fonts, so I had to know how to make the glyphs properly.

I have previously done many customized fonts covering Northern and Eastern European languages, so I felt very confident I could do proper letter and diacritic design for all Latin-1 and Latin Extended-A glyphs. Then I started looking into which languages would be covered by these glyphs, and found quite a few additional languages that could be covered by adding just a few composite glyphs. So I included those, too.

The reason I cover some obscure European languages is basically that their glyphs are covered by these two Unicode blocks plus simple composited glyphs.
I don't know specifically the usage of e.g. Rhaeto-Romance, but my fonts have all the glyphs needed to write it, so I include it in my language list :)
The Ibreve/ibreve glyphs (from the Extended-A block) are included, although their usage is apparently for Cyrillic languages...

Many African languages require modified letterforms (hooks and curves and loops and whatnot ;) and as I have no direct knowledge of how to design these properly (their history and how they are usually implemented) I chose not to include them. Because of their sometimes "awkward" letterforms they would also make kerning much more time-consuming, and all this extra time would work against my idea of low-cost fonts.
AND the Eng character used for African languages has a completely different design from the Saami Eng - I didn't want to muddle things up with that (as I live in Saami-speaking territory this is perhaps the most important reason for me :)

As for the Asian coverage: well, I do have Tagalog covered, but I may have it sorted wrong (under F for Filipino) - a result, perhaps (as mentioned in the beginning), of not knowing enough about the language. But the glyphs themselves are pretty straightforward ;)
(I will probably have to move this entry to T for Tagalog and also call it Filipino/Pilipino ;)

Yoruba is a whole other kettle of fish: I did look into it, but the character set is full of glyphs without Unicode encodings, and I found no conclusive information about their design and usage.
The same goes for Vietnamese: there would be too much additional work to learn and implement all the needed glyphs - all to cover a market I do not really know how to approach.

I might expand the language coverage for the fonts I rework later, though - but as a start I have focused on languages where I KNOW how to properly design the glyphs and at least have some idea how to approach the markets. Which I will do more actively when I have built up a large enough library of fonts. :)

My, what a long post! ;)

piccic's picture

In fact, Jens' links are particularly interesting if you worry about transliteration of non-Latin languages. Thanks, again! :=)

charles ellertson's picture

Covering African languages can indeed involve some problems; see

http://typophile.com/node/49307

Having said that, using the characters found in Unicode (and the composites that have to be made up) will cover the needs of many users of Yoruba and Hausa. We have run into authors who insist it must be this way -- say, a bar under a letter rather than a dot -- but their number diminishes as Unicode becomes more widely adopted. We recently had an author change her mind after first proof. We'd set bars under, per her initial request, and she decided in first proof that the dot was becoming the standard, so we switched to that. Admittedly, it is far easier for a typesetter to make these moves than a type designer.

But there will always be a move towards standardization and what is easy to achieve. For example, while Kiowa properly uses macrons below vowels, the use of the underscore is common, because it is easy. And one reason it is easy is because even with OpenType, few foundries fill out the combining diacriticals. Once that is done, I'd expect more & more people to use the macron below rather than the underline.

So one point might be to fill out the Unicode ranges for Combining Diacriticals and Spacing Modifiers -- Lord, even MS Word will use them, though not always place them correctly. But to cover a language with characters using multiple diacriticals not precomposed in Unicode requires ccmp or mark, and the latter is not universally supported.
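A quick check with Python's unicodedata module shows the difference: a sequence like e + combining acute folds to a precomposed character under NFC, but the macron below that Kiowa wants on vowels has no precomposed vowel forms, so it stays a combining sequence and the font has to handle it with *mark* or *ccmp*:

    import unicodedata

    # e + combining acute has a precomposed form; NFC folds it to 'é'.
    print(unicodedata.normalize("NFC", "e\u0301"))   # 'é' (one codepoint)

    # a + combining macron below (U+0331) has no precomposed form, so it
    # stays two codepoints -- the font must position the mark itself,
    # via a mark feature or a ccmp substitution to a purpose-built glyph.
    seq = unicodedata.normalize("NFC", "a\u0331")
    print(len(seq))                                  # 2
    print(unicodedata.name("\u0331"))                # COMBINING MACRON BELOW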

piccic's picture

So one point might be to fill out the Unicode ranges for Combining Diacriticals and Spacing Modifiers — Lord, even MS Word will use them, though not always place them correctly. But to cover a language with characters using multiple diacriticals not precomposed in Unicode requires ccmp or mark, and the latter is not universally supported.

Is that a similar issue to the one we have in Biblical (and poetry) Hebrew?
I ask because I was talking about this with Israel in the Hebrew type area, and I still know almost nothing about "Combining Diacriticals"… :=(

Amazingly interesting remarks you make about publishers' choices. Here in Italy it's often a luxury if they bother to buy a license for setting Greek in a typeface matching the Latin. Even big publishers sometimes use some shareware Greek, or worse… :=(

Thank you! :=)

Thomas Phinney's picture

Ah, sorry I missed Tagalog there! And yes, doing African languages as a group gets challenging, but some of them are easy or at least easy-ish.

But it sounds like you did a lot of homework, and thanks for the explanation of how you came up with the character set. It's very interesting to me to understand how other folks have tackled the problem of character set definition....

Cheers,

T
