Building a CE font (adding diacritics)

Marius Ursache's picture

Hi,

Where can I learn more about how to add diacritics to Western fonts? (I need to create a CE version of an existing font).

Thank you very much!

twardoch's picture

Marius,

as for the design of Polish glyphs, I can recommend my article published in the ATypI/Graphis book "Language Culture Type": http://www.amazon.com/exec/obidos/ASIN/1932026010/twardoch-20/

Also refer to the Microsoft "Diacritics Design Standards" document: http://www.microsoft.com/typography/developers/fdsspec/diacritics.htm

I recommend looking at the design of "Helvetica Linotype", the *new* OpenType version of Helvetica (look at http://www.linotype.com/ ). The diacritics are quite good.

Finally, consult people who know about such things :-)

Regards,
Adam Twardoch

twardoch's picture

Oh, one more important link: "Problems of diacritic design for Latin script text faces" by Victor Gaultney, http://www.sil.org/~gaultney/research.html#Dissertation
Adam

pablohoney77's picture

I'm a little thrown off by your question... Did you mean what are the standards for adding diacritics? Or what is the method for adding diacritics? Or did you have a completely different question in mind? If you're concerned about standards, twardoch has posted some good articles that discuss the topic. If you're interested more in the mechanics... I'd post more about that, but don't wanna confuse you if that's not what you're asking. So please clarify what kind of information you're looking for.

hrant's picture

Just please don't make them too big - that's often merely a naive reaction against the -very real- threats of globalization and jingoism.

hhp

John Hudson's picture

Conversely, don't make them too small and light, especially not the Polish ogoneks.

Marius Ursache's picture

Thanks everyone for the feedback. Some more details on what I am looking into:

I need to add Romanian diacritics to some fonts with western encoding. I am using FontLab, and would be interested in the whole process: adding composites (I know how to add composites, but I want to get "professional skills"), codepages, conversion mapping, hinting etc.

John Hudson's picture

Regarding Romanian:

There is a fiddle with Romanian diacritics. Earlier versions of Unicode unified the Romanian S/s with 'comma' accent below to the Turkish S/s with cedilla (U+015E/U+015F). Later, Unicode disunified these, and separately encoded the S/s with 'comma' accent (U+0218/U+0219). However, the old Windows 8-bit codepage 1250 that covers Romanian uses the older Unicode encoding. This means that in such fonts you should use the S/s with 'comma' accent glyph in the S/s with cedilla codepoints.
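The codepage behaviour described above can be checked with Python's standard `cp1250` codec. This is only a small sketch: the helper name `fits_cp1250` is made up for illustration, and the two byte values are the CP 1250 slots for the Romanian S/s.

```python
# Bytes that Windows codepage 1250 uses for the Romanian S and s.
CP1250_ROMANIAN_S = bytes([0xAA, 0xBA])

# Decoding shows which Unicode characters the codepage maps them to.
decoded = CP1250_ROMANIAN_S.decode("cp1250")
codepoints = [f"U+{ord(ch):04X}" for ch in decoded]
# codepoints == ['U+015E', 'U+015F'], i.e. the S/s with cedilla codepoints


def fits_cp1250(text: str) -> bool:
    """Return True if every character of `text` exists in codepage 1250."""
    try:
        text.encode("cp1250")
        return True
    except UnicodeEncodeError:
        return False


# The disunified S/s with comma accent (U+0218/U+0219) has no slot in CP 1250.
comma_s_fits = fits_cp1250("\u0218\u0219")
cedilla_s_fits = fits_cp1250("\u015e\u015f")
```

So text typed through the 8-bit codepage always arrives at the cedilla codepoints, which is why the glyph placed there matters.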

If you are making a multilingual OT font that needs to support both Turkish and Romanian, and also provide backwards compatibility support with the old 8-bit codepage, you need to include a Localised Forms <locl> OT layout feature for the Romanian <rom> language system tag that maps the S/s with cedilla glyphs to the S/s with 'comma' accent glyphs. Note that this substitution is not supported in currently shipping apps, but will be supported in future versions of Windows, so it is a good idea to get it in your fonts now.
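As a rough sketch of what such a feature could look like, the snippet below builds the text of a `locl` feature in Adobe feature-file syntax. The glyph names (`Scedilla`, `Scommaaccent`, and so on) and the helper name are assumptions for illustration; use whatever names your font actually contains. The registered OpenType language system tag for Romanian is written `ROM` in feature files.

```python
def romanian_locl_feature(pairs):
    """Build feature-file text that swaps cedilla glyphs for comma-accent
    glyphs when the Romanian language system is active."""
    lines = ["feature locl {", "    script latn;", "    language ROM;"]
    for source, target in pairs:
        lines.append(f"    sub {source} by {target};")
    lines.append("} locl;")
    return "\n".join(lines)


# Hypothetical glyph names: (glyph at the cedilla codepoint, Romanian form).
PAIRS = [("Scedilla", "Scommaaccent"), ("scedilla", "scommaaccent")]

feature_text = romanian_locl_feature(PAIRS)
print(feature_text)
```

The generated text would be pasted into the font's OpenType feature definitions; Turkish text keeps the cedilla forms because the substitution only applies under the Romanian language system.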

Unicode also previously unified T/t with 'comma' accent and T/t with cedilla (U+0162/U+0163), but later disunified these and separately encoded T/t with 'comma' accent (U+021A/U+021B). However, I have yet to locate a single language that actually uses T/t with cedilla, so the best thing to do is to include only glyphs for T/t with 'comma' accent, and double encode them to both the T/t with 'comma' accent codepoints and the T/t with cedilla codepoints (the latter are used in the old 8-bit codepage).
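Double encoding simply means that two codepoints point at the same glyph. A minimal sketch of that idea as a codepoint-to-glyph table follows; the glyph names and the `build_cmap` helper are illustrative assumptions, not FontLab's own data structures.

```python
def build_cmap(entries):
    """Map each codepoint to a glyph name; several codepoints may share a glyph."""
    cmap = {}
    for glyph_name, codepoints in entries.items():
        for codepoint in codepoints:
            cmap[codepoint] = glyph_name
    return cmap


# Hypothetical glyph names. Each comma-accent glyph is reachable from both the
# old cedilla codepoint and the newer comma-accent codepoint.
ENTRIES = {
    "Tcommaaccent": [0x0162, 0x021A],
    "tcommaaccent": [0x0163, 0x021B],
}

cmap = build_cmap(ENTRIES)
for codepoint in sorted(cmap):
    print(f"U+{codepoint:04X} -> {cmap[codepoint]}")
```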

Hope all this makes sense :-)

Marius Ursache's picture

John,

Thank you for your reply. I have heard of this issue with previous Unicode versions. Do you know any online resources where this subject is discussed more in-depth? I am still encountering trouble when experimenting with Eastern European font creation in FontLab. Does FontLab 4.5 have "fixed" codepages for Eastern European languages?

Thank you.

John Hudson's picture

FontLab 4.5 does not have 'fixed' codepages. Codepages are standards defined by national or international standards bodies or by corporations. Windows CP 1250, for example, is defined by Microsoft, and they have not updated it to map the new Unicode character codes, since they don't like changing their codepages because of backwards compatibility issues. Since Microsoft's operating system is now 100% Unicode, they are less concerned altogether with 8-bit codepages, and support them only as compatibility formats (e.g. if you open a CP 1250 encoded document in Word on Windows 2000+, it will be saved as a Unicode encoded document).

eolson's picture

It would be interesting to define Eastern European as it relates
to character sets too. They vary a lot. Is it accurate to include the
entire Latin Extended A code table?

John Hudson's picture

It should be noted that 'Eastern European' is a misnomer, since most of the languages that we're discussing are properly considered Central European. Also, Microsoft have separate codepages for 'Eastern European' (CP 1250) and 'Baltic' (CP 1257), while Apple combines support for the Baltic languages with Central European languages in a single 'MacOS CentralEurope' codepage. Conversely, Windows CP 1250 supports Romanian and Croatian, while Apple has a separate codepage for each of these.
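A quick way to see how these codepages differ is to try encoding a few letters with each. The sketch below uses Python's standard codecs `cp1250`, `cp1257` and `mac_latin2` (Python's name for the MacOS Central European set); the `coverage` helper and the labels are made up for illustration.

```python
CODECS = {
    "Windows CP 1250": "cp1250",
    "Windows CP 1257 (Baltic)": "cp1257",
    "MacOS CentralEurope": "mac_latin2",
}


def coverage(text):
    """Report, per codepage, whether every character of `text` can be encoded."""
    result = {}
    for label, codec in CODECS.items():
        try:
            text.encode(codec)
            result[label] = True
        except UnicodeEncodeError:
            result[label] = False
    return result


# a with macron (Latvian): present in the Baltic and Mac sets, absent from CP 1250.
latvian = coverage("\u0101")
# a with ogonek (Polish, Lithuanian): present in all three.
ogonek = coverage("\u0105")
print(latvian)
print(ogonek)
```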

There is a small number of characters for Central, Baltic or South European languages that are not included in the Unicode Latin Extended A block. Here is a zip file that contains a FontLab .enc file that covers all the major languages of Europe, and most minority languages*, that use the Latin script. The zip also contains a .map file that maps from AGL1.0 and uniXXXX form names to correct Unicode characters; this can be used with the FontLab 'Assign Unicode' function. The .enc file should be placed in your FontLab\Encoding folder; the .map file should be placed in your Fontlab\Mapping folder.


Attachment: Tiro Euro Latin Basic .enc and .map, EuroLatinEnc+Map.zip (177.9 k)

Note that the .enc file is what we internally consider 'Basic', i.e. it does not contain smallcaps and glyphs for some other extended typographic features, but it does contain glyphs for case-sensitive punctuation, e.g. /hyphen.cap/ for use with the OpenType <case> feature. These can easily be removed from the .enc in a text editor, or simply ignored if not desired. Note also that this set contains support for some obsolete characters that are found in the Microsoft WGL4 set, e.g. /kgreenlandic/, which are included only for compatibility reasons.

* I can't claim total support for all European minority languages using the Latin script, because there are some about which I do not have adequate information, and others are in a state of orthographic uncertainty.
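To illustrate the kind of name-to-Unicode mapping that a .map file provides, here is a small sketch. It does not reproduce the actual .map file format or FontLab's 'Assign Unicode' behaviour; the three-entry name table and the function name are assumptions, and the rule of leaving suffixed names such as /hyphen.cap/ unencoded is the usual convention for alternate glyphs.

```python
import re

# Tiny illustrative subset of Adobe Glyph List names; a real table is far longer.
AGL_SAMPLE = {
    "aringacute": 0x01FB,
    "Scedilla": 0x015E,
    "kgreenlandic": 0x0138,
}

# Names of the form uniXXXX carry their codepoint as four uppercase hex digits.
UNI_NAME = re.compile(r"uni([0-9A-F]{4})")


def glyph_to_unicode(name):
    """Return the codepoint for a glyph name, or None if it stays unencoded."""
    if "." in name:
        # Alternates such as hyphen.cap are reached through layout features.
        return None
    if name in AGL_SAMPLE:
        return AGL_SAMPLE[name]
    match = UNI_NAME.fullmatch(name)
    if match:
        return int(match.group(1), 16)
    return None


for name in ["aringacute", "uni0219", "hyphen.cap"]:
    print(name, glyph_to_unicode(name))
```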

Thomas Phinney's picture

Just to add a little to John's info, Adobe's "CE" character set (part of all the Adobe "Pro" fonts) covers all the Windows and Mac codepages listed above, plus Turkish (yet another separate codepage).

Cheers,

T

eolson's picture

Very helpful.
I obtained the Tiro encoding with the Python stuff
Adam posted a few months ago but didn't scroll all the
way down to the page you mention John.
Thanks

Marius Ursache's picture

Can you point to that thread, please?

Thank you,

M.

eolson's picture

I was referring to the Fontlab on Steroids stuff.
http://steroids.fontlab.net/

John Hudson's picture

I've actually updated some of my FL resource files since Adam compiled the first FontLab steroid. I don't think the Latin Basic set is modified though, so the steroid version should correspond pretty closely to what I provided here today.

peter bilak's picture

John, i've looked at your .ENC file, it looks identical to what i use and thought supported all European languages, the only difference is the presence of 'aringacute' in your file. I haven't encountered this letter yet, which language uses it?

John Hudson's picture

/aringacute/ is a Danish letter (along with the equally obscure /aeacute/ and /oslashacute/). It is not often found in modern texts, other than dictionaries and grammars, because it was not available in the standard 8-bit character sets intended for Danish and many people got out of the habit of marking stress in Danish. Any Danish vowel can be marked as stressed, hence the need for these characters.

I'm afraid I'm somewhat to blame for the Adobe Pro set not containing this character. Adobe asked me about the /aringacute/ when they were defining their set, and I told them that it wasn't used anymore except in dictionaries and grammars. Since the height of the uppercase form causes vertical metric problems, they decided not to include it, although my advice is now that it should be included because it is still an official character in the Danish orthography. Also, I didn't really realise that Adobe were asking only about the /aringacute/ and not about the other Danish diacritics /aeacute/ and /oslashacute/, so the Adobe Pro set contains the latter but not the /aringacute/.

hrant's picture

> Any Danish vowel can be marked as stressed

!
I've read this about Dutch too. What other languages have this wonderful feature? Why doesn't anybody talk about it? For one thing it makes italics much less relevant. Do people actually use it in handwriting? It's so sad that technology has made a language less powerful. It wouldn't be the first time of course, but it's still sad.

hhp

Thomas Phinney's picture

Actually, we asked a lot of people (though obviously not enough or not the right ones). I guess this is something we should look at fixing in the future.

Cheers,

T

hrant's picture

Did you ask a decent Danish linguist?

hhp

peter bilak's picture

>>Any Danish vowel can be marked as stressed > I've read this about Dutch too

Not only vowels, but also some consonants: jacute for example. It is indeed used for stressing the word, the same as italics. Most often it is used in advertising and in expressive text, not in handwriting, as far as I know. In words like JIJ (you), one could put an acute on all characters.

Thomas Phinney's picture

Hrant: Probably not, but I don't know. I wasn't directly involved in that part of the character set definition. I am however responsible for the addition of the litre symbol to the standard Pro character set.

T

John Hudson's picture

For one thing it makes italics much less relevant.

I mean stress in terms of syllable inflection, not articulatory stress such as might be indicated, for an entire word or phrase, using italics. Marking syllable inflection means indicating which syllable carries the stress: P

hrant's picture

Peter, are you saying that the "j" can get an acute outside of an "ij" pair? Wow. What about other letters?

> I mean stress in terms of syllable inflection

Sure, the acute can't set off things like book titles, but it's still highly useful.

> One could e

pablohoney77's picture

you mean r

hrant's picture

Sdiboghag

twardoch's picture

Hrant writes: "I've read this about Dutch too. What other languages have this wonderful feature?"

In grammar books (for foreigners) and in dictionaries for Russian, acute over vowels is used to indicate syllable stress (so-called "oodarenye"). In Russian, the syllable stress is quite arbitrary. For example, when a verb is conjugated, the stress can move from the first syllable in singular forms to the last syllable in plural forms etc. The stress is very prominently pronounced: for example, an unstressed "o" sounds almost like a short "a" (something like schwa), while stressed "o" is a very round and clear "o". Therefore, one needs to learn the proper stress for each word. When I was in school, we usually read Russian texts from textbooks where the stress was marked. Later, when we switched to reading unmarked normal texts (in newspapers etc.) it turned out to be a very difficult task :-)

For comparison: in Polish, the syllable stress is almost exclusively on the second-last syllable. There are relatively few words, many of them of foreign origin, which have a different stress (e.g. third-last syllable), and many people still put the stress on the second-last syllable in these words.

This may be the reason why spoken Russian sounds quite lively while spoken Polish sounds rather dull (an opinion that I actually agree with :-)

Adam

ebensorkin's picture

So from a practical point of view what does the Dutch & Danish propensity mean for OpenType? Unless I missed something many of the possible 'stressed' versions of glyphs are not in Unicode yet.

Would it be a good idea to use a feature code like 'liga' to refer an app to a jacute glyph if the user types acute & then j? Would you do that for all the letters? Are there other diacritics that you would need to do this for too?
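Whether a given stressed letter has its own Unicode character can be checked by asking Unicode normalization to compose the base letter with a combining acute (U+0301). This is just a sketch using Python's standard `unicodedata` module; the helper name is made up.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"


def has_precomposed_acute(base):
    """True if Unicode encodes `base` plus acute as one precomposed character."""
    composed = unicodedata.normalize("NFC", base + COMBINING_ACUTE)
    return len(composed) == 1


# The Danish letters discussed earlier in the thread compose; j does not.
for letter in ["a", "\u00e5", "\u00e6", "\u00f8", "j"]:
    print(letter, has_precomposed_acute(letter))
```

Letters that do not compose, such as j with acute, can only be written as a base letter followed by the combining mark, so the font has to handle that sequence itself.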

John Hudson's picture

You would use the 'ccmp' feature for this kind of substitution (but note that the combining acute should follow the base letter).

Alternatively, you could use GPOS mark positioning, which would be much more flexible. In this case, you would need to have a dotless j glyph in the font and use a 'calt' feature lookup to substitute this for the regular j whenever the letter is followed by a combining mark glyph.
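The two approaches can be pictured as operations on a run of glyph names. The following is only a toy simulation of the logic, not how a layout engine is implemented; all glyph names (`acutecomb`, `jacute`, `dotlessj`) and function names are assumptions for illustration.

```python
COMBINING_MARKS = {"acutecomb", "gravecomb"}

# 'ccmp'-style approach: base + combining mark becomes one precomposed glyph.
CCMP_LIGATURES = {
    ("a", "acutecomb"): "aacute",
    ("j", "acutecomb"): "jacute",
}


def apply_ccmp(glyphs):
    """Replace each base+mark pair listed in CCMP_LIGATURES by a single glyph."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in CCMP_LIGATURES:
            out.append(CCMP_LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_dotless_substitution(glyphs):
    """Contextual approach: swap j for dotlessj when a combining mark follows,
    leaving the mark in the run so that mark positioning can place it."""
    out = list(glyphs)
    for i in range(len(out) - 1):
        if out[i] == "j" and out[i + 1] in COMBINING_MARKS:
            out[i] = "dotlessj"
    return out


run = ["j", "acutecomb", "i", "j"]
print(apply_ccmp(run))
print(apply_dotless_substitution(run))
```

The first approach needs one precomposed glyph per base and mark combination, while the second needs only the dotless forms plus mark anchors, which is where the extra flexibility comes from.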
