Where can I learn more about how to add diacritics to Western fonts? (I need to create a CE version of an existing font).
Thank you very much!
Marius, as for the design of Polish glyphs, I can recommend my article published in the ATypI/Graphis book “Language Culture Type”: http://www.amazon.com/exec/obidos/ASIN/1932026010/twardoch-20/ Also refer to the Microsoft “Diacritics Design Standards” document: http://www.microsoft.com/typography/developers/fdsspec/diacritics.htm I recommend looking at the design of “Helvetica Linotype”, the *new* OpenType version of Helvetica (look at http://www.linotype.com/ ). The diacritics are quite good. Finally, consult people who know about such things. Regards, Adam Twardoch
It would be interesting to define Eastern European as it relates to character sets too. They vary a lot. Is it accurate to include the entire Latin Extended A code table?
It should be noted that ‘Eastern European’ is a misnomer, since most of the languages that we’re discussing are properly considered Central European. Also, Microsoft have separate codepages for ‘Eastern European’ (CP 1250) and ‘Baltic’ (CP 1257), while Apple combines support for the Baltic languages with Central European languages in a single ‘MacOS CentralEurope’ codepage. Conversely, Windows CP 1250 supports Romanian and Croatian, while Apple has a separate codepage for each of these. There is a small number of characters for Central, Baltic or South European languages that are not included in the Unicode Latin Extended A block. Here is a zip file that contains a FontLab .enc file that covers all the major languages of Europe, and most minority languages*, that use the Latin script. The zip also contains a .map file that maps from AGL1.0 and uniXXXX form names to the correct Unicode characters; this can be used with the FontLab ‘Assign Unicode’ function. The .enc file should be placed in your FontLab\Encoding folder; the .map file should be placed in your FontLab\Mapping folder.
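To check the codepage comparison above for yourself, here is a small sketch using Python's built-in codecs. It assumes Python 3.9+ and that Python's `mac_latin2` codec corresponds to Apple's Mac Central European codepage. For each codepage it lists any letters that fall outside both Latin-1 and the Latin Extended A block (U+0100 to U+017F).

```python
import unicodedata

# Python's built-in codecs for the legacy 8-bit codepages discussed above.
# Assumption: "mac_latin2" is Python's name for Mac Central European.
CODEPAGES = ["cp1250", "cp1257", "mac_latin2"]

LATIN_EXT_A = range(0x0100, 0x0180)  # U+0100..U+017F


def codepage_chars(codec: str) -> set[str]:
    """Return every character the codepage maps from bytes 0x80-0xFF."""
    chars = set()
    for b in range(0x80, 0x100):
        try:
            chars.add(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            pass  # this byte is undefined in the codepage
    return chars


def letters_outside_latin1_and_ext_a(codec: str) -> list[str]:
    """Letters in the codepage that are neither Latin-1 nor Latin Extended A."""
    return sorted(
        c for c in codepage_chars(codec)
        if unicodedata.category(c).startswith("L")
        and ord(c) > 0xFF
        and ord(c) not in LATIN_EXT_A
    )


if __name__ == "__main__":
    for cp in CODEPAGES:
        extra = letters_outside_latin1_and_ext_a(cp)
        print(cp, [f"U+{ord(c):04X}" for c in extra])
```

Note that none of these codepages contains the Romanian comma-below letters (U+0218 to U+021B), which are among the characters outside Latin Extended A.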
Note that the .enc file is what we internally consider ‘Basic’, i.e. it does not contain small caps or glyphs for some other extended typographic features, but it does contain glyphs for case-sensitive punctuation, e.g. /hyphen.cap/ for use with the OpenType <case> feature. These can easily be removed from the .enc in a text editor, or simply ignored if not desired. Note also that this set contains support for some obsolete characters that are found in the Microsoft WGL4 set, e.g. /kgreenlandic/, which are included only for compatibility reasons. * I can’t claim total support for all European minority languages using the Latin script, because there are some about which I do not have adequate information, and others are in a state of orthographic uncertainty.
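The <case> substitutions for glyphs like /hyphen.cap/ can be generated from a glyph list. This is a hypothetical helper, not part of FontLab: it assumes the `.cap` suffix naming convention used in the .enc file and emits Adobe feature-file (.fea) syntax.

```python
def build_case_feature(glyph_names: list[str], suffix: str = ".cap") -> str:
    """Return .fea text for a <case> feature mapping base glyphs to suffixed forms."""
    names = set(glyph_names)
    lines = ["feature case {"]
    for name in sorted(names):
        if name.endswith(suffix):
            base = name[: -len(suffix)]
            # Only emit a rule if the unsuffixed base glyph is also in the font.
            if base in names:
                lines.append(f"    sub {base} by {name};")
    lines.append("} case;")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical glyph list for illustration.
    glyphs = ["hyphen", "hyphen.cap", "parenleft", "parenleft.cap", "endash"]
    print(build_case_feature(glyphs))
```

If you drop the .cap glyphs from the .enc, the generator simply emits an empty feature, which can be left out of the font.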
Just to add a little to John’s info, Adobe’s “CE” character set (part of all the Adobe “Pro” fonts) covers all the Windows and Mac codepages listed above, plus Turkish (yet another separate codepage). Cheers, T
Very helpful. I obtained the Tiro encoding with the Python stuff Adam posted a few months ago but didn’t scroll all the way down to the page you mention, John. Thanks
Can you point to that thread, please? Thank you, M.
I was referring to the Fontlab on Steroids stuff. http://steroids.fontlab.net/
I’ve actually updated some of my FL resource files since Adam compiled the first FontLab steroid. I don’t think the Latin Basic set is modified though, so the steroid version should correspond pretty closely to what I provided here today.
John, I’ve looked at your .ENC file; it looks identical to what I use and thought supported all European languages. The only difference is the presence of ‘aringacute’ in your file. I haven’t encountered this letter yet; which language uses it?
/aringacute/ is a Danish letter (along with the equally obscure /aeacute/ and /oslashacute/). It is not often found in modern texts, other than dictionaries and grammars, because it was not available in the standard 8-bit character sets intended for Danish and many people got out of the habit of marking stress in Danish. Any Danish vowel can be marked as stressed, hence the need for these characters. I’m afraid I’m somewhat to blame for the Adobe Pro set not containing this character. Adobe asked me about the /aringacute/ when they were defining their set, and I told them that it wasn’t used anymore except in dictionaries and grammars. Since the height of the uppercase form causes vertical metric problems, they decided not to include it, although my advice now is that it should be included because it is still an official character in Danish orthography. Also, I didn’t really realise that Adobe were asking only about the /aringacute/ and not about the other Danish diacritics /aeacute/ and /oslashacute/, so the Adobe Pro set contains the latter but not the /aringacute/.
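To plan the composites for these six Danish letters, Python's `unicodedata` module shows what each precomposed character decomposes into. A minimal sketch; the stacked ring plus acute on /Aringacute/ is what makes the uppercase form so tall.

```python
import unicodedata

# U+01FA..U+01FF: Aringacute, aringacute, AEacute, aeacute, Oslashacute, oslashacute
DANISH_STRESSED = "\u01FA\u01FB\u01FC\u01FD\u01FE\u01FF"


def components(ch: str) -> list[str]:
    """Names of the base letter and combining marks that `ch` decomposes into."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


if __name__ == "__main__":
    for ch in DANISH_STRESSED:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {components(ch)}")
```

Note that /ae/ and /oslash/ have no canonical decomposition, so their stressed forms decompose only to the base letter plus a combining acute.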
> Any Danish vowel can be marked as stressed ! I’ve read this about Dutch too. What other languages have this wonderful feature? Why doesn’t anybody talk about it? For one thing it makes italics much less relevant. Do people actually use it in handwriting? It’s so sad that technology has made a language less powerful. It wouldn’t be the first time of course, but it’s still sad. hhp
Oh, one more important link: “Problems of diacritic design for Latin script text faces” by Victor Gaultney, http://www.sil.org/~gaultney/research.html#Dissertation Adam
Actually, we asked a lot of people (though obviously not enough or not the right ones). I guess this is something we should look at fixing in the future. Cheers, T
Did you ask a decent Danish linguist? hhp
> Any Danish vowel can be marked as stressed. > I’ve read this about Dutch too. Not only vowels, but also some consonants: /jacute/, for example. It is indeed used for stressing the word, much like italics. Most often it is used in advertising and in expressive text, not in handwriting, as far as I know. In words like JIJ (you), one could put an acute on all characters.
Hrant: Probably not, but I don’t know. I wasn’t directly involved in that part of the character set definition. I am however responsible for the addition of the litre symbol to the standard Pro character set. T
> For one thing it makes italics much less relevant. I mean stress in terms of syllable inflection, not articulatory stress such as might be indicated, for an entire word or phrase, using italics. Marking syllable inflection means indicating which syllable carries the stress: P
Peter, are you saying that the “j” can get an acute outside of an “ij” pair? Wow. What about other letters? > I mean stress in terms of syllable inflection. Sure, the acute can’t set off things like book titles, but it’s still highly useful. > One could e
you mean r
Hrant writes: “I’ve read this about Dutch too. What other languages have this wonderful feature?” In grammar books (for foreigners) and in dictionaries for Russian, an acute over vowels is used to indicate syllable stress (so-called “oodarenye”). In Russian, the syllable stress is quite arbitrary. For example, when a verb is conjugated, the stress can move from the first syllable in singular forms to the last syllable in plural forms, etc. The stress is very prominently pronounced: for example, an unstressed “o” sounds almost like a short “a” (something like a schwa), while a stressed “o” is a very round and clear “o”. Therefore, one needs to learn the proper stress for each word. When I was in school, we usually read Russian texts from textbooks where the stress was marked. Later, when we switched to reading normal unmarked texts (in newspapers etc.), it turned out to be a very difficult task. For comparison: in Polish, the syllable stress is almost exclusively on the second-to-last syllable. There are relatively few words, many of them of foreign origin, which have a different stress (e.g. on the third-to-last syllable), and many people still put the stress on the second-to-last syllable in these words. This may be the reason why spoken Russian sounds quite lively while spoken Polish sounds rather dull (an opinion that I actually agree with). Adam
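A minimal sketch of how dictionary-style stress marking is encoded in Unicode text: the stress is a combining acute (U+0301) placed after the stressed vowel, since Unicode has no precomposed Cyrillic vowels with acute. The helper names and the vowel list are illustrative assumptions.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"
RUSSIAN_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")


def mark_stress(word: str, index: int) -> str:
    """Insert a combining acute after the stressed vowel at position `index`."""
    if word[index] not in RUSSIAN_VOWELS:
        raise ValueError(f"{word[index]!r} at position {index} is not a vowel")
    return word[: index + 1] + COMBINING_ACUTE + word[index + 1 :]


def strip_stress(text: str) -> str:
    """Remove stress marks, giving ordinary unmarked text."""
    return text.replace(COMBINING_ACUTE, "")


if __name__ == "__main__":
    stressed = mark_stress("молоко", 5)  # "milk", stressed on the last syllable
    print(stressed)
    # NFC cannot merge о + U+0301 into one code point, so the length stays 7.
    print(len(unicodedata.normalize("NFC", stressed)))
```

For fonts this means the acute has to position correctly over Cyrillic vowels as a combining mark, rather than through precomposed glyphs.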
I’m a little thrown off by your question… Did you mean what the standards are for adding diacritics? Or what the method is for adding diacritics? Or did you have a completely different question in mind? If you’re concerned about standards, Twardoch has posted some good articles that discuss the topic. If you’re interested more in the mechanics… I’d post more about that, but don’t wanna confuse you if that’s not what you’re asking. So please clarify what kind of information you’re looking for.
Just please don’t make them too big — that’s often merely a naive reaction against the -very real- threats of globalization and jingoism. hhp
Conversely, don’t make them too small and light, especially not the Polish ogoneks.
Thanks everyone for the feedback. Some more details on what I am looking into: I need to add Romanian diacritics to some fonts with western encoding. I am using FontLab, and would be interested in the whole process: adding composites (I know how to add composites, but I want to get “professional skills”), codepages, conversion mapping, hinting etc.
Regarding Romanian: there is a fiddle with Romanian diacritics. Earlier versions of Unicode unified the Romanian S/s with ‘comma’ accent below with the Turkish S/s with cedilla (U+015E/U+015F). Later, Unicode disunified these, and separately encoded the S/s with ‘comma’ accent (U+0218/U+0219). However, the old Windows 8-bit codepage 1250, which covers Romanian, uses the older Unicode encoding. This means that in such fonts you should use the S/s with ‘comma’ accent glyph in the S/s with cedilla codepoints. If you are making a multilingual OT font that needs to support both Turkish and Romanian, and also provide backwards compatibility with the old 8-bit codepage, you need to include a Localised Forms <locl> OT layout feature for the Romanian <ROM> language system tag that maps the S/s with cedilla glyphs to the S/s with ‘comma’ accent glyphs. Note that this substitution is not supported in currently shipping apps, but will be supported in future versions of Windows, so it is a good idea to get it into your fonts now. Unicode also previously unified T/t with ‘comma’ accent and T/t with cedilla (U+0162/U+0163), but later disunified these and separately encoded T/t with ‘comma’ accent (U+021A/U+021B). However, I have yet to locate a single language that actually uses T/t with cedilla, so the best thing to do is to include only glyphs for T/t with ‘comma’ accent, and double encode them to both the T/t with ‘comma’ accent codepoints and the T/t with cedilla codepoints (the latter are used in the old 8-bit codepage). Hope all this makes sense.
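The <locl> substitution described above could be written in Adobe feature-file syntax roughly as follows. This Python helper just generates the text; the uniXXXX glyph names are assumptions (use whatever names your font has), and, following the advice above, T/t are left out because their comma-accent glyphs are double-encoded instead.

```python
# Hypothetical glyph names in uniXXXX form; substitute your font's own names.
ROMANIAN_LOCL = {
    "uni015E": "uni0218",  # S cedilla -> S comma below
    "uni015F": "uni0219",  # s cedilla -> s comma below
}


def build_romanian_locl(pairs: dict[str, str], lang_tag: str = "ROM") -> str:
    """Return .fea text for a <locl> feature under the Latin script, Romanian language."""
    lines = [
        "feature locl {",
        "    script latn;",
        f"    language {lang_tag};",
    ]
    for cedilla, comma in pairs.items():
        lines.append(f"        sub {cedilla} by {comma};")
    lines.append("} locl;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_romanian_locl(ROMANIAN_LOCL))
```

Turkish text keeps the cedilla glyphs, since the substitution only applies when the text is tagged as Romanian.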
John, thank you for your reply. I have heard of this issue with previous Unicode versions. Do you know any online resources where this subject is discussed more in depth? I am still encountering trouble when experimenting with Eastern European font creation in FontLab. Does FontLab 4.5 have “fixed” codepages for Eastern European languages? Thank you.
FontLab 4.5 does not have ‘fixed’ codepages. Codepages are standards defined by national or international standards bodies, or by corporations. Windows CP 1250, for example, is defined by Microsoft, and they have not updated it to map to the new Unicode character codes, since they don’t like changing their codepages because of backwards-compatibility issues. Since Microsoft’s operating system is now 100% Unicode, they are less concerned altogether with 8-bit codepages, and support them only as compatibility formats (e.g. if you open a CP 1250 encoded document in Word on Windows 2000+, it will be saved as a Unicode-encoded document).
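A quick way to see the CP 1250 behaviour described above is Python's `cp1250` codec: the bytes for the Romanian S and T letters decode to the cedilla code points, and the comma-below letters cannot be encoded at all. The conversion helper is a sketch that assumes the input is Romanian text (it would be wrong for Turkish, which genuinely uses the cedilla).

```python
# CP 1250 bytes for S cedilla, s cedilla, T cedilla, t cedilla.
LEGACY_BYTES = bytes([0xAA, 0xBA, 0xDE, 0xFE])

# Cedilla code points -> comma-below code points. Correct for Romanian only.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S cedilla -> S comma below
    "\u015F": "\u0219",  # s cedilla -> s comma below
    "\u0162": "\u021A",  # T cedilla -> T comma below
    "\u0163": "\u021B",  # t cedilla -> t comma below
})


def encodable_in_cp1250(text: str) -> bool:
    """True if every character of `text` has a byte in CP 1250."""
    try:
        text.encode("cp1250")
    except UnicodeEncodeError:
        return False
    return True


def decode_romanian_cp1250(data: bytes) -> str:
    """Decode CP 1250 bytes, then remap cedilla letters to comma-below (assumes Romanian)."""
    return data.decode("cp1250").translate(CEDILLA_TO_COMMA)


if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in LEGACY_BYTES.decode("cp1250")])
    print(encodable_in_cp1250("\u0219"))  # s comma below has no CP 1250 byte
    print(decode_romanian_cp1250(LEGACY_BYTES))
```

This is also why the font itself needs comma-accent glyphs mapped to the cedilla code points: legacy CP 1250 documents never contain the new code points.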
So from a practical point of view, what does the Dutch and Danish propensity mean for OpenType? Unless I missed something, many of the possible 'stressed' versions of glyphs are not in Unicode yet.
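One practical way to answer this is to ask Unicode which base letters have no precomposed acute form. For those (e.g. the Dutch j), a stressed letter can only be encoded as base letter plus combining acute (U+0301), so the font has to handle the pair with OpenType features. A small sketch using NFC normalization, limited to the unaccented letters a to z:

```python
import string
import unicodedata

COMBINING_ACUTE = "\u0301"


def has_precomposed_acute(letter: str) -> bool:
    """True if NFC turns letter + combining acute into a single code point."""
    return len(unicodedata.normalize("NFC", letter + COMBINING_ACUTE)) == 1


def letters_needing_combining_acute(letters: str = string.ascii_lowercase) -> list[str]:
    """Letters whose acute form exists only as a base + combining mark sequence."""
    return [c for c in letters if not has_precomposed_acute(c)]


if __name__ == "__main__":
    print(letters_needing_combining_acute())
```

The same check works for other marks by swapping U+0301 for the relevant combining character.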
Would it be a good idea to use a feature like 'liga' to point an app to a jacute glyph if the user types acute and then j? Would you do that for all the letters? Are there other diacritics that you would need to do this for too?
You would use the 'ccmp' feature for this kind of substitution (but note that the combining acute should follow the base letter).
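For the 'ccmp' approach, the rule is a ligature substitution of the base letter followed by the combining mark. This sketch generates .fea text; it assumes a precomposed glyph named like `j_acutecomb` exists for each base (a hypothetical naming scheme) and uses the AGL name `acutecomb` for U+0301.

```python
def build_ccmp_ligatures(bases: list[str], mark: str = "acutecomb") -> str:
    """Return .fea text substituting base + combining mark with a precomposed glyph."""
    lines = ["feature ccmp {"]
    for base in bases:
        # Base letter first, then the combining mark, as in Unicode text order.
        lines.append(f"    sub {base} {mark} by {base}_{mark};")
    lines.append("} ccmp;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_ccmp_ligatures(["j"]))
```

Every base letter needs its own precomposed glyph this way, which is why the mark-positioning alternative scales better.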
Alternatively, you could use GPOS mark positioning, which would be much more flexible. In this case, you would need to have a dotless j glyph in the font and use a 'calt' feature lookup to substitute it for the regular j whenever the letter is followed by a combining mark glyph.
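The GPOS alternative could look roughly like this in feature-file syntax: a contextual substitution in 'calt' swaps j for dotless j (U+0237, named `uni0237` here) before a top mark, and a mark-to-base lookup attaches the mark. The anchor coordinates, class names and mark list are placeholders, not values from any real font.

```python
def build_dotless_j_features(
    marks: list[str],
    base_anchor: tuple[int, int],
    mark_anchor: tuple[int, int] = (0, 500),
) -> str:
    """Return .fea text: calt swaps j -> dotless j before top marks; mark attaches them."""
    mark_list = " ".join(marks)
    bx, by = base_anchor
    mx, my = mark_anchor
    return "\n".join([
        f"@TOP_MARKS = [{mark_list}];",
        "",
        "feature calt {",
        # Contextual single substitution: only the marked j' is replaced.
        "    sub j' @TOP_MARKS by uni0237;",
        "} calt;",
        "",
        f"markClass [{mark_list}] <anchor {mx} {my}> @MC_TOP;",
        "",
        "feature mark {",
        f"    pos base uni0237 <anchor {bx} {by}> mark @MC_TOP;",
        "} mark;",
    ])


if __name__ == "__main__":
    print(build_dotless_j_features(["acutecomb", "gravecomb"], (120, 500)))
```

Because the substitution runs before GPOS, the mark attaches to the dotless j's anchor, and one anchor per base glyph covers every mark in the class.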