order of diacriticals in Unicode?

charles_e
6.May.2008 2.36pm

I’m having to build a few “ccmp” glyphs for setting Kiowa. I seem to remember that the order of specifying diacriticals in Unicode is “inside out”, starting with the top.

OK, no problem, the name for an “oh” with a macron and an acute above is, for example,

uni006F03040301 (or shorter, uni01010301)

But when you add the macron below as well, is it preferable to work “inside out” regardless of position, so the glyph name would be

uni010103310301 (omacron, macronbelow combining, acute combining),

or take care of the top first, e.g.

uni010103010331 (omacron, acute combining, macronbelow combining)

I suppose at some level it doesn’t matter, but if there is a convention I’d like to follow it, because for all I know, either the text file or the PDF may be repurposed.

TIA

Charles



Tom Gewecke
8.May.2008 7.07am

I’m not sure how relevant it is to what you are doing, but you might want to have a look at pp. 111-113 of:

http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf


twardoch
8.May.2008 7.35pm

The ā̱́ character can be expressed in two main canonical Unicode forms. The Normalization Form D (NFD), which is achieved by the complete canonical decomposition of the string:
U+0061 U+0331 U+0304 U+0301
and the Normalization Form C (NFC), which is achieved by the complete canonical decomposition and a subsequent canonical composition of the string:
U+0101 U+0331 U+0301
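
If you want to check the two forms yourself, here is a minimal sketch using Python's standard unicodedata module. The helper name codepoints and the "typing order" input string are just assumptions for illustration; any canonically equivalent input order should give the same result.

```python
import unicodedata

def codepoints(s):
    """Return the string as a list of U+XXXX labels (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]

# a + combining macron + combining acute + combining macron below,
# entered "top marks first" (an assumed input order).
typed = "\u0061\u0304\u0301\u0331"

nfd = unicodedata.normalize("NFD", typed)
nfc = unicodedata.normalize("NFC", typed)

print(codepoints(nfd))  # ['U+0061', 'U+0331', 'U+0304', 'U+0301']
print(codepoints(nfc))  # ['U+0101', 'U+0331', 'U+0301']
```

The macron below moves in front of the marks above because canonical ordering sorts combining marks by combining class (220 for U+0331, 230 for U+0304 and U+0301), while marks of the same class keep their relative order.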

When creating glyphnames for characters such as ā̱́, I recommend converting the Unicode sequence to the NFC form (on the Mac, you can use Unicode Checker for that), so a sensible glyphname would be uni010103310301.
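
If you would rather script the naming step, something along these lines should work. It is only a sketch: the function name uni_glyph_name is made up for this example, and it assumes every code point is in the BMP, since the uniXXXX convention uses four hex digits per code point.

```python
import unicodedata

def uni_glyph_name(text):
    """Build a 'uni' glyph name from the NFC form of text (hypothetical helper)."""
    nfc = unicodedata.normalize("NFC", text)
    parts = []
    for ch in nfc:
        cp = ord(ch)
        if cp > 0xFFFF:
            # The uniXXXX form only covers four-digit (BMP) code points.
            raise ValueError(f"U+{cp:04X} is outside the BMP")
        parts.append(f"{cp:04X}")
    return "uni" + "".join(parts)

# a + macron + acute + macron below, in any canonically equivalent order
print(uni_glyph_name("\u0061\u0304\u0301\u0331"))  # uni010103310301
print(uni_glyph_name("\u0101\u0301\u0331"))        # uni010103310301
```

Note that putting the acute before the macron (a, U+0301, U+0304) is a different character as far as Unicode is concerned, so it normalizes to a different name.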

Your ccmp code would then be:
sub amacron uni0331 acutecomb by uni010103310301;

However, since many applications don’t perform Unicode normalization of the string, your ccmp code could also add code for other situations:
sub amacron acutecomb uni0331 by uni010103310301;
sub aacute uni0304 uni0331 by uni010103310301;
sub aacute uni0331 uni0304 by uni010103310301;
sub a acutecomb uni0304 uni0331 by uni010103310301;
sub a acutecomb uni0331 uni0304 by uni010103310301;
sub a uni0331 uni0304 acutecomb by uni010103310301;
sub a uni0331 acutecomb uni0304 by uni010103310301;
sub a uni0304 uni0331 acutecomb by uni010103310301;
sub a uni0304 acutecomb uni0331 by uni010103310301;
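
Typing these permutations by hand gets tedious once there are several such glyphs, so a small script can generate them. The following is a rough sketch, not a finished tool: the glyph names are the ones used in the rules above, and the function name ccmp_rules and the BASES table are assumptions made up for this example.

```python
from itertools import permutations

TARGET = "uni010103310301"
MARKS = {"uni0304", "acutecomb", "uni0331"}

# Base glyph -> the mark it already contains (None for the bare letter).
BASES = {"a": None, "amacron": "uni0304", "aacute": "acutecomb"}

def ccmp_rules():
    """Return one 'sub ... by ...;' line per base glyph and mark order."""
    rules = []
    for base, included in BASES.items():
        remaining = sorted(MARKS - {included})
        for order in permutations(remaining):
            rules.append(f"sub {base} {' '.join(order)} by {TARGET};")
    return rules

for rule in ccmp_rules():
    print(rule)
```

This prints ten lines: six for the bare a, two for amacron and two for aacute, which is the same set as the hand-written rules.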

More on this:
http://groups.msn.com/FontLab/tipsandtricks.msnw?action=get_message&mvie...

More on NFC:
http://unicode.org/reports/tr15/

Regards,
Adam


aszszelp
9.May.2008 12.24am

They are not critical at all, so let them be diacritics only. ;-) ((or diacritical marks in the full form)).