order of diacriticals in Unicode?
I’m having to build a few “ccmp” glyphs for setting Kiowa. I seem to remember that the order of specifying diacriticals in Unicode is “inside out”, starting with the top.
OK, no problem, the name for an “oh” with a macron and acute above is, for example,
uni006F03040301 (or shorter, uni01010301)
But when you add the macron below as well, is it preferable to work “inside out” regardless of position, so the glyph name would be
uni010103310301 (omacron, macronbelow combining, acute combining),
or take care of the top first, e.g.
uni010103010331 (omacron, acute combining, macronbelow combining)
I suppose at some level it doesn’t matter, but if there is a convention I’d like to follow it, because for all I know, either the text file or the PDF may be repurposed.
TIA
Charles
8.May.2008 7.07am
I’m not sure how relevant it is to what you are doing, but you might want to have a look at pp. 111-113 of:
http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
8.May.2008 7.35pm
The ā̱́ character can be expressed in two main canonical Unicode forms. Normalization Form D (NFD) is obtained by complete canonical decomposition of the string:
U+0061 U+0331 U+0304 U+0301
Normalization Form C (NFC) is obtained by complete canonical decomposition followed by canonical composition:
U+0101 U+0331 U+0301
When creating glyph names for characters such as ā̱́, I recommend converting the Unicode sequence to NFC (on the Mac, you can use Unicode Checker for that), so a sensible glyph name would be
uni010103310301
Your ccmp code would then be:
sub amacron uni0331 acutecomb by uni010103310301;
However, since many applications don’t perform Unicode normalization on the string, your ccmp code could also cover the other orderings:
sub amacron acutecomb uni0331 by uni010103310301;
sub aacute uni0304 uni0331 by uni010103310301;
sub aacute uni0331 uni0304 by uni010103310301;
sub a acutecomb uni0304 uni0331 by uni010103310301;
sub a acutecomb uni0331 uni0304 by uni010103310301;
sub a uni0331 uni0304 acutecomb by uni010103310301;
sub a uni0331 acutecomb uni0304 by uni010103310301;
sub a uni0304 uni0331 acutecomb by uni010103310301;
sub a uni0304 acutecomb uni0331 by uni010103310301;
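Writing these orderings by hand gets tedious once several such glyphs are needed, so here is a minimal Python sketch that generates them. It assumes the glyph names used above (a, amacron, aacute, uni0304, uni0331, acutecomb); the function name ccmp_rules and its parameters are made up for illustration and are not part of any font tool.

```python
from itertools import permutations

# Glyph names as used in this thread; adjust them to your font's naming.
BASE = "a"
MARKS = ["uni0304", "uni0331", "acutecomb"]  # macron, macron below, acute
TARGET = "uni010103310301"

# Precomposed starting glyphs and the one mark each already contains.
PRECOMPOSED = {"amacron": "uni0304", "aacute": "acutecomb"}


def ccmp_rules(base=BASE, marks=MARKS, target=TARGET, precomposed=PRECOMPOSED):
    """Return one 'sub ... by ...;' line per input ordering (hypothetical helper)."""
    rules = []
    # Precomposed base followed by the remaining marks in every order.
    for glyph, absorbed in precomposed.items():
        rest = [m for m in marks if m != absorbed]
        for order in permutations(rest):
            rules.append(f"sub {glyph} {' '.join(order)} by {target};")
    # Plain base followed by all marks in every order.
    for order in permutations(marks):
        rules.append(f"sub {base} {' '.join(order)} by {target};")
    return rules


if __name__ == "__main__":
    for rule in ccmp_rules():
        print(rule)
```

This prints ten rules: the NFC-ordered one plus the nine alternatives. Note that it treats every ordering as the same character, as the list above does; strictly speaking, swapping the macron and the acute is not a canonically equivalent sequence.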
More on this:
http://groups.msn.com/FontLab/tipsandtricks.msnw?action=get_message&mvie...
More on NFC:
http://unicode.org/reports/tr15/
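If Unicode Checker is not to hand, the same normalization can be tried with Python's standard unicodedata module. This is only a sketch: the helpers codepoints, glyph_name and equivalent are names invented here, and glyph_name assumes every character is in the BMP (four hex digits each).

```python
import unicodedata


def codepoints(text):
    """Format a string as 'U+XXXX U+XXXX ...'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def glyph_name(text):
    """'uni' + hex code points of the NFC form (hypothetical helper, BMP only)."""
    nfc = unicodedata.normalize("NFC", text)
    return "uni" + "".join(f"{ord(ch):04X}" for ch in nfc)


def equivalent(a, b):
    """True if two strings are canonically equivalent (same NFD form)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# a + macron + acute + macron below, i.e. typed "top first"
typed = "\u0061\u0304\u0301\u0331"

print(codepoints(unicodedata.normalize("NFD", typed)))  # U+0061 U+0331 U+0304 U+0301
print(codepoints(unicodedata.normalize("NFC", typed)))  # U+0101 U+0331 U+0301
print(glyph_name(typed))                                # uni010103310301

# Canonical combining classes: marks below (220) sort before marks above (230).
for mark in "\u0331\u0304\u0301":
    print(f"U+{ord(mark):04X}", unicodedata.combining(mark))

# Moving the macron below around does not change the character ...
print(equivalent("a\u0304\u0331\u0301", "a\u0331\u0304\u0301"))  # True
# ... but swapping the two marks above does.
print(equivalent("a\u0304\u0301", "a\u0301\u0304"))              # False
```

So the position of the macron below relative to the marks above makes no difference after normalization, while the marks above keep their typed order. For the “oh” in the original question, glyph_name("o\u0304\u0331\u0301") gives uni1E530331, because o with macron and acute has a precomposed code point (U+1E53) where a does not.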
Regards,
Adam
9.May.2008 12.24am
They are not critical at all, so let them be diacritics only. ;-) (Or diacritical marks, in the full form.)