Implementing contextual mark positioning

ishamid's picture

Hi all,

Now there is no GPOS lookup called "contextual mark positioning", but that's an accurate description of the scenario I'm about to describe. Consider the following (using Latin as a model, though the context I'm using this in is not Latin):

Consider a Unicode code point UniChar decomposed into a base glyph Base1 and a mark PrimaryMark1. So

UniChar => Base1 + PrimaryMark1

For example, the character atilde can be decomposed into an "a" and a "tilde".

The feature "ccmp" and a single lookup take care of this, of course.
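At the character level, the Latin model above matches Unicode canonical decomposition. Here is a minimal Python sketch using only the standard unicodedata module. The helper name decompose is my own, and it only illustrates the character side; the ccmp lookup does the equivalent job on glyphs inside the font.

```python
import unicodedata

def decompose(ch):
    """Return the Unicode names of the canonical (NFD) decomposition of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

# U+00E3 is LATIN SMALL LETTER A WITH TILDE
for name in decompose("\u00e3"):
    print(name)
```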

Now I want to add a secondary mark SecondaryMark1 to the base, say, a macron. Note that I am adding SecondaryMark1 to the base glyph, NOT to PrimaryMark1.

The feature "mark" and a single lookup also take care of this.

I apply this to lots of base glyphs and it works fine, but...

For the particular case of Base1, when I apply SecondaryMark1 to Base1, I need to offset the anchor coordinates of PrimaryMark1 for more aesthetic positioning. I cannot use "mkmk" because the relation between PrimaryMark1 and SecondaryMark1 is not constant from base glyph to base glyph.

So I need a contextual positioning definition that says, if SecondaryMark1 is applied to Base1, offset PrimaryMark1 by (x1,y1).

Let's express this scenario in feature-file syntax:

# GSUB

# Base1.comp is the unicode point in the font
lookup compose_primarymarks {
lookupflag IgnoreMarks;
sub Base1.comp by Base1 PrimaryMark1;
} compose_primarymarks;

feature ccmp {
lookup compose_primarymarks;
} ccmp;

# GPOS

lookup primarymarks {
mark PrimaryMark1 ;
pos Base1 mark [PrimaryMark1 ];
} primarymarks;

lookup secondarymarks {
mark SecondaryMark1 ;
pos Base1 mark [SecondaryMark1 ];
} secondarymarks;

Now I need a rule such that, if Base1 is marked by a SecondaryMark1, then the anchor coordinates for Base1 in the primarymarks lookup must be offset by (x1,y1).

OpenType does have a Chaining Contextual Positioning lookup, but it is very obscure and I'm not sure it is designed for or capable of dealing with this kind of situation (which deals with different anchor classes). I've spent many hours on this but cannot find a solution.

I could add alternate glyphs and use more GSUB lookups but that defeats the entire purpose and will lead to the glyph mess I wanted to avoid in the first place. The point is to be able to control, for a given character or character string, the interrelation of base glyph, primary mark, and secondary mark.

Put another way: If the string

Base1 PrimaryMark1 [SecondaryMark2 SecondaryMark3 SecondaryMark4 SecondaryMark5 SecondaryMark6]

is encountered, the primarymarks and secondarymarks lookups are invoked. But if

Base1 PrimaryMark1 [SecondaryMark1]

is encountered, then the anchor coordinates of Base1 in primarymarks are offset by (x1,y1).
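To pin down the behaviour being asked for, here is a small Python model of the desired result. It is only a sketch of the outcome, not of any OpenType mechanism. The anchor coordinates and the offset standing in for (x1,y1) are invented numbers, and the function name is hypothetical.

```python
# Hypothetical numbers: the default PrimaryMark1 anchor on Base1 and the
# offset (x1, y1) are invented purely for illustration.
DEFAULT_PRIMARY_ANCHOR = (300, 500)
OFFSET = (-40, 20)  # stands in for (x1, y1)

def primary_anchor(marks):
    """Anchor on Base1 for PrimaryMark1, given the marks attached to Base1."""
    x, y = DEFAULT_PRIMARY_ANCHOR
    if "SecondaryMark1" in marks:
        # Only SecondaryMark1 triggers the shift; other secondary marks do not.
        x, y = x + OFFSET[0], y + OFFSET[1]
    return (x, y)

print(primary_anchor(["PrimaryMark1", "SecondaryMark2"]))
print(primary_anchor(["PrimaryMark1", "SecondaryMark1"]))
```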

There is a feature-file syntax for contextual chaining positioning, but this aspect of the syntax is a bit obscure for me:

http://www.adobe.com/devnet/opentype/afdko/topic_feature_file_syntax.htm...

NOTE: I am NOT worried at this point about whether or not Uniscribe or some other renderer supports this. I am only concerned about representing the above scenario in valid OpenType form. That is, how do I represent the above scenario in an OpenType font?

I hope the above is clear. I cannot use mkmk, and I want to avoid a multiplication of glyph alternates.

Thank you in advance

Best wishes
Idris

John Hudson's picture

I'm afraid I'm not familiar enough with AFDKO/FontLab syntax to advise on particular implementation, and in any case I understand that this syntax might be liable to change and that it does not currently work properly. I suspect you still need to do mark positioning using VOLT, not FontLab.

As I understand it, you are trying to produce a situation in which

base + primary mark + secondary mark

results in the primary mark being positioned independently of the secondary mark, i.e. with an independent anchor on the base, but capable of being contextually shifted in presence of the secondary mark.

I think this is not too difficult, since the secondary glyph can perform as a context for the primary glyph. This sort of thing happens a lot in Biblical Hebrew, which doesn't use any mkmk, only contextual mark lookups. [The more difficult situation is if the first mark needs to act as context for the second mark. Actually, I'm not sure at all how one would handle that.]

In VOLT, you would simply add a secondary lookup to your mark feature in which

| secondarymarks

provided the context for using a different anchor for primary marks on the base. So you might have an initial, non-contextual lookup in which primary marks are positioned on the base using an anchor called e.g. 'primark1' and a second lookup, with the secondarymarks context, in which the primary marks are positioned on the base using an anchor called e.g. 'primark2' or some similar mnemonic name.

Does that make sense?

ishamid's picture

Hi John,

Thanks for the reply. I am actually using FontForge, not FontLab, for OpenType programming, since, as we all know, AFDKO is not quite there yet.***** I am actually amazed at the depth of FF's OpenType support, even for mark classes (which the fea syntax does not support at all, let alone implement).

"As I understand it, you are trying to produce a situation in which

base + primary mark + secondary mark

results in the primary mark being positioned independently of the secondary mark, i.e. with an independent anchor on the base, but capable of being contextually shifted in presence of the secondary mark."

That's exactly it!

"[The more difficult situation is if the first mark needs to act as context for the second mark. Actually, I’m not sure at all how one would handle that.]"

I think that's largely avoidable. If it did come up, we could probably express it as the reverse (relativity).

"So you might have an initial, non-contextual lookup in which primary marks are positioned on the base using an anchor called e.g. ’primark1’ and a second lookup, with the secondarymarks context, in which the primary marks are positiong on the base using an anchor called e.g. ’primark2’ or some similar mnemonic name.

Does that make sense?"

Yes, I believe so. Is this chaining contextual positioning (GPOS lookup type 8)?

In any case, I may need your help to walk me through this, at least the first time around. Does arabtype use this scenario at all?

I'll play with this some more and report back. If there is any font you can point me to that implements this, I'd like to know. Also, a friend/colleague mentioned that Vietnamese (which distinguishes vowel letters and tone marks) could be a candidate for this kind of treatment.

Best wishes
Idris

***** I was very excited by the fea syntax but it is sad that Adobe is taking forever to a) complete and standardize the syntax; b) cover all of OpenType (e.g. no mark classes); c) implement it!

John Hudson's picture

I have uploaded a very small demo font HERE. This font has VOLT source tables included, so you can open it in VOLT and see how the lookups are built.

Here is the output from the lookups, in the VOLT proofing tool:

The mark.basic lookup provides individual anchors for the mark1 and mark2 glyphs. The mark.contextual lookup provides a new anchor for the mark1 glyph when it is followed by the mark2 glyph.

Read Roberts's picture

John Hudson's suggestion is very clever. I have yet to come up with anything that works as well.

I don't understand why it is a more difficult case when the first mark would act as context for the second. I think you could cover this case with a second contextual positioning rule with the context reversed, and referencing the same MarkToBase rule.

The one problem is that this approach may be cumbersome when there starts to be significantly more than two mark attachment points on the base glyph. Since (I think) there is no restriction on the sequence in which mark glyphs of different classes appear in the text string, that means that you must provide enough contextual rules to cover all the possible glyph sequences that you need to match. For example, if the base glyph could have four different marks attached at once, then the primary and secondary marks in a text sequence need to match a rule when they are separated by anything from none to two mark glyphs of the other classes, and you would need six contextual rules to cover all the cases in order to treat any single pair of mark classes as a special case for the MarkToBase rule. In AFDKO syntax, these six context matching rules would look like:
**********************************
pos mark1' mark2
pos mark1' [mark3 mark4] mark2
pos mark1' [mark3 mark4] [mark3 mark4] mark2
pos mark2 mark1'
pos mark2 [mark3 mark4] mark1'
pos mark2 [mark3 mark4] [mark3 mark4] mark1'
# do special case MarkToBase rule when any of the contexts above match.
# The special case sets a different base attachment point for
# glyph mark1, whenever glyph "mark2" is also attached to the base glyph.
**********************************
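The six contexts can be enumerated mechanically, which shows how the count grows with the number of marks that may intervene. The Python sketch below generates them in the same abbreviated notation as the rules above. The function name and its parameters are my own, and the output is not a complete feature file (no lookup is referenced).

```python
def context_rules(target, context, others, max_between):
    """List the contexts in which `target` (marked with ') shares a base with
    `context`, separated by 0..max_between marks drawn from `others`."""
    gap = "[" + " ".join(others) + "]"
    rules = []
    # target first: the intervening marks and the context glyph follow it
    for n in range(max_between + 1):
        rules.append(" ".join(["pos", target + "'"] + [gap] * n + [context]))
    # context first: the intervening marks and the context glyph precede it
    for n in range(max_between + 1):
        rules.append(" ".join(["pos", context] + [gap] * n + [target + "'"]))
    return rules

for rule in context_rules("mark1", "mark2", ["mark3", "mark4"], 2):
    print(rule)
```

With max_between = 2 this gives 2 x 3 = 6 rules; in general the sketch produces 2 x (max_between + 1) rules for each pair of marks treated as a special case.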

However, I can't think of a more concise way to get the desired result.

FYI, there is supposed to be a new version of the AFDKO package coming out in a few months, which will add support for mark and attachment lookups. (See http://www.adobeforums.com/webx/.3c05d2bd/0 for a preview of feature file syntax changes.)

I'd also like to check that I understand in detail how the example font provided by John Hudson works. Please correct me if I am wrong about the following.

When I look at the font with the AFDKO program 'spot' , what I see is that the feature "mark" contains two lookups, which will be applied in order.

The first lookup is a MarkToBase type. There is only one rule in the lookup, and the rule positions both the first and second mark glyphs at the same attachment position on the base glyph, at (x = 400, y = -400)

The second is a chaining contextual position lookup. There is only one rule in the lookup, and it will be applied when the glyph sequence is "mark glyph 1", "mark glyph 2", and the current text position is at the "mark glyph 1". The actual positioning rule which gets applied at this position is a MarkToBase rule. This MarkToBase rule specifies a single attachment point on the base glyph at (x=140, y = -400) for the "mark glyph 1".

The way this works, as the layout program steps through the text string [base glyph, mark glyph 1, mark glyph 2], is as follows:

1) text position is at the base glyph, the first glyph in the text string [base glyph, mark glyph 1, mark glyph 2].
Lookup 0 is processed, but the one rule does not match the current context, as the current glyph is not a mark glyph preceded by a base glyph.
Lookup 1 is processed, but the one rule does not match the current context. The current glyph is not "mark glyph 1" followed by "mark glyph 2".

2) text position is at "mark glyph 1"
Lookup 0 is processed. The MarkToBase rule matches (because "mark glyph 1" is a mark glyph and is preceded by the base glyph), and so "mark glyph 1" is moved to the base glyph attachment point at (x = 400, y = -400) relative to the base glyph origin.

Lookup 1 is processed. The context is satisfied for the one rule in this lookup ("mark glyph 1" is followed by "mark glyph 2") - and so the associated MarkToBase rule is processed. It also matches - "mark glyph 1" is preceded by the base glyph - and is applied. It says to attach "mark glyph 1" to the base glyph at (x=140, y = -400). This overrides the result of processing lookup 0.

3) text position is at "mark glyph 2".
Lookup 0 is processed. The MarkToBase rule matches ("mark glyph 2" is a mark glyph, and is preceded by the base glyph, and the only intervening glyph is another mark glyph), and so "mark glyph 2" is moved to the base glyph attachment point at (x = 400, y = -400) relative to the base glyph origin.

Lookup 1 is processed, but the one rule does not match the current context. The current glyph is not "mark glyph 1".

For this to work, the lookup flag for lookup 0 must be set so that the rule processing will ignore all mark glyphs in the text sequence between the current glyph and the base glyph.
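The walkthrough can be checked with a toy model in Python. This is a sketch under my own assumptions, not how any real layout engine is written: the glyph names, the helper functions, and the data layout are all hypothetical, and only the anchor values come from the description above. The sketch runs each lookup over the whole run in turn (lookup 0 first, then lookup 1); for this run that gives the same result as stepping through position by position.

```python
BASES = {"base"}
MARKS = {"mark1", "mark2"}

# Lookup 0: MarkToBase, one anchor on the base per mark (values from the walkthrough).
LOOKUP0 = {"mark1": (400, -400), "mark2": (400, -400)}
# MarkToBase lookup invoked by the contextual rule in lookup 1.
NESTED = {"mark1": (140, -400)}

def preceding_base(run, i):
    """Index of the nearest base before position i, skipping marks; None if absent."""
    for j in range(i - 1, -1, -1):
        if run[j] in BASES:
            return j
        if run[j] not in MARKS:
            return None
    return None

def mark_to_base(run, i, anchors, attach):
    """Attach run[i] to the preceding base if this lookup has an anchor for it."""
    if run[i] in anchors and preceding_base(run, i) is not None:
        attach[i] = anchors[run[i]]

def shape(run):
    """Return {mark glyph name: anchor on the base} after both lookups."""
    attach = {}
    # Lookup 0 over the whole run.
    for i in range(len(run)):
        mark_to_base(run, i, LOOKUP0, attach)
    # Lookup 1 over the whole run: context is mark1 (marked) followed by mark2.
    for i in range(len(run) - 1):
        if run[i] == "mark1" and run[i + 1] == "mark2":
            mark_to_base(run, i, NESTED, attach)  # overrides lookup 0
    return {run[i]: pos for i, pos in attach.items()}

print(shape(["base", "mark1", "mark2"]))
print(shape(["base", "mark1"]))
```

In the first run mark1 ends up at the contextual anchor and mark2 at the basic one; in the second run, with no mark2 present, mark1 keeps the basic anchor.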

The one thing that I do not understand here is why the MarkToBase rule in Lookup 0 defines "mark glyph 1" and "mark glyph 2" to be two separate mark classes, which happen to attach at the same location on the base glyph. It would take slightly less space, and would be conceptually simpler, to define a single mark class which contains both of them. I understand that the glyph lists of mark classes cannot overlap within a MarkToBase subtable, nor within the mark class definitions that are referenced by any lookup flag field, but the mark classes defined in one subtable are completely independent of mark classes defined in other subtables, which in turn are independent of the set of mark class definitions referenced by any lookup flag.

John Hudson's picture

Read, re. your last query, this was a hastily made demo, and I hadn't considered that the two marks would have the same offset. I'd originally intended that they would have different offsets. As you say, if they are the same, they could share an anchor in this lookup.

Re. anticipating arbitrary mark order. This is a significant issue. I don't think it makes sense for font lookups to try to anticipate every possible, equivalent mark order. It makes more sense, in the OpenType philosophy, for the shaping engine to apply some form of mark order normalisation during display.

We ran into this issue when developing the SBL Hebrew font. In that case, we found that some of the technically possible mark orders were actually impossible to display correctly because of complex contextual mark interaction in Biblical Hebrew that relied on a specific order being followed. The issue was further complicated by the fact that there are similar mistaken assumptions in the canonical combining class assignments for Hebrew mark characters, such that Unicode normalisation can also result in glyph strings that are difficult or impossible to display correctly with OpenType (and also collapsing of textually distinct character order, which is a bigger problem). The solution, worked out with Microsoft, SIL, Logos and other participants, was to add buffered character re-ordering to the Uniscribe Hebrew shaping engine as part of the display. In other words, Uniscribe takes a variety of possible input orders and performs a display-targeted normalisation that produces predictable ordering for font lookups to process. The Hebrew case is complicated by errors in Unicode; for other scripts the Unicode canonical combining classes might provide appropriate normalisation. Of course, this won't solve all possible ordering issues, because marks within the same canonical class will not be reordered and will interact typographically, so these need to be anticipated in font lookups or users need to be encouraged to adopt consistent mark ordering practices.
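The last point, that canonical ordering only sorts marks of different combining classes, can be seen with Python's standard unicodedata module. This is a sketch of plain Unicode normalisation (NFD) on Latin marks, not of the display-targeted reordering Uniscribe does for Hebrew; the helper name canonical_order is my own.

```python
import unicodedata

def canonical_order(text):
    """NFD-normalise text and return (code point, combining class) pairs."""
    return [(f"U+{ord(c):04X}", unicodedata.combining(c))
            for c in unicodedata.normalize("NFD", text)]

# Dot below (class 220) and acute (class 230): both input orders normalise alike.
print(canonical_order("a\u0301\u0323"))
print(canonical_order("a\u0323\u0301"))

# Acute and tilde are both class 230, so their input order is preserved
# and a font has to cope with either sequence.
print(canonical_order("a\u0301\u0303"))
print(canonical_order("a\u0303\u0301"))
```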

John Hudson's picture

Read: I don’t understand why it is a more difficult case when the first mark would act as context for the second. I think you could cover this case with a second contextual positioning rule with the context reversed, and referencing the same MarkToBase rule.

I'll have to experiment, but in a MarkToBase lookup, wouldn't a context string

mark1 |

(mark1 preceding context) imply mark1 preceding the base glyph, i.e.

mark1 BASE mark2

not

BASE mark1 mark2

?

Actually, I'm not sure such a context would have any effect at all, because if one is defining preceding context for a MarkToBase one seems to need the base glyph to be part of the context, e.g.

mark1 BASE |

ishamid's picture

Hi Read,

I want to spend some time with your comment and analysis and perhaps say more. In the meantime, here is John's method in fea syntax:

================================================
lookup markMarkPositioninginLatinlooku {
lookupflag 0;
mark \mark1 ;
pos \BASE mark [\mark1 ];
mark \mark2 ;
pos \BASE mark [\mark2 ];
} markMarkPositioninginLatinlooku;

lookup Marktobaseattachmentlookup2 {
lookupflag 0;
mark \mark1 ;
pos \BASE mark [\mark1 ];
} Marktobaseattachmentlookup2;

lookup pos_chain_marklatn_0 {
lookupflag 0;
sub [\mark1 ]' [\mark2 ] ;
} pos_chain_marklatn_0;

feature \mark {

script latn;
language dflt ;
lookup markMarkPositioninginLatinlooku;
lookup pos_chain_marklatn_0;
} \mark;
================================================
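As I read the feature file syntax, the apostrophe marks the glyphs the rule acts on (the input sequence); unmarked glyphs before them are preceding context (backtrack) and unmarked glyphs after them are following context (lookahead). The Python sketch below splits a rule along those lines. It is a toy tokenizer of my own that only handles the abbreviated rules seen in this thread (glyph names and bracketed classes), not the full syntax.

```python
import re

# A bracketed class (optionally marked with ') or a bare token without ';'.
TOKEN = re.compile(r"\[[^\]]*\]'?|[^\s;]+")

def split_context(rule):
    """Split a contextual rule into (backtrack, input, lookahead) token lists."""
    tokens = TOKEN.findall(rule)[1:]  # drop the leading keyword (pos/sub)
    marked = [i for i, t in enumerate(tokens) if t.endswith("'")]
    if not marked:
        raise ValueError("no glyph is marked with '")
    first, last = marked[0], marked[-1]
    unmark = lambda t: t.rstrip("'")
    return ([unmark(t) for t in tokens[:first]],
            [unmark(t) for t in tokens[first:last + 1]],
            [unmark(t) for t in tokens[last + 1:]])

print(split_context("pos mark1' mark2;"))
print(split_context("pos mark2 [mark3 mark4] mark1';"))
```

Read this way, a rule of the form mark1' mark2 says: when mark1 is immediately followed by mark2, do something to mark1; mark2 is only there as context.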

I find the use of the <'> to be rather alienating. Most of the feature-file syntax is rather easy to read and write except this part. It's very error-prone.

My own idea was to do a huge part of the OpenType programming in fea syntax, then compile it in FontForge or, eventually, FontLab. For some tasks (like choosing anchor coordinates) a GUI is much more useful, but for other tasks the fea file helps me see the overall structure clearly (and to do certain kinds of repetitive editing tasks). But the syntax for contextual chaining is a bit obscure, and the examples in the spec are rather unclear.

Could you, by way of example, explain the pos_chain_marklatn_0 lookup in English so I can understand better exactly what's going on? Is there another way to write this? Could there be a better way to syntactically express this?

Could we replace (or have the option of replacing) the <'> by something more verbose? perhaps even a begin-end syntax?

In any case I do look forward to the next iteration of the fea-file syntax language.

Best wishes
Idris
