Microsoft Word and Internet Explorer has a ligature display bug

Internet Explorer and MS Word understand ligatures but a bug is causing them to display spaced out. IE10 displays ligatures correctly when new and degrades after updates. See the problem explained here

BACKGROUND
My font is a very successful orthographic font. It is the first ever-such font made for the writing system of a complex script (Singhala). Users love it because only three keys differ on the key layout used for it from the US English keyboard and all shifted key positions are intuitive to remember. The first version was finished in 2004.

At that time, it could be used only in Windows Notepad and SIL.org's Worldpad for Windows systems. Apple and Adobe products displayed it with no trouble since its inception. It now has 2500 ligatures. (Yes, thousands). This scared people like Mozilla who allowed it in normal sizes only inside Thunderbird email client. They thought the computer would grind to a halt trying to make thousands of ligatures on a web page. This is one of the myths about ligatures.

By 2008 Applications in Windows, Macintosh and Linux systems showed the font correctly except inside Internet Explorer and MS Word. So, I asked the users to use only Safari, Firefox and Chrome browsers and AbiWord and Gnumeric as alternatives for Word and Excel. (Mozilla later reported that the speed difference between my font and system default font is 2%).

See a web site using it: here. Copy some text in it to a text editor and see the underlying Latin text. This is really romanized Singhala. Think of it like a Fraktur font but many, many times the number of ligatures and shapes gone completely crazy.

When IE9 and MS Office 2007 came out, its ligatures showed but weirdly, each ligature was attached with a trailing gap.

THE DISCOVERY OF THE BUG
This week I casually tried a test page on my daughter's Notebook and its IE10 showed the test page perfectly. I quickly changed programming on that web site to allow IE10 to show the pages in the font (earlier I allowed only romanized text). Then I checked the font inside Word and the results were still the disappointing spaced-out ligatures. When I opened the Font setting dialog, I discovered the bug that has been causing the trouble. The preview shows the correctly formed ligatures but the settings never get taken by the document.

Next day I turned on automatic updates on the Notepad and the ligatures in IE changed to ones with gaps after a massing update. The following day another update reduced the text down to bare letters with no ligatures at all. Surprisingly and happily, Excel 2010 has no problem showing the ligatures!

This page at Microsoft web site is confusing language and script in its description of Standard Ligatures. Could it be that IE and Office developers are also ambiguous about Standard ligatures?

Please read the following page where I describe the case:
Ligature bug in Word and IE

John Hudson's picture

To clarify that I understand correctly: you've made a font that represents combinations of Latin characters using Sinhala glyphs. Why?

Michel Boyer's picture

The code points used belong to the Basic Latin and Latin-1 Supplement Unicode code blocks. The text typed in it would gracefully fall back to romanized Singhala in case the reader does not have this font.

Interesting!

ahangama's picture

Thank you for giving the issue your attention.

First, I want to get your mind cleared of any unintentional prejudice you may have.

Question:
Why do people have freedom to make fonts that represent combinations of Latin characters using Fraktur and Gaelic glyphs?

Possible answers:
They too are 'Latin' shapes; they do not have their own code pages; they are European languages; they are not as radically different from generally perceived Latin text though Latin Alphabet has only capitals and only 22 letters.

Here's the long answer to your question:
First, I cannot personally launch another Korean Mess for Singhala. The officials in Sri Lanka are personally benefiting from the Unicode Sinhala implementation. I live in US and went back there (after 15 years) and found them understandably hardened on opposition.

There is no earthly reason for Singhala to have its own code page. (May be for ET when we annihilate ourselves). Besides, the benefits of romanizing far, far outweighs the lame objections.

It is not compatible with Sanskrit and Pali because it takes two scipt-ligatures (not font ligatures) as base letters: U+0DA5 and U+0DA5. -- This issue completely invalidates the code page. I have counted 22 such ligatures. There are three other contracted letter shapes that combine iteratively with consonants.

Singhala is spoken and written intermixed with Sanskrit. All Sanskrit words are subject to orthographic ligatures. Although you can type some, you cannot do meaningful text processing in it. When it comes to complex Sanskrit language ligatures, many cannot be constructed. This is like an electric typewriter with dead keys enhanced for the computer better functaionality and impossible to use keylayouts. Unicode constrains you to its limitations. We are in the digital age! The era of letterpress printing is over when we had to adjust to what the lead type limited you to. I was a printer 40 years ago ad know this.

The consonant shapes shown are not of consonants but Consonant plus 'a' vowel. (ka, kha, ga, gha etc.). The grammar (vyAkaraNa) tells clearly what is a vowel and what is a consonant and what is a phoneme and what is a letter shape. Just because the consonant carries a vowel indicater flag on it does not make it a composite letter. (i with a dot is not a composite).

Each vowel, long and short has two code points, one for the stand alone shape and another for combining marks ('sign'). This makes text processing extremely hard to program if not impossible (I think the latter). This is not a language solution, it is for typing.

Check out this page and see how efficient romanized Singhala is: ( read the JavaScript in the coding):
Some conversion functions

I have many things done with this. One is this project to dig out Pali and Polish form www.metta.lk web site maintained by a Danish Buddhist monk. He did a monumental work of digitizing ancient Pali, but with hack fonts. You may call my Singhala font one such too but the romanized Singhala is the thing that is doing the magic:
Tripitaka demo

Last November I discovered moths eating up books in Colombo museum. (They did not all0w me to see books in the Archive).

ahangama's picture

Thank you Michel.

You can almost read it. But discover what you'll see if you go to any page like the following, newspaper site which is in Unicode Singhala.
A Singhala newspaper web site

Now understand that this is phonetic writing. That means taday there is technology to record speech and instantly transmit it and convert it into text! (Who would want to do such a thing?).

All Indian languages have this same behavior. Singhala is perhaps the most complex of all of them where the orthography is concerned.

Syndicate content Syndicate content