Vietnamese Glyphs

Duong Nguyen's picture
Joined: 1 Aug 2013 - 9:33pm

Hi all,
I'm a newbie, so this question may seem quite silly to you.

I'm designing a new font for personal use. I've just finished the basic Latin characters and am now moving on to Vietnamese characters (or other sets such as Western European, Central European...). How can I get to all of these characters, given that the software only shows the basic letters found on the keyboard? (I'm using Fontlab.)
I thought of looking up the Unicode code point of each one (on Wikipedia), but that takes a lot of time.

Thanks for your answer.

Duong.

Simon Daniels's picture
Joined: 11 Apr 2002 - 6:37pm

You could probably start by encoding these characters... http://en.wikipedia.org/wiki/Windows-1258

Cheers, Si

Duong Nguyen's picture
Joined: 1 Aug 2013 - 9:33pm

@Si_Daniels:
Thanks for your reply.
Can I show all the characters of this list at once in Fontlab Studio? That would be faster than finding and editing each character one by one.

Michel Boyer's picture
Joined: 2 Jun 2007 - 1:01pm

You can get the pane for that codepage by clicking "page mode" at the bottom and then choosing as follows:

If you want to add the Vietnamese characters in the Latin extensions, you choose another pane. You select "Ranges mode" and then "1E00 Latin Extended Additional". The characters that concern you are from 1EA0 to 1EF1.

The pictures are scaled. If you find them too small, open them in a new window or new tab.

Duong Nguyen's picture
Joined: 1 Aug 2013 - 9:33pm

Thank you for your great response, Michel. This is exactly what I was looking for. At first, I also chose MS Windows 1258 Vietnamese for those characters, but I was a little confused because some words that I expected seemed to be missing. (I'm Vietnamese :-)).
By the way, I may later extend my font to the characters of Western Europe (1252) or Central Europe (1250). Could you also tell me how to access these full character sets? Or just choose MS Windows 1250/1252 is enough because it contains full words. I thought it's not enough because the page linked below seems to show more characters than the list in Fontlab.
http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)

James Montalbano's picture
Joined: 18 Jun 2003 - 11:00am

I would suggest you create the combined accents as individual combining glyphs. You'll have to scale and redraw the different elements of, say, a circumflexacute so that it will be easier to place and hint. I recommend building separate accent combos for the lowercase and uppercase (and small caps).

Once all of the glyphs are built as components, just copy and paste from one font to the next.

Also, this site should be on your short list:

http://www.unicode.org/charts/

Michel Boyer's picture
Joined: 2 Jun 2007 - 1:01pm

Those charts are indeed great for getting an idea of how the characters are expected to look. For the other information the charts contain, I find the file NamesList.txt much more useful. For instance, to get the relevant characters for Vietnamese, just search for "Vietnamese" in NamesList.txt.

Better still, with just a basic knowledge of Python, unicodedata (see the reference at http://docs.python.org/2/library/unicodedata.html) gives you fast access to what I find to be the relevant information in http://www.unicode.org/Public/UNIDATA/

I don't know how useful that is for font design, but here is, for instance, a script that finds the NFD canonical decomposition of the characters in a range specified by two hex numbers and prints all the component characters; no need to search the files on the Unicode site:

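---- file decomp ---- cut here
#!/usr/bin/env python
# Print every character occurring in the NFD (canonical) decomposition
# of the characters in a hex range. Written for Python 2.7.

import sys
import unicodedata as ud

# Both the start and the end of the range are required.
if len(sys.argv) < 3:
    print """Usage: %s starthex endhex
Example: %s 1EA0 1EF1 """ % (sys.argv[0], sys.argv[0])
    sys.exit()

start = int(sys.argv[1], 16)
end = int(sys.argv[2], 16)

def uhexandname(h):
    # Unassigned code points have no name; print an empty name for them.
    try:
        nam = ud.name(unichr(h))
    except ValueError:
        nam = ''
    return "u%04X %s" % (h, nam)

# Collect the code points of all components, without duplicates.
schars = set()
for h in range(start, end + 1):
    schars |= {ord(c) for c in ud.normalize('NFD', unichr(h))}

for c in sorted(schars):
    print uhexandname(c)
---- cut here ---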

Here is a trace of execution, showing all the components of the characters in the range 1EA0 -- 1EF1 (inclusive):

611 % decomp 1EA0 1EF1
u0041 LATIN CAPITAL LETTER A
u0045 LATIN CAPITAL LETTER E
u0049 LATIN CAPITAL LETTER I
u004F LATIN CAPITAL LETTER O
u0055 LATIN CAPITAL LETTER U
u0061 LATIN SMALL LETTER A
u0065 LATIN SMALL LETTER E
u0069 LATIN SMALL LETTER I
u006F LATIN SMALL LETTER O
u0075 LATIN SMALL LETTER U
u0300 COMBINING GRAVE ACCENT
u0301 COMBINING ACUTE ACCENT
u0302 COMBINING CIRCUMFLEX ACCENT
u0303 COMBINING TILDE
u0306 COMBINING BREVE
u0309 COMBINING HOOK ABOVE
u031B COMBINING HORN
u0323 COMBINING DOT BELOW

That was tested with python2.7.2.

Dr. Ken Lunde's picture
Joined: 17 Sep 2006 - 8:31pm

I am pretty sure that for Vietnamese you also need glyphs for 1EF2 through 1EF9.
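
For reference, the characters in that extra range can be listed with unicodedata. This is only a small sketch in Python 3 (not the Python 2.7 of the script earlier in the thread); the variable names are made up for the illustration.

```python
import unicodedata

# Code points U+1EF2..U+1EF9, the range mentioned in the post above.
y_range = range(0x1EF2, 0x1EF9 + 1)

# Map each code point to its official Unicode character name.
y_names = {cp: unicodedata.name(chr(cp)) for cp in y_range}

for cp, name in sorted(y_names.items()):
    print("%04X %s" % (cp, name))
```

All eight are forms of the letter Y (grave, dot below, hook above and tilde, in capital and small versions).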

John Hudson's picture
Joined: 21 Dec 2002 - 11:00am

Yes, that's correct.

Michel Boyer's picture
Joined: 2 Jun 2007 - 1:01pm

They are indeed in the table of the wiki http://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks, which shows that searching for "Vietnamese" in NamesList.txt is not enough. I presume that with such a complete list, you could make a .enc file for Fontlab so as to see all the glyphs you need in one shot (plus at least the variants you need for the composed diacritics). Just curious, I don't use Fontlab.

If I do that with FontForge for Source Sans Pro, here is a possible view of the capitals (the script generating the encoding grouped the "base glyphs" together, "base glyph" being here the first character of the canonical decomposition).

(I never had so much trouble inserting an image...)

When I use the same encoding with FontLab (with the glyph names taken from Source Sans Pro), some characters appear to be missing and others are not placed where expected (for instance, Abreve is placed somewhere else and the corresponding uni character appears to be missing).
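
The grouping by "base glyph" mentioned above could look roughly like the following. This is a minimal Python 3 sketch, not the script that actually generated the encoding; the function name group_by_base and the choice of the 1EA0-1EF9 capitals as input are assumptions made for the example.

```python
import unicodedata
from collections import defaultdict

def group_by_base(codepoints):
    """Group code points by the first character of their NFD decomposition."""
    groups = defaultdict(list)
    for cp in codepoints:
        base = unicodedata.normalize("NFD", chr(cp))[0]
        groups[base].append(cp)
    return dict(groups)

# Illustrative input: only the capitals found in the 1EA0-1EF9 range.
capitals = [cp for cp in range(0x1EA0, 0x1EF9 + 1) if chr(cp).isupper()]
groups = group_by_base(capitals)

for base in sorted(groups):
    print(base, " ".join("%04X" % cp for cp in groups[base]))
```

A real encoding file would also need glyph names and the letters outside that range, which this sketch leaves out.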

Michel Boyer's picture
Joined: 2 Jun 2007 - 1:01pm

The following letters appear in the Wikipedia table. They do not appear in the Fontlab win_1258.enc file (at least those hex values do not appear in the comments).

00C3 LATIN CAPITAL LETTER A WITH TILDE
00CC LATIN CAPITAL LETTER I WITH GRAVE
00D2 LATIN CAPITAL LETTER O WITH GRAVE
00D5 LATIN CAPITAL LETTER O WITH TILDE
00DD LATIN CAPITAL LETTER Y WITH ACUTE
00E3 LATIN SMALL LETTER A WITH TILDE
00EC LATIN SMALL LETTER I WITH GRAVE
00F2 LATIN SMALL LETTER O WITH GRAVE
00F5 LATIN SMALL LETTER O WITH TILDE
00FD LATIN SMALL LETTER Y WITH ACUTE
0128 LATIN CAPITAL LETTER I WITH TILDE
0129 LATIN SMALL LETTER I WITH TILDE
0168 LATIN CAPITAL LETTER U WITH TILDE
0169 LATIN SMALL LETTER U WITH TILDE

Are they also required? Is there no clear and reliable list?
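
One way to cross-check that list without Fontlab is to ask Python's own cp1258 codec. This is a sketch in Python 3 and rests on the assumption that Python's cp1258 table matches the Fontlab win_1258.enc file, which is not verified here; the helper name encodable is made up.

```python
# The 14 code points listed in the post above.
listed = [0x00C3, 0x00CC, 0x00D2, 0x00D5, 0x00DD,
          0x00E3, 0x00EC, 0x00F2, 0x00F5, 0x00FD,
          0x0128, 0x0129, 0x0168, 0x0169]

def encodable(cp, codec):
    """Return True if the single character cp has a slot in the given codec."""
    try:
        chr(cp).encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Code points from the list that have no slot in Windows-1258.
not_in_1258 = [cp for cp in listed if not encodable(cp, "cp1258")]
print(" ".join("%04X" % cp for cp in not_in_1258))
```

With Python's table, none of the 14 letters has a slot of its own in Windows-1258, which agrees with what the .enc comments suggest.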

Albert-Jan Pool's picture
Joined: 30 Aug 2006 - 2:14am

Or just choose MS Windows 1250/1252 is enough because it contains full words.

Could it be that you are confusing ‘words’ with ‘names’?

Michel Boyer's picture
Joined: 2 Jun 2007 - 1:01pm

I downloaded the small Hunspell Vietnamese spellchecker and looked at the characters it uses. Aside from the 1EA0 to 1EF9 range and the standard unaccented Latin letters, it uses the following small letters:

00E0 00E1 00E2 00E3 00E8 00E9 00EA 00EC 00ED
00F2 00F3 00F4 00F5 00F9 00FA 00FD 0103 0111
0129 0169 01A1 01B0

The corresponding capitals should also be needed:

00C0 00C1 00C2 00C3 00C8 00C9 00CA 00CC 00CD
00D2 00D3 00D4 00D5 00D9 00DA 00DD 0102 0110
0128 0168 01A0 01AF
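
The list of capitals can be derived from the small letters instead of being typed by hand. A minimal Python 3 sketch, relying on the Unicode case mapping behind str.upper(); the variable names are only for illustration.

```python
# Small letters found in the Hunspell dictionary (outside 1EA0-1EF9).
small = [0x00E0, 0x00E1, 0x00E2, 0x00E3, 0x00E8, 0x00E9, 0x00EA, 0x00EC, 0x00ED,
         0x00F2, 0x00F3, 0x00F4, 0x00F5, 0x00F9, 0x00FA, 0x00FD, 0x0103, 0x0111,
         0x0129, 0x0169, 0x01A1, 0x01B0]

# Each of these letters has a single-character uppercase form.
capitals = [ord(chr(cp).upper()) for cp in small]

print(" ".join("%04X" % cp for cp in capitals))
```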

That implies that the small letters in my list in the post http://typophile.com/node/105171#comment-562024 (letters that are neither in Windows 1258 nor in the 1EA0-1EF9 range) all appear in that small dictionary of only 6631 entries.

Dr. Ken Lunde's picture
Joined: 17 Sep 2006 - 8:31pm

Besides ASCII and friends (aka, ISO 8859-1 or U+00[A-F][0-9A-F]), your listing above covers all of the characters that require glyphs for full Vietnamese support.

Michel Boyer's picture
Joined: 2 Jun 2007 - 1:01pm

Or just choose MS Windows 1250/1252 is enough

Put together, they are still missing the characters

0128 LATIN CAPITAL LETTER I WITH TILDE
0129 LATIN SMALL LETTER I WITH TILDE
0168 LATIN CAPITAL LETTER U WITH TILDE
0169 LATIN SMALL LETTER U WITH TILDE
01A0 LATIN CAPITAL LETTER O WITH HORN
01A1 LATIN SMALL LETTER O WITH HORN
01AF LATIN CAPITAL LETTER U WITH HORN
01B0 LATIN SMALL LETTER U WITH HORN

on top of all those in the 1EA0-1EF9 range.
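
That list can be reproduced with Python's cp1250 and cp1252 codecs. This is a sketch in Python 3, assuming Python's tables are a fair stand-in for the Fontlab encodings (not checked here); the helper in_codepage is a made-up name.

```python
# Small letters used in Vietnamese outside the 1EA0-1EF9 range (from this thread).
small = [0x00E0, 0x00E1, 0x00E2, 0x00E3, 0x00E8, 0x00E9, 0x00EA, 0x00EC, 0x00ED,
         0x00F2, 0x00F3, 0x00F4, 0x00F5, 0x00F9, 0x00FA, 0x00FD, 0x0103, 0x0111,
         0x0129, 0x0169, 0x01A1, 0x01B0]
# Add the corresponding capitals and sort by code point.
letters = sorted(small + [ord(chr(cp).upper()) for cp in small])

def in_codepage(cp, codec):
    """Return True if the single character cp has a slot in the given codec."""
    try:
        chr(cp).encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Letters found in neither Windows-1250 nor Windows-1252.
missing = [cp for cp in letters
           if not (in_codepage(cp, "cp1250") or in_codepage(cp, "cp1252"))]
print(" ".join("%04X" % cp for cp in missing))
```

The same in_codepage test applied to 1EA0-1EF9 finds none of that range in either codepage.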

Michel Boyer's picture
Joined: 2 Jun 2007 - 1:01pm

I just ran the following experiment: I typed ằẳẵắặ with the Vietnamese Keyboard http://gate2home.com/Vietnamese-Keyboard, copied the characters in the little box (I was using Chrome on OS X 10.8) and pasted them in vi (and TextEdit); the sequence of characters pasted was

0103 LATIN SMALL LETTER A WITH BREVE
0300 COMBINING GRAVE ACCENT
0103 LATIN SMALL LETTER A WITH BREVE
0309 COMBINING HOOK ABOVE
0103 LATIN SMALL LETTER A WITH BREVE
0303 COMBINING TILDE
0103 LATIN SMALL LETTER A WITH BREVE
0301 COMBINING ACUTE ACCENT
0103 LATIN SMALL LETTER A WITH BREVE
0323 COMBINING DOT BELOW

(from a dump of the UTF-8 text file). Now, if I copy those characters with option C in vi and paste them with option V (either in vi or TextEdit), the letters that are pasted are

1EB1 LATIN SMALL LETTER A WITH BREVE AND GRAVE
1EB3 LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE
1EB5 LATIN SMALL LETTER A WITH BREVE AND TILDE
1EAF LATIN SMALL LETTER A WITH BREVE AND ACUTE
1EB7 LATIN SMALL LETTER A WITH BREVE AND DOT BELOW

During the copy-paste the string is recoded. That is a behaviour I did not expect. Is that something that is documented and, if so, where?

(In fact, this text was written in TextEdit and pasted with Chrome into the typophile edit window, and the recoding also appears to have occurred on the first line, but this time it may come from some Unicode normalization rule for data interchange. Nevertheless, with the link /files/clavierviet.html, no recoding seems to occur.)

(Added: If I view /files/clavierviet.html with Safari, copy the string and paste it, the combining diacritics are kept as with Chrome. If I do the same with Firefox, the recoding occurs, independently of the font used for viewing, even with a font with no ccmp table. Note that I am now on OS X 10.6.8 with Firefox 21.0)
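
The recoding observed above gives the same result as Unicode normalization form NFC (canonical composition). Whether the editors or the pasteboard actually apply NFC is an assumption, not something established here; the Python 3 sketch below only shows that NFC maps the five pasted sequences to the five precomposed letters.

```python
import unicodedata

# The five sequences as typed: a-breve followed by a combining tone mark.
decomposed = ["\u0103\u0300",   # grave
              "\u0103\u0309",   # hook above
              "\u0103\u0303",   # tilde
              "\u0103\u0301",   # acute
              "\u0103\u0323"]   # dot below

# NFC composes each sequence into a single precomposed character.
composed = [unicodedata.normalize("NFC", s) for s in decomposed]

for s in composed:
    print("%04X %s" % (ord(s), unicodedata.name(s)))
```

The last case goes through a reordering step: the dot below is attached to the base letter before the breve, and the final result is still 1EB7.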

Michel Boyer's picture
Joined: 2 Jun 2007 - 1:01pm

Maybe I should add, to clarify, that the Vietnamese keyboard (at least on the Mac) does not behave like the keyboard on the site I referred to above, http://gate2home.com/Vietnamese-Keyboard; indeed the orange keys behave like "dead keys" and after the accent is typed, a unique precomposed character is input in the text.

(open image in new window or new tab to see actual size).