I’ve been looking for an easy, fast, automatic way to get all defined characters of a font, preferably just as a unicode string. Does anyone here know a trick?
15.May.2008 10.38am
I’m not sure if this is what you’re asking, but in Illustrator and InDesign, you just have to open the Glyphs palette by going to TYPE>GLYPHS.
You can click each one to get its Unicode value, or double-click it to put it into a selected text field.
Also, Apple’s TextEdit has a similar window under EDIT>SPECIAL CHARACTERS (cmd+opt+t). This will actually give you a lot more information than Illustrator or InDesign do.
I’m sure there’s a way to do it in Word, but I never use it.
15.May.2008 11.07am
Hi,
Thanks for your answer but it was not really what I was looking for - to clarify I need a string of all characters that are defined in a particular font. For example:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõ
ö÷øùúûüýþÿĆćČčđĞğİıŁłŒœŞşŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
Maintype for example gives me a good visual overview of all the characters but I have yet to find a way to automatically create a string like the above...
15.May.2008 11.43am
I don’t think it will be easy to do this without some programming, or a repetitive motion injury-inducing amount of mouse clicking.
You basically need to loop through the entries in the cmap and produce a string corresponding to each character code, appending to an output string. It seems like it should be possible to do something like this in FontLab using Python. As I am fairly non-Pythonic I cannot offer any specific advice (I achieve the same thing using tools written in another programming language instead), but I can suggest this non-language-specific pseudo-code:
for each characterCode in fontUnicodeArray:
    set myString to myString + unichr(characterCode)
next
print myString
“unichr” is a function that converts a number (i.e. character code value) into a Unicode string. I believe Python has something like that ;-) Maybe you or someone else can fill in the other bits in proper Python (and FontLab) form and get what you’re after...
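A minimal Python 3 sketch of the pseudo-code above, assuming the code points have already been read from the font's cmap by some other tool. The function name `codepoints_to_string` and the sample values are hypothetical; in Python 3 the built-in `chr` plays the role of `unichr`.

```python
def codepoints_to_string(codepoints):
    """Return one string with the character for each code point, in ascending order."""
    # set() drops duplicates; sorted() gives a stable, ascending order.
    return "".join(chr(cp) for cp in sorted(set(codepoints)))


# Hypothetical sample standing in for the code points found in a font's cmap.
sample = [0x41, 0x42, 0x43, 0xE9, 0x20AC]
print(codepoints_to_string(sample))
```

Sorting is a choice made here for readability; the pseudo-code simply follows whatever order the font's table provides.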
15.May.2008 2.28pm
If you use Windows, there’s a program called JFont that claims it can print a character map report to an RTF file.
15.May.2008 4.00pm
The character map sample of Algerian on that JFont site looks a bit suspicious, mixing hex and decimal ...
It depends on what tools you have at hand. Python and FontLab sure sounds like a feasible combination. Maybe the Adobe FDK (also through Python) — there’s bound to be something useful in that.
15.May.2008 5.52pm
If you are not afraid of the command line, here is a script that works with FontForge (no need to install X Windows; works on a Mac or PC with Cygwin and Python):
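#!/usr/bin/python
# Python 2 script; needs FontForge's Python module.
import fontforge,sys
fnt=fontforge.open(sys.argv[1])
# Code points to keep; range() excludes its upper bound.
validranges = range(0x20,0x500)+range(0x1E00,0x2700)+range(0xFB00, 0xFB50)
s=''
for g in fnt.glyphs():
    # g.unicode is -1 for unencoded glyphs, so they are skipped here.
    if (g.unicode in validranges):
        s=s+unichr(g.unicode)
print s.encode('utf-8')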
If you call that script “listchars” then
listchars 2>/dev/null font_file
will give you the string on the output. You can also write the output to a file with
listchars 2>/dev/null font_file > string.txt
If you don’t like the command line, and if you are on a Mac, you can also use this little application, which is just the above wrapped in a clickable thing. It should ask you to install FontForge if you do not have it (it takes seconds); then you can select your font file and you get the string in a window.
Michel
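The range test used in the script above can also be tried without FontForge. The following is a Python 3 sketch, not Michel's script: `VALID_RANGES` copies the three ranges from the post, while `in_valid_ranges`, `filter_by_ranges` and the sample code points are made up for illustration.

```python
# Same three ranges as in the post; range() excludes its upper bound.
VALID_RANGES = (range(0x20, 0x500), range(0x1E00, 0x2700), range(0xFB00, 0xFB50))


def in_valid_ranges(codepoint):
    """True if the code point lies in one of VALID_RANGES."""
    return any(codepoint in r for r in VALID_RANGES)


def filter_by_ranges(codepoints):
    """Build a string from the code points that pass the range test."""
    return "".join(chr(cp) for cp in codepoints if in_valid_ranges(cp))


# Hypothetical input: -1 stands for an unencoded glyph.
sample = [-1, 0x41, 0x5D0, 0x20AC, 0xFB01]
print(filter_by_ranges(sample))
```

With these ranges U+05D0 (Hebrew alef) is dropped, since it lies between 0x500 and 0x1E00.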
16.May.2008 10.21am
Thanks for all your answers!
While investigating JFont I came across a nifty Windows utility called BabelMap that has a font analysis function that does this and more - the actual output appears in a non-selectable text field though. I wrote to the developer about this.
For now, I settled for the FontForge solution, even though it meant getting my feet wet with Cygwin and Python. Works great. Thanks for that, Michel.
16.May.2008 11.01am
Welcome! By the way, the above script should translate easily to one for FontLab; I don’t use FontLab but that looks obvious from what I could see in Haralambous’ book “Fonts and Encodings”.
Michel
17.May.2008 10.06pm
Michel, your link to the little application appears to be broken
18.May.2008 5.57am
Michel, your link to the little application appears to be broken
Link unbroken. Weird. I am sure I had checked it.
18.May.2008 7.10am
The ranges above miss Hebrew and Arabic characters. If you code yourself, you must add them. I just modified the little application; instead of using ranges, I generated a list of all the characters from 0020 to 00FF that are not control and that are listed in http://www.unicode.org/Public/UNIDATA/NamesList.txt; now Hebrew and Arabic characters should be there (and everything else defined from 0020 to 00FF).
Michel
18.May.2008 7.30am
s/00FF/FFFF/ I meant FFFF and not 00FF of course (15 minutes over, no way to correct).
18.May.2008 10.20am
Hmm. Typophile included the semicolon in the link to the NamesList, which is thus also broken. This link works:
http://www.unicode.org/Public/UNIDATA/NamesList.txt
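As a rough illustration of building such a list from NamesList.txt, here is a Python 3 sketch. It assumes the file layout as I understand it: a character entry is a hexadecimal code, a tab, then the name; annotation lines start with a tab; header lines start with `@`; and control characters carry the placeholder name `<control>`. The function `named_codepoints` and the hand-written excerpt are hypothetical, and this is not the code used in the little application.

```python
def named_codepoints(lines, low=0x20, high=0xFFFF):
    """Collect code points in [low, high] whose entry has a real name."""
    result = []
    for line in lines:
        # Skip blank lines, annotations (leading tab), headers (@) and comments (;).
        if not line.strip() or line[0] in "\t@;":
            continue
        code, _, name = line.rstrip("\n").partition("\t")
        try:
            cp = int(code, 16)
        except ValueError:
            continue  # not a character entry
        # Entries such as "<control>" are placeholders, not character names.
        if low <= cp <= high and name and not name.startswith("<"):
            result.append(cp)
    return result


# Tiny hand-written excerpt in the assumed layout, not the real file.
excerpt = [
    "@@\t0000\tC0 Controls and Basic Latin\t007F\n",
    "001F\t<control>\n",
    "0020\tSPACE\n",
    "\t* sometimes considered a control code\n",
    "05D0\tHEBREW LETTER ALEF\n",
    "0627\tARABIC LETTER ALEF\n",
]
print([hex(cp) for cp in named_codepoints(excerpt)])
```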
19.May.2008 7.21am
For the record, here is my final script. You may jazz it up as you like.
I finally remembered that with the Python unicodedata module one can check for the name of a Unicode character (but with the narrow build of Python on the Mac, one cannot process characters above 0xFFFF that way). That can be quite handy. Here is an example:
>>> from unicodedata import *
>>> name(unichr(0x05D0)); name(unichr(0x0627)); name(unichr(0x0905))
'HEBREW LETTER ALEF'
'ARABIC LETTER ALEF'
'DEVANAGARI LETTER A'
>>>
The script thus simply outputs the Unicode characters in the font whose code is in the range 0x0020 to 0xFFFF and that have a name in the Unicode name list (according to the unicodedata function “name”). Here it is:
----
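#!/usr/bin/python
# Python 2 script; needs FontForge's Python module.
import fontforge,sys
from unicodedata import *
fnt=fontforge.open(sys.argv[1],1)
s=''
glyphset=fnt.glyphs()
for g in glyphset:
    cdg=g.unicode
    # Keep only code points from 0x20 to 0xFFFF (narrow-build limit).
    if (0x20 <= cdg <= 0xFFFF):
        uni=unichr(cdg)
        # name() returns "noname" when the character has no Unicode name.
        if (name(uni,"noname") != "noname"):
            s=s+uni
print s.encode('utf-8')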
----
Michel
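For readers on Python 3, the name-based filter can be sketched with the standard library alone. This is an illustration, not a port of the script: it takes plain code points instead of FontForge glyphs, and `named_characters` and the sample values are hypothetical. Python 3.3 and later have no narrow build, so the upper limit could probably be raised to 0x10FFFF; the default below keeps the range from the post.

```python
import unicodedata


def named_characters(codepoints, low=0x20, high=0xFFFF):
    """Return the characters in [low, high] that unicodedata knows by name."""
    chars = []
    for cp in codepoints:
        if low <= cp <= high:
            ch = chr(cp)
            # name() returns the default ("" here) for characters without a
            # name, such as controls and private-use code points.
            if unicodedata.name(ch, ""):
                chars.append(ch)
    return "".join(chars)


# Hypothetical code points; -1 stands for an unencoded glyph, 0xE000 is private use.
sample = [-1, 0x1F, 0x41, 0x5D0, 0x627, 0x905, 0xE000]
print(named_characters(sample))
```

Using an empty string as the default avoids the sentinel comparison with "noname" while keeping the same test.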
21.May.2008 6.07am
If you have FontLab Studio:
1. Open the font.
2. Choose Tools / Quick Test As / OpenType TT (.ttf)
3. In the Quick Test window choose Content / All Characters.
4. Copy and paste the contents of the window into your favorite text editor.
Note that only encoded glyphs (Unicode and PUA) are shown.
Adam