Getting a string of all defined characters?

mummla
15.May.2008 10.03am

Hi all,

I’ve been looking for an easy, fast, automatic way to get all defined characters of a font, preferably just as a Unicode string. Does anyone here know a trick?

Cheers,

Nick



Chipman223
15.May.2008 10.38am

I’m not sure if this is what you’re asking, but in Illustrator and InDesign, you just have to open the Glyphs palette by going to TYPE>GLYPHS.

You can click each one to get its Unicode value, or double-click it to insert it into a selected text field.

Also, Apple’s TextEdit has a similar window under EDIT>SPECIAL CHARACTERS (Cmd+Opt+T). It actually gives you a lot more information than Illustrator or InDesign do.

I’m sure there’s a way to do it in Word, but I never use it.


mummla
15.May.2008 11.07am

Hi,

Thanks for your answer, but it was not really what I was looking for. To clarify: I need a string of all characters that are defined in a particular font. For example:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
{|}~¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõ
ö÷øùúûüýþÿĆćČčđĞğİıŁłŒœŞşŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™

Maintype, for example, gives me a good visual overview of all the characters, but I have yet to find a way to automatically create a string like the one above...


j.hadley
15.May.2008 11.43am

I don’t think it will be easy to do this without some programming, or a repetitive motion injury-inducing amount of mouse clicking.

You basically need to loop through the entries in the cmap and produce a string corresponding to each character code, appending to an output string. It seems like it should be possible to do something like this in FontLab using Python. As I am fairly non-Pythonic I cannot offer any specific advice (I achieve the same thing using tools written in another programming language instead), but I can suggest this non-language-specific pseudo-code:

     for each characterCode in fontUnicodeArray:
          set myString to myString + unichr(characterCode)
     next

     print myString

“unichr” is a function that converts a number (i.e. a character code value) into a Unicode string. I believe Python has something like that ;-) Maybe you or someone else can fill in the other bits in proper Python (and FontLab) form and get what you’re after...
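A minimal Python 3 rendering of the pseudo-code above, offered as a sketch: it assumes you already have the font's character codes as integers (for example the keys of the mapping returned by fontTools' `TTFont.getBestCmap()`, if fontTools is installed). In Python 3 the role of `unichr` is played by the built-in `chr`. The helper name `chars_from_codes` is hypothetical, and sorting the codes is an added choice the pseudo-code does not make.

```python
def chars_from_codes(codes):
    """Return one string containing the character for each code point.

    Duplicate code points are dropped and the result is sorted, so the
    output does not depend on the order of glyphs in the font.
    """
    # chr() is the Python 3 counterpart of Python 2's unichr()
    return "".join(chr(code) for code in sorted(set(codes)))
```

For example, `chars_from_codes([0x42, 0x41, 0x20AC, 0x41])` returns `'AB€'`. With fontTools installed, `chars_from_codes(TTFont(path).getBestCmap())` should also work, since iterating a dict yields its keys (note that `getBestCmap()` may return `None` for a font without a Unicode cmap).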


Gus Winterbottom
15.May.2008 2.28pm

If you use Windows, there’s a program called JFont that claims it can print a character map report to an RTF file.


Theunis de Jong
15.May.2008 4.00pm

The character map sample of Algerian on that JFont site looks a bit suspicious, mixing hex and decimal ...

It depends on what tools you have at hand. Python and FontLab sure sounds like a feasible combination. Maybe the Adobe FDK (also usable through Python) would help too; there’s bound to be something useful in it.


Michel Boyer
15.May.2008 5.52pm

If you are not afraid of the command line, here is a script that works with FontForge (no need to install X-Windows; it works on a Mac, or on a PC with Cygwin and Python):

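#!/usr/bin/python
# Python 2 script using FontForge's Python module.
import fontforge,sys
fnt=fontforge.open(sys.argv[1])

# Code points to keep (in Python 2, range() returns a list, so + concatenates).
validranges = range(0x20,0x500)+range(0x1E00,0x2700)+range(0xFB00, 0xFB50)

s=''
for g in fnt.glyphs():
  # g.unicode is -1 for unencoded glyphs, so those are skipped as well
  if (g.unicode in validranges):
    s=s+unichr(g.unicode)
# Print the collected characters as UTF-8
print s.encode('utf-8')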

If you call that script “listchars” then

listchars 2>/dev/null font_file

will print the string to standard output. You can also write it to a file with listchars 2>/dev/null font_file > string.txt.
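The script above is Python 2: it uses the print statement, `unichr`, and a `range()` that returns a list, so two ranges can be joined with `+`. Under Python 3, `range + range` raises a TypeError and `unichr` no longer exists. The filtering step could be sketched in Python 3 roughly as follows, kept separate from FontForge so it can be tried without a font; `VALID_RANGES`, `in_valid_ranges` and `filter_codes` are hypothetical names. Membership tests on a range object are constant time, which also avoids scanning a long list for every glyph.

```python
# Python 3 sketch of the range filter from the FontForge script above.
# A tuple of range objects replaces the Python 2 list concatenation.
VALID_RANGES = (
    range(0x20, 0x500),     # Basic Latin through Cyrillic
    range(0x1E00, 0x2700),  # Latin Extended Additional through Miscellaneous Symbols
    range(0xFB00, 0xFB50),  # Alphabetic Presentation Forms
)


def in_valid_ranges(code):
    """True if the code point lies in one of VALID_RANGES.

    FontForge reports -1 for unencoded glyphs; no range contains -1,
    so such glyphs are skipped automatically.
    """
    # Membership in a range object is a constant-time arithmetic check.
    return any(code in r for r in VALID_RANGES)


def filter_codes(codes):
    """Return the characters for the code points that pass the filter."""
    return "".join(chr(code) for code in codes if in_valid_ranges(code))
```

For example, `filter_codes([0x41, -1, 0x05D0, 0xFB01])` returns `'Aﬁ'`: the unencoded glyph and the Hebrew alef are both dropped. Inside a Python 3 build of FontForge, something like `filter_codes(g.unicode for g in fnt.glyphs())` should give the same string as the original script, and printing it directly replaces the explicit `.encode('utf-8')` (assuming a UTF-8 terminal).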

If you don’t like the command line and you are on a Mac, you can also use this little application, which is just the above script wrapped in a clickable thing. It should ask you to install FontForge if you do not have it (it takes seconds); then you can select your font file and you get the string in a window.

Michel


mummla
16.May.2008 10.21am

Thanks for all your answers!

When investigating JFont I came across a nifty Windows utility called BabelMap that has a font analysis function that does this and more. The output is shown in a non-selectable text field, though, so I wrote to the developer about this.

For now, I settled on the FontForge solution, even though it meant getting my feet wet with Cygwin and Python. Works great. Thanks for that, Michel.


Michel Boyer
16.May.2008 11.01am

You're welcome! By the way, the above script should translate easily into one for FontLab; I don’t use FontLab, but that looks obvious from what I could see in Haralambous’ book “Fonts and Encodings”.

Michel


cuttlefish
17.May.2008 10.06pm

Michel, your link to the little application appears to be broken.


Michel Boyer
18.May.2008 5.57am

Michel, your link to the little application appears to be broken

Link unbroken. Weird. I am sure I had checked it.


Michel Boyer
18.May.2008 7.10am

The ranges above miss Hebrew and Arabic characters. If you write your own script, you must add them. I just modified the little application: instead of using ranges, I generated a list of all the characters from 0020 to 00FF that are not control characters and that are listed in http://www.unicode.org/Public/UNIDATA/NamesList.txt; now Hebrew and Arabic characters should be there (and everything else defined from 0020 to 00FF).

Michel


Michel Boyer
18.May.2008 7.30am

s/00FF/FFFF/ I meant FFFF and not 00FF, of course (the 15-minute editing window is over, so I can't correct it).


Michel Boyer
18.May.2008 10.20am

Hmm. Typophile included the semicolon in the link to NamesList.txt, so that link is broken too. This link works:

http://www.unicode.org/Public/UNIDATA/NamesList.txt
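The NamesList-based filter Michel describes (every non-control character from 0020 to FFFF that is listed in NamesList.txt) could be approximated like this. It is a sketch, not his actual code, and it assumes the file's entry lines start with a hexadecimal code point, a tab, and the character name, with control characters named `<control>`; block headers (starting with `@`), comments (starting with `;`) and tab-indented annotation lines are ignored. `parse_nameslist` is a hypothetical helper.

```python
import re

# Assumed entry-line format: "0041<TAB>LATIN CAPITAL LETTER A"
ENTRY = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def parse_nameslist(lines, low=0x20, high=0xFFFF):
    """Collect the code points in [low, high] that have an entry line,
    excluding entries named <control>."""
    allowed = set()
    for line in lines:
        match = ENTRY.match(line.rstrip("\r\n"))
        if not match:
            continue  # header, comment or annotation line
        code = int(match.group(1), 16)
        if match.group(2) == "<control>" or not low <= code <= high:
            continue
        allowed.add(code)
    return allowed
```

With a local copy of the file, `parse_nameslist(open("NamesList.txt", encoding="latin-1"))` would build the set; Latin-1 decodes any byte, and the fields this parser reads are plain ASCII. The font's code points can then be kept with something like `"".join(chr(c) for c in sorted(font_codes) if c in allowed)`.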


Michel Boyer
19.May.2008 7.21am

For the record, here is my final script. Feel free to jazz it up as you like.

I finally remembered that with the Python unicodedata module one can look up the name of a Unicode character (but with the narrow build of Python on the Mac, characters above 0xFFFF cannot be processed that way). That can be quite handy. Here is an example:

>>> from unicodedata import *
>>> name(unichr(0x05D0)); name(unichr(0x0627)); name(unichr(0x0905))
'HEBREW LETTER ALEF'
'ARABIC LETTER ALEF'
'DEVANAGARI LETTER A'
>>>

The script thus simply outputs the Unicode characters in the font whose code points are in the range 0x0020 to 0xFFFF and that have a name in the Unicode name list (according to the unicodedata function “name”). Here it is:

----
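#!/usr/bin/python
# Python 2 script using FontForge's Python module.
import fontforge,sys
from unicodedata import *
fnt=fontforge.open(sys.argv[1],1)

s=''
glyphset=fnt.glyphs()
for g in glyphset:
   cdg=g.unicode
   # Keep BMP code points from U+0020 up (unencoded glyphs are -1)
   if (0x20 <= cdg <= 0xFFFF):
      uni=unichr(cdg)
      # name() returns the default "noname" for unnamed characters such as controls
      if (name(uni,"noname") != "noname"):
         s=s+uni
print s.encode('utf-8')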
----
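For Python 3, the same name-based filter might look like the sketch below, using `unicodedata.name(ch, default)`. Since Python 3.3 (PEP 393) there are no narrow builds, so characters above 0xFFFF can be checked too; the upper bound is kept as a parameter that defaults to the whole code space. Note that `unicodedata` reflects the Unicode version bundled with your Python, which may differ from the current NamesList.txt. `named_chars` is a hypothetical helper, and the code points are assumed to come from the font (for example `g.unicode` in FontForge).

```python
import unicodedata


def named_chars(codes, low=0x20, high=0x10FFFF):
    """Return the characters whose code points lie in [low, high] and that
    have a name in Python's Unicode database, sorted by code point.

    Controls, unassigned code points, private-use characters and -1
    (an unencoded glyph in FontForge) all drop out.
    """
    out = []
    for code in sorted(set(codes)):
        if not low <= code <= high:
            continue
        ch = chr(code)
        # name() returns the default (None) instead of raising ValueError
        if unicodedata.name(ch, None) is not None:
            out.append(ch)
    return "".join(out)
```

For example, `named_chars([0x41, 0x7F, -1, 0x05D0, 0x0378])` returns `'A\u05d0'` (A followed by HEBREW LETTER ALEF): the control character, the unencoded glyph and the unassigned U+0378 are dropped. Passing `high=0xFFFF` reproduces the limit of the original script.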

Michel


twardoch
21.May.2008 6.07am

If you have FontLab Studio:

1. Open the font.
2. Choose Tools / Quick Test As / OpenType TT (.ttf)
3. In the Quick Test window choose Content / All Characters.
4. Copy and paste the contents of the window into your favorite text editor.

Note that only encoded glyphs (Unicode and PUA) are shown.
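If you want to drop the Private Use Area characters after pasting the Quick Test output, one option is to remove every character whose Unicode general category is `Co` (private use), which covers U+E000 to U+F8FF and the two supplementary private use planes. A small Python 3 sketch; `strip_pua` is a hypothetical name.

```python
import unicodedata


def strip_pua(text):
    """Remove Private Use Area characters (general category 'Co')."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")
```

For example, `strip_pua("A\ue000B")` returns `'AB'`.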

Adam