TrueType outlines as ASCII text

Grzegorz Rolek's picture

Does anyone know of a way of storing TrueType outlines as human-readable ASCII text that could be easily translated to and from the 'glyf' table's binary data? Similar to how PostScript Type 1 code can be assembled/disassembled with Type 1 Utilities or Adobe's FDK. There's of course TTX or GLIF, but I myself wouldn't count those, as they neither read nor play well in a standard Unix toolchain or with versioning systems.

Karl Stange's picture

Can you offer a more detailed example of your requirements?

Khaled Hosny's picture

If you want 100% round trip conversion, I don’t think there is any (certainly not TTX, which can corrupt some OpenType tables badly).

Grzegorz Rolek's picture

I haven't dug into sfntly before, but reading briefly through the source, it looks like its dumps are only informative and don't print the actual glyph data.

Karl, it basically would be as simple as that: a plain-text description of the data contained in the 'glyf' table that could be compiled and fused directly into a font binary. This would allow storing and versioning TrueType outlines in a manner similar to PostScript or Metafont sources. But as Khaled said, there may be no such thing available today, not publicly anyway.

Karl Stange's picture

Are you experimenting with parametric design?

Grzegorz Rolek's picture

No, I would simply like to have some reasonable storage format for native TrueType designs. Keeping everything in binary is not very practical. But having more options for working on TrueType curves would be much appreciated indeed. It appears to become more and more undervalued these days.

Karl Stange's picture

It appears to become more and more undervalued these days.

Judging by some recent discussions I have seen on the matter, I think most people would like to see an easier way of working natively with quadratic Béziers for drawing, and a more straightforward approach to working natively with TrueType would benefit everyone working with fonts. There still seems to be a significant amount of misunderstanding, and even suspicion, about the TrueType format, even within the type community, which is worrying.

dberlow's picture

UFO and RoboFont ended this kind of discussion, I thought?

Glyphs are stored as text in a UFO and native quad editing is available for no extra charge in RoboFont.

Karl Stange's picture

UFO and RoboFont ended this kind of discussion, I thought?

Entirely possible that I am looking at this the wrong way, but below are the text results of the same glyph data, first as taken from a TrueType file decomposed with TTX, and then as taken from the UFO file [EDIT] (generated with FontForge, as I do not currently have access to RoboFont):

TTX:
<TTGlyph name="A" xMin="50" yMin="0" xMax="1698" yMax="1648">
<contour>
<pt x="50" y="1648" on="1"/>
<pt x="1698" y="1648" on="1"/>
<pt x="1698" y="0" on="1"/>
<pt x="50" y="0" on="1"/>
</contour>
<instructions><assembly>
</assembly></instructions>
</TTGlyph>

UFO:

<glyph name="A" format="1">
<advance width="1748"/>
<unicode hex="0041"/>
<outline>
<contour>
<point x="50" y="1648" type="line" smooth="no"/>
<point x="1698" y="1648" type="line" smooth="no"/>
<point x="1698" y="0" type="line" smooth="no"/>
<point x="50" y="0" type="line" smooth="no"/>
</contour>
</outline>
</glyph>

Grzegorz, is the second set of data any easier to parse?

Karl Stange's picture

Also, have you experimented with Microsoft's TTFdump tool?

Grzegorz Rolek's picture

Both TTX and UFO would be equally easy, or equally hard, to parse if you like. I just wouldn't consider UFO a mature format, particularly not a format that would be closely tied to any of the target technologies in the industry, i.e., either PostScript or TrueType. It's the lowest common denominator between the two, and while this can be useful in an early work environment, it makes it bad for fine-tuning and storage.

In fact, TTX would be more adequate for the job, but it's still a bulky XML/Python combo and the tool itself appears not to be maintained any more. I haven't had any access to Microsoft's software lately, so I can't say anything about TTFDump.

Karl Stange's picture

I haven't had any access to Microsoft's software lately, so I can't say anything about TTFDump.

There appears to be a Linux distribution as well, either independent of or tied to TeX, but I have not been able to get that or the Windows version working.

Khaled Hosny's picture

TeX Live's ttfdump is a different utility, not related to the Microsoft one, and it only dumps TTF files (the tables it understands) but cannot compile them back.

Karl Stange's picture

TeX Live's ttfdump is a different utility, not related to the Microsoft one, and it only dumps TTF files (the tables it understands) but cannot compile them back.

Thank you, Khaled. The Google searches were getting increasingly confusing!

Theunis de Jong's picture

Grzegorz, in essence TrueType font outlines are points connected with lines. It's hard to describe that in any "easily readable" way.

Can you make up a sample of the sort of output you'd like to see?
Keep in mind there is a limit to "human readability" if you want it to be compilable back into a font as well.

ahyangyi's picture

I think it's worth mentioning that FontForge's format, .sfd, is plain text. FontForge's author also wrote some tools to diff and merge .sfd files. Even without those extra utilities, it goes well with Git and diff and is fairly human-readable.

However, I don't think converting a font to .sfd and back will generate exactly the same thing...

dberlow's picture

"I... wouldn’t consider UFO a mature format... not a format that would be closely tied to any of the target technologies in the industry... the lowest common denominator between PostScript or TrueType... bad for fine-tuning and storage."

It's upside-down day. ;)

Grzegorz Rolek's picture

In essence TrueType font outlines are points connected with lines. It's hard to describe that in any "easily readable" way.

The thing is, the way those points are stored in the glyf table is full of techniques for making it spectacularly space-efficient. It's those tricks that could be translated without loss into some concise syntax, and be reassembled byte-for-byte.

Grzegorz Rolek's picture

In the near future I'll be doing some prototyping of a simple assembler/disassembler to test some example syntax I'm thinking of.

malcolm's picture

In fact, TTX would be more adequate for the job, but it’s still a bulky XML/Python combo and the tool itself appears not to be maintained any more.

The UFO format (which is a close relation of TTX) is ideal for storing font data in a readable form, and converting back to binary.

Having a readable form of binary data means a hefty overhead in size compared to the original binary form, and a relatively high resource demand when trying to interpret the readable form.

John Hudson's picture

Yes, which is the main reason that I'm not crazy about UFO as a working format. For a small Latin font, it isn't so bad, but for a large multiscript font the file size and the resource demands are unattractive. As an archive format, however, UFO is a great idea.

Grzegorz Rolek's picture

The UFO format […] is ideal for storing font data in a readable form, and converting back to binary.

Malcolm, I have to disagree. It's not an ideal archive or conversion format. It's pretty primitive in what it can represent and how, thus it's bulky, and it loses many of the production details of both PostScript/CFF and TrueType binaries. For now, the lack of any kind of hinting representation is probably the most notorious example.

I think the unbeatable format for developing or archiving fonts with PostScript outlines is raw PostScript Type 1 code. Type 1 is an acknowledged and stable technology in the industry, with lots of freely available utilities such as the good old Type 1 Utilities. It's also a format that has code-like qualities: it's plain text that is concise and can be edited by hand, versioned, and manipulated with standard Unix tools and filters. Furthermore, it translates practically losslessly into the CFF inside an OpenType binary.

Real-life example: I have a bunch of PFAs I want to merge with Adobe's mergeFonts utility and bake a final OTF with makeotf. Before that, I have to wipe out any encodings in the PFAs to avoid conflicts on merge. So I disassemble them into raw Type 1 code with t1disasm, pipe it through sed with a simple regex, and assemble back into PFAs with t1asm, all in a single command:

t1disasm enc.pfa | sed '/^dup [0-9][0-9]* \/..* put$/d' | t1asm --pfa > notdef.pfa

This is the simplest case I could think of now; please see, for example, the build script of my Kontrapunkt Pro project at https://github.com/grzegorzrolek/kontrapunkt-pro/ for a much broader illustration.

Now, TrueType outlines have no representation like that. A few people have mentioned TTX, but it's still XML, which is bulky by itself and hard to parse or edit, whether by hand or with any standard Unix utility. What's more, it doesn't represent TrueType's binary data closely enough to make it a one-to-one conversion format.

Grzegorz Rolek's picture

It just occurred to me that the disassembly in the example above is actually redundant, as the encoding in Type 1 fonts is obviously in the plain-text part of the file. Still, the whole idea applies.

malcolm's picture

What's more, it doesn't represent TrueType's binary data closely enough to make it a one-to-one conversion format.

Showing TrueType bytecode in a reasonably understandable 'human-readable' form is difficult.

There are some programs that can compile text-like ASCII into bytecode, but I don't know of any that can decompile the bytecode into a reasonably understandable text-like syntax that can be easily edited by hand and then recompiled back to bytecode. Perhaps others here know of such a tool.

Font developers will often use two or three different programs to create their source data (one for outlines, another for hinting, another for OT features, perhaps another for kerning...). It's down to careful management to keep the sources aligned so they can be easily recovered for further work.

dberlow's picture

"Perhaps others here know of such a tool."

Yes. But this turns out to be something quite extraordinary to develop. The combination of reading and writing all of the tables of OT fonts, with all the parsing, UIs and compiling required for the multitude of possible human interactions, formats and user-driven derivatives, has historically been... well, read this thread. A guy posts a query on the best way to store TT outlines and, with little or no encouragement, ends up satisfied that Type 1 fonts are nirvana. I've read it 100 times, and such incomplete thinking has been thoroughly proven.

Té Rowan's picture

It may be that the best way is to use both SFD and TTX formats, both of which are text. SFD is FontForge's native format, but since FF tends to drop tables it doesn't understand, saving those as TTX should help. Mind, the chances that the recovered font will be exactly like the original are exactly three: fat, slim and none.

John Hudson's picture

It seems pretty obvious to me that the ideal archive format -- and probably the ideal source format* -- for a TrueType font is ... a TrueType font. Now, obviously, this is the answer to a different question than 'What is the ideal archive format for a digital typeface design?', so we should be clear what we're asking about, and why.

I'm still haunted by the memory of standing in the non-Latin archive room at Linotype when it was being prepared for transfer to the University of Reading, and looking at the shelves of archive tape media of early digital type, realising that there were no machines that could read the tapes. [At Reading, Gerry Leonidas was able to track down a company that still makes the special filing cabinets to hold the large hanging files used for Linotype's paper drawings; I think the tapes were junked.]

* The sfnt table structure seems to me such an obviously good source format. It enables you to store most data natively in the target font format, and if you want to store independent source data, e.g. visual hints, you can always include private tables that can be read by font tools. This is how VTT and VOLT already work: the .ttf is the source file, and the tool data is stored in private tables until the font is 'shipped'. The Microsoft math font tool goes one better, and simply writes and reads the MATH table directly, which seems a good option for any tool that doesn't need to present the user with data that isn't written to the final font tables.

John Hudson's picture

[I realise that my last post is a bit of an aside from the main conversation, since Grzegorz isn't looking for an archive or even a source format but what might be termed a transformation format. Sometimes I just feel like saying things, though.]

Grzegorz Rolek's picture

Grzegorz isn’t looking for an archive or even a source format but what might be termed a transformation format.

You have one, you have the others. The purpose is the same: a textual translation that is fully reversible and lossless. Something that Eric S. Raymond in The Art of Unix Programming calls a textualization:

[Textualizers] degrade the [binary] representation as little as possible — in fact, they translate it reversibly and losslessly. This property is very important, and worth implementing even if there is no obvious application demand for that kind of 100% fidelity. It gives potential users confidence that they can experiment without degrading their data.

It also gives enough flexibility and precision to make fonts that push the technology to its extremes, something that just can't be done with regular font editors, which make many assumptions behind your back, apply cross-format generalizations, et cetera.

I once read on FontForge's mailing list George Williams himself saying that the TrueType binary is indeed a good, if not the best, format for storing a TrueType font, exactly because all the data is in there. But still, you can't edit the binary without an editor while staying completely sane, and by using an editor you lose the flexibility, the precision, and the confidence.

Grzegorz Rolek's picture

Here are two illustrations of why TrueType outlines' binary data deserves a distinct textual representation, and how it could be useful.

First. In the binary, each coordinate has corresponding bit flags describing its properties. A particular combination of those flags indicates, for example, that a coordinate is stored as a single byte (a short version) or as two bytes (long). In an ideal situation, all coordinates that could be stored as single bytes would be stored that way, and the rest would be stored as two bytes. In that case, a textual disassembler could make those default assumptions and represent the point (an off-curve one) simply as:

point off x y

But some font editors or converters do a lousy job of this optimization (as a few system fonts I've tested on my machine happen to have it wrong with offsets in the compound glyphs). In that case the point would carry a hint (or a cast, if you prefer) and therefore look like this:

point off (x-long) x y

You could just leave it as such, and the assembler would force the coordinate to be stored as a long value in the output bytecode, thus making a lossless round-trip conversion back into the original binary, as if nothing ever happened. But you could also delete the "long" cast and let the assembler build the font with the default behavior, thus giving you a properly optimized version of the font.

Second. Flags that are exactly the same for a given series of points are not stored repeatedly; instead, a single flag is marked to repeat a number of times, to save space in the binary. This could be presented as:

point off x y, x′ y′, x″ y″

Now, because in the binary the contour breaks are independent of both the point and flag listings, this optimization can continue even across a contour boundary, giving a representation like so:

point off x y, end x′ y′, x″ y″

This would be different from points whose flags are the same but are made explicit for each point in the binary:

point off x y
point off end x′ y′
point off x″ y″

Formats like TTX or UFO store contours as separate lists of points, thus making the two scenarios indistinguishable, and some of the binary data simply gets lost.
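To make the two illustrations concrete, here is a minimal sketch in Python of how those flag bytes unpack. The bit assignments follow the OpenType 'glyf' specification, but the constant and function names are my own:

ON_CURVE   = 0x01  # point is on the curve
X_SHORT    = 0x02  # x delta stored as one byte instead of two
Y_SHORT    = 0x04  # y delta stored as one byte instead of two
REPEAT     = 0x08  # the next byte says how many times this flag repeats
X_SAME_POS = 0x10  # sign of a short x; with X_SHORT clear, "same x as before"
Y_SAME_POS = 0x20  # sign of a short y; with Y_SHORT clear, "same y as before"

def read_flags(data, num_points):
    """Expand the run-length-encoded flag bytes into one flag per point."""
    flags, i = [], 0
    while len(flags) < num_points:
        flag = data[i]; i += 1
        flags.append(flag)
        if flag & REPEAT:
            count = data[i]; i += 1
            flags.extend([flag] * count)
    return flags, i  # i is the number of bytes consumed

A disassembler that prints only x and y throws away exactly what this function reads: whether a delta was stored short or long, and whether a flag was written once and repeated or spelled out point by point.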

dberlow's picture

"I'm still haunted by the memory of standing in the non-Latin archive room at Linotype [...] looking at the shelves of archive tape media of early digital type, realising that there were no machines that could read the tapes"

:) But we still go to the archives of the drawings those tapes came from, which made it pretty obvious to me that the ideal archive format for a typeface style is not a TrueType or Type 1 font, or the "lowest common denominator" or even the highest, but the richest and most attractive format to type developers of typeface families.

A fully hinted TT glyph can have all the information of a Linotype drawing. The representation of that, and of the charts that accompanied the folios of drawings, can be held in private sfnt tables, but you don't want your source data the way sfnts are: good for picking which tables to deal with this century, and which to deal with next century. Founders have complete and immediate needs, as we know.

This includes storing everything from the ideals of a typeface to the size instances of each glyph, and how they all (styles and glyphs) work together size-dependently. No TrueType or Type 1 fonts I know of can do that, so how to store all this sfntishly? (Especially when one doesn't need to...)

chemoelectric's picture

Most of the discussion above seems to dodge the issue. The issue is having an assembly-language representation of the table contents. I would suggest writing one if none exists and one wants it.

I would suggest it would be easiest to write such a thing as an s-expression language, the way Knuth did for tools like TFTOPL and PLTOTF, and to use Scheme or Common Lisp as the programming language, although s-expressions are easily handled in other languages. In Lisps there is no need to write a parser for s-expressions; you just read them in. (Expect to see Guile, a GNU Scheme implementation, gaining popularity as a programming language over coming years.)

chemoelectric's picture

One might also look at how cubic splines are written in Metafont for ideas about compact but human readable representations of quadratic splines expressed in glorious detail. You can specify control points explicitly and still have great readability.

abattis's picture

(Expect to see Guile, a GNU Scheme implementation, gaining popularity as a programming language over coming years.)

I'm curious what evidence you see pointing to this trend.

It seems pretty obvious to me that the ideal archive format ... for a TrueType font is... a TrueType font.

I agree.

-- and probably the ideal source format ... The sfnt table structure seems to me such an obviously good source format. It enables you to store most data natively in the target font format, and if you want to store independent source data, e.g. visual hints, you can always include private tables that can be read by font tools. This is how VTT and VOLT already work

I disagree; source ought not be binary.

John Hudson's picture

...source ought not be binary

What if the binary is better documented than existing source formats, has the advantage that it can be interpreted by multiple tools rather than being limited to one particular tool or programming language, can be translated to and from multiple transformation formats, and is designed in such a way that it can store any amount of non-compiled source data, including private data? In the case of things like VTT and VOLT source tables, we're really talking about source data that can travel with the binary, rather than a source/binary unification.

k.l.'s picture

@Dave: As regards the private tables, at least, VOLT data and afaik VTT data too are not binary but text. And reading/writing tables from sfnt-structure fonts is trivial. As regards all the others, the question is whether a) design/production data is meant to be condensed, with the final/binary data extracted from it with the help of additional mechanisms, or whether b) design/production data and final/binary data are essentially the same in terms of content and structure, demanding only light conversion when generating the latter from the former. In case a), a TTF is not a good design/production/archive format, containing too much, and redundant, data. In case b), it is. And the way fonts are produced today basically reflects b), if only because font editors work that way.
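To illustrate how trivial the table access is, here is a rough sketch with fontTools; the file names, the XPRV tag and the payload are placeholders of mine, and it assumes the font actually carries one of VTT's private source tables (TSI1 here):

from fontTools.ttLib import TTFont, newTable

font = TTFont("MyFont.ttf")

# Read the raw bytes of a private table (TSI1 is one of VTT's source tables).
vtt_source = font.getTableData("TSI1")

# Attach an arbitrary private table of our own and write the font back out.
private = newTable("XPRV")  # unknown tags are handled as opaque data
private.data = b"tool data that travels with the binary"
font["XPRV"] = private
font.save("MyFont-tagged.ttf")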

Té Rowan's picture

Unfortunately, today's version control systems are geared for text, not binary. Zaps (binary diffs) are a sorta lost art, I'm afraid.

blokland's picture

After all these years I'm still very pleased with the IKARUS format for data storage and font generation. And having been around for almost 40 years, it has proved its resistance to time. If everything goes well, the new OS X / Windows / Linux editor will also support the storage of quadratic splines and hinting. And it will support manual digitizing. Yippee!

FEB

twardoch's picture

I'll give my word of support for TTX. It's not free of flaws, and is XML (which Grzegorz seems to particularly dislike), but in my opinion it is currently by far the closest to Grzegorz's aims.

I've been working with TTX for a long time now. You need to consider two aspects: there is the .ttx format, and there is the fontTools/TTX suite of tools, written in Python.

The .ttx format is, perhaps unfortunately, not documented by any means other than its Python source code. But this by itself may not be a big problem, because fontTools/TTX essentially defines a *transformation path* from the binary SFNT structure to human-readable XML, and back. And, after all, it is the binary SFNT structure that remains *canonical* in terms of the specification. The .ttx format is, and will remain, dependent on the binary structure — not the other way around.

Contrary to what Grzegorz suggests, fontTools/TTX is being maintained. The SFNT structure doesn't change much, because the OpenType font format doesn't change hugely. But whenever a new version of the OpenType format has been published, fontTools/TTX was typically patched quite quickly. Adobe has been using fontTools/TTX extensively in AFDKO and has contributed a lot of code to it. So in fact, if you use the Adobe FDK for OpenType, you're using fontTools/TTX already anyway.

One great missing piece of current relevance is fontTools/TTX modules (and therefore .ttx formats) for the AAT tables. Apart from that, everything is well supported.

There are a few SFNT tables for which I use .ttx as my source format when I build fonts, the "name" table being one of the most prominent.
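As a sketch of that workflow (the file names are placeholders; the ttx command-line tool's -t and -m options give the same dump-and-merge round trip):

from fontTools.ttLib import TTFont

# Dump only the 'name' table to .ttx, to be edited and versioned as text.
font = TTFont("MyFont.ttf")
font.saveXML("name.ttx", tables=["name"])

# Later: merge the edited .ttx back into a copy of the binary.
font = TTFont("MyFont.ttf")
font.importXML("name.ttx")  # replaces the 'name' table from the XML
font.save("MyFont-built.ttf")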

Grzegorz, if you manage, perhaps, to overcome your dislike of Python and XML — I think you'd do the community a great service, should you decide to contribute some code to fontTools/TTX, or build something around it :)

Best,
Adam

twardoch's picture

Ps. When comparing .ufo with .ttx, one could say that .ufo requires "smart" tools while .ttx works with "dumb" tools. I.e., the transformation from .ufo to a final font requires a complex toolchain, and there is no straightforward path that would guarantee that one .ufo gives you a particular .otf or .ttf. In other words, .ufo operates on an abstract level, and lots of information that has to go into the .otf or .ttf is not included in the .ufo and therefore needs to be "invented" by the tool. On the other hand, .ttx operates on a lower level and really is a very straightforward representation of .otf or .ttf. In other words, one could have a very different tool for compiling .ttx into .otf/.ttf, written in a different language -- but the results of the conversion would be the same. So the .ttx-to-.ttf process really is a straightforward conversion, and it's a *stable* process. The .ufo-to-.ttf conversion is, well, something like going from HTML to PDF — lots of decisions need to happen in the software, and those decisions are not "stable" in the sense of not being predetermined.

For example, the same .ufo will yield very different .ttf files depending on what software you use to convert (FontForge, Glyphs, RoboFont, TransType 4). But that's the nature of .ufo, as it's intended to be high-level and deliberately leaves various decisions to the tools.

Grzegorz Rolek's picture

It’s not free of flaws, and is XML (which Grzegorz seems to particularly dislike)…

I dislike XML only insofar as it's markup, whereas fonts are data. Markup languages are indeed an as-yet-unbeaten technology if we consider, well, markup — the description of documents of mixed content and loose structure. But using a markup-like language to describe plain data is overkill, both syntactically and processing-wise. I suppose XML is used almost instinctively because of its familiarity and the parsers available in most programming languages out there, but this doesn't mean it's always the best choice for a given problem domain. Think of the XML vs. JSON debate from a few years back.

Given the open architecture of the sfnt, I can imagine a development format that would involve all the basic, or native, languages for each type of technology being used in a font. That would be:

  • Type 2 Charstring for PostScript outlines
  • the feature file language for OpenType Layout
  • respective input files of AAT and Graphite technologies.

They are all great for hand editing, machine parsing, versioning and storage, and they all compile directly and losslessly into the font binary. These files and their compilers could provide a basic, very reliable and flexible workflow. Different kinds of programs, either filter-like or more abstract editors, could be built on top of that and used interchangeably. In fact, that's exactly what UFO got right originally: a set of separate, domain-specific files. But it fell short with GLIF and Property Lists: a weak outline format, and a can of worms of ad hoc plist data, respectively. Swap all this for a native "language" for the core TrueType tables, including the glyf data and the hinting bytecode, and the picture above is complete.

twardoch's picture

Grzegorz,

I hear what you're saying. However, I feel that you may be focusing too much on the color of the summer jacket. I mean, we're looking for a suitable piece of clothing for the right climate. Let's say SFNT is chilly summer weather. The .ttx format may have a somewhat out-of-fashion color or stylization, but at least it is a summer jacket. It's not a winter jacket nor a bathing suit. It is a good summer jacket, actually; it fits well and does the job. OK, it may be a bit out of fashion now — but is fashion a good criterion for making such choices?

.ttx was created at a time when XML was "the new thing" and was thought to be "a great method to express everything in human-readable text form". Sure, we now know that it may not be a perfect fit for everything, and perhaps a different notation could be *marginally* better for a human-readable source form of SFNT.

But it really *is* good. When developing fontTools/TTX, Just van Rossum put *a lot* of thought into the best possible representation of various SFNT fields, and I agree with maybe 99.5% of his choices. I like .ttx much more than .ufo. I feel that .ufo overuses the .plist concept, which may be simpler to process but is not very readable. All those "key"/"array" alternations — they don't read well and are just meaningless overhead. "Key key key key", "array array array array". That could be expressed much more elegantly in something simpler, like JSON.

But with .ttx, it's different. I do like the fact that the element names are meaningful, and that in most cases data is expressed using elements rather than attributes, because it lets you visualize the ordering of various SFNT data structures very closely.

Let me put it this way: for years, I could not make sense of what SFNT looks like. I did not understand the format; I could not visualize it in my head. The moment I decompiled a .ttf into .ttx using fontTools/TTX, I had a revelation. I finally got it. Within an instant. I only had such a moment once before, when I understood how computers work after learning some simple 6502 assembler on an Atari 130XE :)

.ttx has been used in many, many projects by a large number of font developers. It's been tested with thousands of fonts (actually, tens of thousands). fontTools/TTX is practically bug-free and very, very stable. The data structure for each table is very well designed; it follows the SFNT structure quite natively, with just a few exceptions, where they actually make sense.

It even makes sense that it uses ISO 8859-1 rather than UTF-8 (really, it does).

It's certainly not a "self-compiling" format, i.e. it does not provide all the inherent structure within the data (so that the compiler could be completely "stupid" and the same routine could work for all tables). Perhaps you're looking for something like that? I don't know.

As I said, apart from some missing tables, for which support would be hugely welcome, and better documentation — both fontTools/TTX and .ttx are actually close to perfection. Just van Rossum is a very modest man, but I think his development of fontTools/TTX and the .ttx format stands among the greatest achievements in font technology. Thanks to Python, it works everywhere and is easy and elegant to work with. In fact, the seemingly low maintenance of the code these days can really be attributed to its stability. Most "biggies" are done, all the small things are also done, and it just works.

There are a few things that aren't done, and they're all "biggies" — like AAT support. Since Apple is currently the largest computer company, the worldwide deployment of AAT-consuming platforms is quite huge (especially due to iOS). When fontTools/TTX was started, the importance of AAT was marginal, and OpenType Layout was seemingly the only thing of interest. Today, it's much less so. So I'd be very grateful if anyone stepped up and contributed AAT support to fontTools/TTX.

But other than that — I personally think that developing "something completely different" from scratch to rival .ttx is a waste of energy. Again, this is just my personal view.

.ttx does the job, is a very good low-level source format that compiles into SFNT natively, works well with versioning systems, is well designed, elegant, easily readable (among the easiest XML formats I know), solid, stable, and — in my personal opinion — rather beautiful.

If Just van Rossum is reading this — Just, I can say to you again what I've said several times: thank you, thank you, and once again, thank you. This piece of software you've written has allowed me to do things with fonts that I would have otherwise not been able to achieve. On countless occasions.

Roger,
A.

twardoch's picture

Ps. Contrary to what you may believe, the Adobe FEA syntax is not at all a very good representation of the OpenType Layout tables. There are many things that aren't supported by the FEA syntax. There are aspects of FEA where it makes generalizations and unifications that obscure the underlying complexities of the binary font data. And there are many, many cases where the same binary situation can be expressed in a huge number of ways using FEA.

Adobe's makeotf compiler adds additional limits, since it does not "understand" many aspects of FEA. FontForge's implementation is marginally better in some aspects, but worse in others. But that's a different point of discussion.

I use FEA all the time, and I love it, most of the time. But it's not an exhaustive format. It makes it easy to develop OTL in 95% of cases, but really hard in the remaining 5%.

.ttx is actually much better in that regard. It follows the OTL structure closely, and you can express more with it than with FEA. At the same time, of course, .ttx is more awkward to work with, well, because OTL is a bit awkward. :)

FEA of course has its merits. It lends simplicity to what is actually stored inside a rather complex structure (the binary OTL tables, that is). But by allowing, or applying, the simplification, it sometimes obscures or even malforms some of the complexities.

Admittedly, FEA has got a bit better over the years. Initially, it was almost just a "prototyping" language for OTL. These days, it allows for more control, but it still leaves a large undefined area which a tool needs to fill. In other words, it's not a fully explicit format. Different tools that compile FEA to OTL will yield different results. In that sense, it's more akin to UFO.

FEA is good for how it's being used today — but, trying to understand your rationale, I'd say it's not a good format for your purposes. Certainly it's not round-trip-safe.

twardoch's picture

I just came up with a different way to express what I was trying to say about FEA: if the binary OTL is "data", then FEA is not a human-readable representation of that data. FEA is a clever and useful language that can be used to *generate* a very large subset of that data. On the other hand, the .ttx GSUB/GPOS table formats are a human-readable representation of that data.

John Hudson's picture

Adam, thanks for the comments re. the Adobe .fea syntax. I was about to make the same comment in response to Grzegorz's characterisation of it as the 'basic, or native, language' for OpenType Layout.

In regard to which, it is worth remembering the point that Karsten made earlier in this thread: VOLT sources are also a text representation of the GSUB, GPOS and GDEF data, and we've been able to utilise this to script fairly sophisticated FontLab-VOLT workflows.

Grzegorz Rolek's picture

I'm, of course, not denying the great engineering effort behind fontTools and TTX, and what Adam says about its specifics is indeed true. But the very choice of XML at its core is arguable, not only in terms of a particular hype, or aesthetics, or even the difference in readability, but also in terms of some syntactic problems.

XML is a tree-like structure, whereas font binaries are not: they are flat arrays with offsets and the like. For example, there's a common optimization within the 'cmap' or 'name' table where two different entries hold an offset to the same exact data (or even to part of it) deeper in the table. Try representing that in XML without bending the syntax to the level of misuse. It's just not adequate for the job, and we end up with details that are lost in translation. That these are details doesn't make it an issue that can easily be ignored, because you recompile a font, run a diff, see something has changed, and wonder what else could have gone wrong.
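A quick sketch of what I mean, using nothing but Python's struct module; the strings are made up, but the record layout follows the 'name' table specification (platformID, encodingID, languageID, nameID, length, offset):

import struct

# One chunk of string storage shared by two records.
storage = "Semibold Italic".encode("utf-16-be")
tail = len("Semibold ".encode("utf-16-be"))

# nameID 17 (typographic subfamily) points at the whole string...
rec_typo = struct.pack(">6H", 3, 1, 0x409, 17, len(storage), 0)

# ...while nameID 2 (subfamily) points at the "Italic" suffix of the very
# same bytes: two entries, one overlapping piece of storage.
rec_sub = struct.pack(">6H", 3, 1, 0x409, 2, len(storage) - tail, tail)

A tree-shaped dump that stores one string per record has no place for that overlap: it duplicates the text, and the sharing is gone after recompilation.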

Now, if we move away from an XML representation, Python no longer seems like an obvious choice either. Therefore my opinion is that putting effort into improving TTX is not as promising as it seems. This of course doesn't mean it's not good or usable, because it certainly is. It's just not good enough for hard-core font production.

Regarding the feature file syntax, I must admit that I haven't dug through the OpenType Layout tables thoroughly enough, so I might be wrong about it after all. I had thought that using the syntax in its most explicit form (that is, defining rules in lookups of one kind, referring to them in features, setting language systems for each feature, etc.) would translate pretty losslessly into the structures inside the tables; won't it?

A restrictive syntax that stays close to the way the binary itself is structured, and an accompanying stupid compiler, are worthy in general. A more stupid compiler means more optimizations have to be made at the syntax level, resulting in better source files for you and others to employ. It would also mean that you would have to learn and understand the technology rather well, and that's a good thing in the end. Making the compiler stupid would also make it more deterministic and thus more reliable. (Still, an ideal compiler would be just smart enough to give you some hints on how to optimize your code, but sufficiently stupid not to do it at its own discretion.)

twardoch's picture

Grzegorz,

OK, I now get where you're heading. You really want the "ultimate low-level" description, at the assembly level. I think you're right that it would have value after all, and I agree that TTX is perhaps not the ideal approach for that.

Microsoft has produced some command-line tools, ttoasm.exe and ttodasm.exe, which work as "stupid" compilers and decompilers. There is a set of .fmt format files (one for each SFNT table supported by the tools: GSUB, GPOS, GDEF, JSTF and BASE). A format file for a given table defines the plain-text representation of that table and at the same time provides the information for how that representation is to be transformed into binary, and vice versa. You can download it as a "Windows self-extracting EXE" archive from http://www.microsoft.com/typography/tools/tools.aspx

I recommend that you take a look at it. Inside, you'll find TTOAsm.zip and TTODasm.zip, which include the tools, some documentation, and the .fmt files for all five OTL tables. You can run the tools through Wine (cxstart) on any OS.

For example, the GSUB format file looks like this (it's an excerpt):

DEFINE ZERO=0
DEFINE MAXCOUNT=0xFFFF
; **** GSUB Table ****
GSUBHeader, HEAD {
fixed32, 0x00010000 ; GSUBHeader version
Offset, ScriptList, NOTNULL
Offset, FeatureList, NOTNULL
Offset, LookupList, NOTNULL
}
; **** Scripts ****
ScriptList, TABLE {
Count, 1, MAXCOUNT ; ScriptCount
Array, $T1, ScriptRecord
}
ScriptRecord, RECORD {
Tag ; Tag
Offset, Script, NOTNULL ; An offset to a Script table to be generated
}
[...]
LookupList, TABLE {
Count, 1, MAXLOOKUPCOUNT ; LookupCount
Array, $T1, Offset, Lookup, NOTNULL : $IGSUBLookupIndex
}
Lookup, TABLE {
uint16, 1, 8 ; LookupType
uint16 ; LookupFlag
Count, 1, MAXCOUNT ; SubTableCount
Array, $T3, Offset, SubstTable, NOTNULL, $T1
}
CoverageFormat1, TABLE {
uint16, 1 ; Format 1
Count, 1, MAXCOUNT ; GlyphCount
Array, $T2, GlyphID, 0, MAXGLYPHID
}
CoverageFormat2, TABLE {
uint16, 2 ; Format 2
Count, 1, MAXCOUNT ; CoverageRangeCount
Array, $T2, RangeRecord
}
[...]

So it essentially is a "schema" for the GSUB table in a plain-text form. And then, an actual example GSUB table definition would be (also an excerpt):

GSUBHeader theHeader
0x00010000 ;Version
theScriptList ;Offset to ScriptList table
theFeatureList ;Offset to FeatureList table
theLookupList ;Offset to LookupList table
;****************************************
DEFINE MAXLOOKUPCOUNT = 10
DEFINE MAXFEATURECOUNT = 5
ScriptList theScriptList
1 ;ScriptCount
'arab' ;ScriptTag = "arab"
Script1 ;Offset to Script table
Script Script1
DefaultLangSys ;Offset to default LangSys
0 ;Number of LangSysRecords
LangSys DefaultLangSys
NULL ;Lookup order = null
0xFFFF ;No req'd Feature Index
5 ;Number of Feature Indices
0 ;Feature Indices
1
2
3
4
;****************************************
FeatureList theFeatureList
5 ;FeatureCount
'init' ;FeatureTag = 'init'
FeatureInit ;Offset to Feature table
'medi' ;FeatureTag = 'medi'
FeatureMedi ;Offset to Feature table
'fina' ;FeatureTag = 'fina'
FeatureFina ;Offset to Feature table
'liga' ;FeatureTag = 'liga'
FeatureLiga ;Offset to Feature table
'mset' ;FeatureTag = 'mset'
FeatureMset ;Offset to Feature table
[...]
;***** Arabic Ligature Sub Table *****
LigatureSubstFormat1 SubstTableLigaBase
1 ;Format
CoverageLigaBase ;Offset to Coverage
5 ;Ligature Set Count
LigatureSet1
LigatureSet2
LigatureSet3
LigatureSet4
LigatureSet5
;***** 0165 ;'shaddamedial'
LigatureSet LigatureSet1
0x0003
LigatureSet1Ligature1
LigatureSet1Ligature2
LigatureSet1Ligature3
Ligature LigatureSet1Ligature1
0x0154 ;'shaddawithfathamedial'
2
0x015F ;'fathamedial'
Ligature LigatureSet1Ligature2
0x0155 ;'shaddawithdammamedial'
2
0x0161 ;'dammamedial'
[...]

and so on.

This is very low-level, in fact at the assembly level. I don't know how many .fmt schemas Microsoft created or for how many tables, but this "language" that Microsoft created could be used and extended in an open-source tool.

An alternative might be an approach similar to Pomax's, who authored a generic "binary parser generator" and then defined a language for binary parsers, in which, as one example, he created a parser spec for the OpenType font format:
https://github.com/Pomax/A-binary-parser-generator
https://github.com/Pomax/A-binary-parser-generator/blob/master/OpenType....

This approach could be refined and turned into a "converter", with text-to-binary and binary-to-text conversion. In a way, Pomax's OpenType.spec format description is rather similar to Microsoft's .fmt description for tools.

Of course the "big question" when inventing such a format is the typical one: what delimiters should be used, what to do with whitespace, etc. etc. I guess one reason why people choose XML these days is because there, these questions are simply ANSWERED (and very well specified). So even though XML may not be the best format for a particular usage scenario, it's a well-defined one and they don't need to start a debate about what the "ideal" format should be. :) With your project, a debate will surely start if you propose some kind of notation (e.g. similarly "looking" to Microsoft's TTOAsm, or to Pomax's parser format, or to the Adobe FEA structure or to the Adobe AFM structure or to the plain-text Type 1 form, or to Ikarus or to whatever else, or you come up with your own). Some people will "like it", others will "hate it", others will "like it but propose improvements", others will "like it as it was before the improvements but not now". :)

So, as is often the case, probably the biggest value of XML is that it exists at all :D There is a big community of tools, schemas, workflows, "best practices" etc. around it, which people can cherry-pick from rather than invent their own. The same can be said, I guess, about things like PHP. PHP is not a really good language, but it's easy to deploy because you don't have to invent a lot of things :)

Best,
Adam

k.l.'s picture

It really depends on what you are planning to do. If you intend to catalog fonts in a readable form, or want to adjust a single value while leaving everything else intact, then a close-to-binary non-binary representation might be a good thing. But when making fonts, this is hardly relevant and may even turn out to be an obstacle, because a designer does not want to change an individual piece of information, but rather to make some design adjustment that causes a couple of pieces of information to change.

For example, is it really necessary that a non-binary representation of glyf precisely reflects how x, y and point type are packaged? I don't think so, because, while the flags save some space, the information relevant for rendering the shapes remains the same. Similarly, whether you use PPF1 or PPF2 subtables to store metrics adjustments in GPOS is irrelevant as to the information they hold, though one may save some space compared to the other.

Grzegorz Rolek's picture

Thank you, Adam. That's a lot of cool software I wasn't aware of (Pomax simply never ceases to amaze me). This is indeed as low-level as it can be, lower and more verbose than anything I had thought of, but it's good to have it as an option, plus the already-mentioned VOLT sources, which I must look into.

Karsten, yes, such tools would indeed be of questionable utility to many designers. Still, they are intended primarily for the post-production and engineering side of things, where all this could matter. It's not only about space savings (though that's important these days), but also about performance, and about doing tricks that push the technology to its extremes, either for testing purposes or for producing non-standard or special-purpose fonts. And, as Adam mentioned briefly, it's often the best way to learn to think in terms of font internals, which I believe makes you build better fonts in general.

If I recall correctly, Dave Crossland once said that he would like to see a standard build environment for font development, a kind of ./configure; make; make install for fonts. That would be another reason why it would be really good to have some concise and effective, not-too-low-level, but still deterministic languages for font production.
