Using A Particular Unicode Number To Mark A Font, Which One?

Richard Fink's picture

I am looking for a way to label fonts using a particular Unicode position. (This is for testing web fonts - testing exactly which font file in the @font-face stack the browser is displaying. In other words, is it the EOT, the WOFF, the SVG, the Data URI - which one is the browser using?)

The question is: Which Unicode position to use for this purpose? (Latin-centric.)

The main boundary condition is that there be a near zero chance of conflict with a glyph that already exists in the font. And if there is a conflict, that there be no unwanted side effects stemming from simply replacing it with the marking glyph.

Right now, the best candidate seems to be the Apple logo from the MacRoman codepage with a position in the Private Use area, Unicode 63743 (U+F8FF).

Anybody see a problem with this?

Anybody have a better idea?

Thanks,

Rich

Thomas Phinney's picture

Why not use a set of very distinctive fonts, a different one for each format, and write down which you used for which? Or is there some other constraint you haven't mentioned?

Or.... why not just use a regular glyph position, and replace the content? Take a common character (say, numeral 1) and replace it with a glyph that displays the name of the format. So the EOT font has a glyph that says "EOT" and so on. Start with a font that has a license that permits this, probably an open source thingie.

Cheers,

T

cuttlefish's picture

My understanding is that Apple, inc. objects to font developers putting an Apple, inc. logo in the U+F8FF position unless the font in question is licensed for distribution by Apple, inc., but I cannot provide a citation for that.

Since that position is in the Private Use Area, according to Unicode rules you can put whatever you want in there. That's what the PUA is for. But, since it is part of the MacRoman encoding, there is somewhat of a compulsion to fill the slot. Some font developers use that spot for a foundry logo or signature glyph of various sorts, or an apple dingbat that doesn't resemble the Apple, inc. logo. Also, U+F8FF is claimed by the Conscript Unicode Registry for the end of the Klingon script range.

You may wish to choose a different PUA position for your test glyph if these potential conflicts will interfere with your experiment.

oldnick's picture

I use the florin position to "trademark" my fonts...

twardoch's picture

Apple's restrictions may apply to the use of the Apple logo, i.e. both the actual drawing of the apple, and, possibly, the glyph name "apple". They most certainly don't apply to a use of any particular codepoint.

Indeed, quite a few font vendors have been using U+F8FF to put a vendor sign there. So this seems to be good practice. Rather than using the glyph name "apple", I recommend using the glyph name "uniF8FF", which will keep your hands 100% clean.

Richard Fink's picture

@tp

I'm looking for an across-the-board standard way to query which file has been parsed and is on display in the browser. With any font at any time. (Or at least the ones that have this marking glyph included.) Not just for a test suite, but standard practice.
Just a way to visually double-check that what you think is happening, is indeed happening.

I simply can't, as yet, come up with any other way of querying for this. There doesn't seem to be any.

oldnick's mention of the Florin is interesting, but right now I'm in Aruba and although they mostly use dollars I've also got Florins in my pocket, so I'm a little loath to scrap it if the font's got it! And I don't want the folks at DTL to get cross with me. Hah!

Sounds like the Apple position is the best bet in that it's also used for foundry logos and is therefore already being used as a marker.

And Adam is quite right: Apple certainly doesn't have any dibs on the unicode position, just the logo. Leave Apple's name out of it, and there's no confusion.

Now, interfering with native Klingon speakers is another matter... those folks can get nasty.

Rich

twardoch's picture

Oh, I see. Essentially, what you're asking for is a glyph that, in a way, should be unique in every font file (even format). Interesting concept. I've never thought about this in that way.

I doubt the industry will ever agree on a "standard practice" here. This would involve a rather different workflow: you'd need separate fonts that you feed into the WOFF and EOT conversion workflow. I doubt many people will do that.

Having said that, the entire Private Use Area is at your disposal. U+F8FF could be one candidate; a neighbor codepoint (e.g. U+F8FE) or perhaps the very first PUA codepoint (U+E000) could be other candidates. But the point obviously is that the PUA is, by definition, private. So there is not much chance of getting it adopted as an industry-wide standard practice.

Also, I doubt that you'd really need to include such a character in every single webfont that is shipping, right? It's enough to produce a set of test fonts which all have a different glyph in the same codepoint. Actually, this is something I might have a chance to get around to doing.
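A quick way to sanity-check the candidates named above is to ask Python's standard `unicodedata` module for their general category; private-use codepoints report `Co`. This is only a sketch: the `CANDIDATES` list and the helper names are made up for illustration, and it says nothing about what any particular font actually puts in those slots.

```python
import unicodedata

# Candidate marker codepoints mentioned in this thread (illustrative list).
CANDIDATES = [0xE000, 0xF8FE, 0xF8FF]


def is_private_use(codepoint: int) -> bool:
    """Return True if the codepoint's general category is Co (private use)."""
    return unicodedata.category(chr(codepoint)) == "Co"


def describe(codepoint: int) -> str:
    """Format a codepoint in U+ notation alongside its decimal value."""
    flag = "private use" if is_private_use(codepoint) else "NOT private use"
    return f"U+{codepoint:04X} (decimal {codepoint}): {flag}"


if __name__ == "__main__":
    for cp in CANDIDATES:
        print(describe(cp))
    # For contrast: the inverted exclamation mark is an ordinary character.
    print(describe(0x00A1))
```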

Best,
Adam

Richard Fink's picture

@adam

I'm not thinking along the lines of industry-wide. Just me. But it would be a technique that anyone could use as a diagnostic tool if they so choose. And it will work best if it uses a Uni point that's not going to freak anybody out because they're already used to seeing something there that's not a part of the glyph set. Hence, my question to the community.
The "Apple" seems to fit the bill as well as it can be fit.
If it was up to me alone, I'd ditch the upside down exclamation point or something like that - for all the times I'm ever going to use it - but the codepages are what they are and everybody's used to seeing them that way.
Just trying to add a diagnostic option while staying out of tradition and habit's way.
Re-tasking what's already often used for "marking" whether it be the Apple or a Foundry Logo seems the best bet.

rich

twardoch's picture

But the upside-down exclamation mark is a normal punctuation sign in Spanish! It's used at the beginning of an exclamation sentence, while the "normal" exclamation mark is used at the end. The reversed question mark and the regular question mark are used the same way: ¡Hola! ¿Qué?

That's the thing with valid Unicode characters: using them for something other than they are intended may cause you more trouble than it's worth. The Private Use Area is best for all sorts of non-standard use.

Best,
Adam

Richard Fink's picture

Ahh, user input, user input. Up is down, down is up.

To paraphrase former Supreme Court Justice Sandra O'Connor and former US Senate Majority leader Trent Lott in a kind of mashup, "If everybody would just speak English, we wouldn't have these kinds of problems."

Ciao, baby.

Richard Fink's picture

Related to my question on this thread is:

In comparing various codepages I've noticed that, because of conflicting assignments between common codepages such as MacRoman and ANSI, certain positions seem to be, in effect, dead.
For example, #133 -
In MacRoman it's O diaeresis
In ANSI it's a horizontal ellipsis

Ok, so which is it? Or, in practice, neither?

There are quite a few conflicts in the range 128 through 159. I notice the Windows Glyph List (WGL4) just bypasses them - they're not included. (And the equivalents for some of them exist elsewhere in Unicode.)

I'd be interested in knowing how different font makers handle this. What do you put in these spots? Are they wildcards? Discretional? Left empty?

Grateful for any input.

Rich

cuttlefish's picture

Unless I am mistaken, aren't the various encoding tables for the most part relevant only to legacy systems nowadays, having been largely supplanted by Unicode on modern computers? I take the MacRoman encoding as a guideline for the subset of Unicode glyph slots to fill in a minimally complete font, in union with Latin-1, Latin-0, and Windows ANSI, to form a basic Western Latin plus the extras that most computers expect, then encode the finished font as Unicode. There are both an O diaeresis and an ellipsis in the result, and I don't let the position in the table bother me as it doesn't seem to bother the computers.

Of course for your specialized application all this that doesn't matter to me may mean a great deal to you. My apologies if I'm misunderstanding.

Arno Enslin's picture

@ Richard

I did not read the whole thread, so someone may already have said it: You only want to know which format is used (eot, woff and so on)? And you want to store that information as a character in the font in which the page is normally displayed? Why don't you simply put fonts that contain only that key character in the first place of each font stack? Then you would not have to replace the character in each of the fonts that you want to test; you would only have to create a few test fonts.

Example for woff and eot: The page shall be displayed in Charter and you have embedded Charter.woff and Charter.eot. Additionally you embed fonts, that are called Woff.woff and Eot.eot. The font Woff.woff contains the letter W and the font Eot.eot the letter E on the position of the apple for example. And your stack would be "font-family: Woff, Eot, Charter". In the stack they should be in the same order, in which they are embedded. (Or in the other way around, if it also cascades.)
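The stack described above could be sketched roughly as follows. This is a small Python helper that only prints the CSS; the file names `Woff.woff` and `Eot.eot` come from the example, while the `.probe` class name, the helper functions, and the choice to omit a format hint for EOT are assumptions made for illustration. How browsers actually walk such a stack would still need to be checked in the browsers themselves.

```python
from typing import List, Optional, Tuple

# (family name, file URL, optional format hint). The hint is left off for EOT
# on the assumption that a bare url() is the safest form for older IE versions.
PROBES: List[Tuple[str, str, Optional[str]]] = [
    ("Woff", "Woff.woff", "woff"),
    ("Eot", "Eot.eot", None),
]


def font_face_rule(family: str, url: str, fmt: Optional[str] = None) -> str:
    """Build one @font-face rule for a single-glyph probe font."""
    src = f'url("{url}")'
    if fmt:
        src += f' format("{fmt}")'
    return f'@font-face {{ font-family: "{family}"; src: {src}; }}'


def probe_stylesheet(probes, body_family: str) -> str:
    """Emit the probe rules plus a stack that lists the probes first."""
    rules = [font_face_rule(family, url, fmt) for family, url, fmt in probes]
    stack = ", ".join(f'"{family}"' for family, _, _ in probes)
    rules.append(f'.probe {{ font-family: {stack}, "{body_family}"; }}')
    return "\n".join(rules)


if __name__ == "__main__":
    print(probe_stylesheet(PROBES, "Charter"))
```

An element with the hypothetical `probe` class that contains only the marker character would then show whichever probe font the browser managed to load first.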

Clear, what I mean? You don’t mark all web fonts, that you want to test, but only a few test fonts (very small files), that you additionally provide in the stack.

Richard Fink's picture

@cuttlefish:

I think you answered my question. It was obvious, too. Go with Unicode, forget the rest. It's history.
That makes sense.
(The position in the table isn't a concern for me, either.)

@arno
Yeah, for pure diagnostics, a one character font would work fine. Or, a special diagnostic font just for the purpose. Rather than get into the complications of placing a special character in a font as a matter of policy.
That makes sense. That's what I'm gonna do. Will use the Apple char position.

Rich

Thomas Phinney's picture

Another way of explaining it:

You don't refer to characters in text by that kind of decimal codepage-based encoding without also specifying what encoding you're using. Generally, you don't use such encodings at all, but Unicode instead. OS-level machinery will translate Unicode into the appropriate OS-specific codepages for older apps that are using codepages, so the font creator doesn't have to worry about it. The main point of ever looking at those old codepages, for somebody building a font today, is for ensuring that you are covering all the encoded characters in a given codepage.
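The #133 question from earlier in the thread illustrates the point. A minimal sketch using Python's built-in `mac_roman` and `cp1252` codecs (the helper name `decode_byte` is just for illustration) shows that the same byte value yields different Unicode characters depending on which encoding is named:

```python
import unicodedata


def decode_byte(value: int, encoding: str) -> str:
    """Decode a single byte value under the given legacy encoding."""
    return bytes([value]).decode(encoding)


def report(value: int, encoding: str) -> str:
    """Describe what a byte value means in one encoding."""
    ch = decode_byte(value, encoding)
    name = unicodedata.name(ch, "(unnamed)")
    return f"byte {value} in {encoding}: U+{ord(ch):04X} {name}"


if __name__ == "__main__":
    # Same number, two different characters.
    print(report(133, "mac_roman"))
    print(report(133, "cp1252"))
    # MacRoman byte 240 (0xF0) is the slot that maps to U+F8FF.
    print(report(240, "mac_roman"))
```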

Cheers,

T

Richard Fink's picture

Thanks for the explanation, Thomas. It was clear that there was a codepage to Unicode mapping layer somewhere and that codepages, at least today, are little more than language-specific character lists. (Correct me if wrong.)

Adding to the confusion in the world of web development is the charset property like:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

which probably only one web dev in 10,000 understands other than, "Well, it's something you *have* to include on the page." (But that, on desktop browsers, at least, you can easily override with other selections. Adding to the confusion.)

In English-language pages, I've seen the recommendations morph over the years from Windows-1252 to ISO 8859-1 (still used on the W3C's online docs), to landing, ultimately, with utf-8, which today looks like the final stop on the codepage train.
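What the charset declaration controls can be shown with a short sketch: the bytes of a page are the same either way, but the declared encoding decides which characters they turn into. The sample string is borrowed from earlier in the thread, and the function name is made up for illustration.

```python
def decode_as(raw: bytes, declared_charset: str) -> str:
    """Decode page bytes the way a browser would for a declared charset."""
    return raw.decode(declared_charset)


if __name__ == "__main__":
    text = "\u00a1Hola! \u00bfQu\u00e9?"  # the Spanish example from this thread
    raw = text.encode("utf-8")            # the page is saved as UTF-8

    # Declared correctly: the text round-trips.
    print(decode_as(raw, "utf-8"))

    # Declared as ISO-8859-1: each two-byte UTF-8 sequence becomes two characters.
    print(decode_as(raw, "iso-8859-1"))
```

With the mismatched declaration, every non-ASCII character comes out as a pair of Latin-1 characters, which is the familiar garbling seen on mislabeled pages.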

Thanks.

Rich

Thomas Phinney's picture

It was clear that there was a codepage to Unicode mapping layer somewhere and that codepages, at least today, are little more than language-specific character lists.

Yup, that's pretty much it.

utf-8 is a means of representing Unicode, so it's a good choice (at least, for most western languages; for east Asian languages utf-16 may be a more compact representation).
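The size difference is easy to measure. A minimal sketch (the sample strings and the function name are arbitrary choices for illustration; `utf-16-le` is used so that no byte-order mark is counted):

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of the text in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for sample in ("Hello", "\u65e5\u672c\u8a9e"):  # second sample is three kanji
        print(repr(sample), encoded_sizes(sample))
```

For markup-heavy pages the ASCII tags still take one byte each in UTF-8, so the overall balance depends on the mix of content.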

Cheers,

T

John Hudson's picture

...codepages, at least today, are little more than language-specific character lists. (Correct me if wrong.)

Yes, but note that it is strongly recommended to include at least one complete Windows 8-bit codepage in every font, and to accurately identify this codepage in the OS/2 table codepage range. MS Office apps still make use of text libraries that perform font switching based on codepage support. If a font does not support at least one codepage, it might not work at all in some MS apps.

[This is true even if the intended script support for the font is something that never had a Windows 8-bit codepage defined, e.g. Devanagari. So when we make a font for such a script we include support for the Win CP 1252 Latin.]
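A rough way to check that kind of coverage is to list every character a codepage defines and compare it against the codepoints a font maps. In the sketch below the codepage side uses Python's built-in `cp1252` codec; the `font_codepoints` argument is a hypothetical stand-in for whatever a font tool reports as the font's mapped codepoints, and control characters are skipped on the assumption that they need no glyph.

```python
import unicodedata
from typing import Iterable, List, Set


def codepage_characters(encoding: str = "cp1252") -> Set[str]:
    """Collect the non-control characters defined by an 8-bit codepage."""
    chars = set()
    for value in range(256):
        try:
            ch = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte value is undefined in this codepage
        if unicodedata.category(ch) != "Cc":
            chars.add(ch)
    return chars


def missing_from_font(font_codepoints: Iterable[int],
                      encoding: str = "cp1252") -> List[str]:
    """List codepage characters absent from a (hypothetical) font mapping."""
    covered = {chr(cp) for cp in font_codepoints}
    return sorted(codepage_characters(encoding) - covered)


if __name__ == "__main__":
    # Pretend font that covers printable ASCII only.
    ascii_only = range(0x20, 0x7F)
    gaps = missing_from_font(ascii_only)
    print(len(gaps), "characters missing, e.g.",
          [f"U+{ord(c):04X}" for c in gaps[:5]])
```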

Richard Fink's picture

Thanks, John. Noted.
