Are these glyphs *really* used in any languages?

dan_reynolds's picture

Hello,

I am considering removing the following glyphs from the character sets of fonts that I produce in the future (or not designing them at all for fonts I design in the future):

1. Aringacute (U+01FA)
2. AEacute (U+01FC)
3. Oslashacute (U+01FE)
4. aringacute (U+01FB)
5. aeacute (U+01FD)
6. oslashacute (U+01FF)
7. dotlessj (U+0237)

Aside from the dotlessj, I had heard that the other glyphs were for Danish, but that they are not used in contemporary, written Danish. Can anyone point to references where they are used?

Are any of these glyphs used in any other languages? If not, I do not feel that it is necessary to include them in standard character sets any further, at least as far as fonts that I produce go. I am not sure if their being in older character sets or in other fonts' character sets (from e.g., Microsoft? Adobe?) is a convincing enough argument. What do you think?
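
For reference, here is a minimal Python sketch (standard library `unicodedata` only) that prints the Unicode name and canonical decomposition of the seven code points listed above. The names `CANDIDATES` and `describe` are illustrative labels, not part of any font tool, and the output only shows how Unicode encodes these characters; it says nothing about whether any language uses them.

```python
import unicodedata

# The seven code points from the list above.
CANDIDATES = [0x01FA, 0x01FC, 0x01FE, 0x01FB, 0x01FD, 0x01FF, 0x0237]

def describe(cp):
    """Return (Unicode name, canonical decomposition) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.decomposition(ch)

for cp in CANDIDATES:
    name, decomp = describe(cp)
    print(f"U+{cp:04X}  {name}  ->  {decomp or '(no decomposition)'}")
```

The first six each decompose into an existing base letter plus U+0301 (combining acute), which is why they are usually built as composites in a font. U+0237 has no decomposition at all.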

agisaak's picture

AEacute/aeacute were used in Old English, though a macron is more common in modern renderings (and vowel length wasn't obligatorily indicated anyways). dotlessj exists, as far as I know, to provide support for floating diacritics.

clauses's picture

Hi Dan
I tried to get to the bottom of this one some years ago. The thing is that you have to call The Danish Language Council to get the info. The woman that was present at a conference in Iceland many years ago, where these lists were made, works there. The list I'm talking about is the one from Evertype http://www.evertype.com/alphabets/danish.pdf. When I called she was on vacation, and then I moved out of the country. Try calling them and ask on +45 35 32 89 90. Office hours are mon-thur 10-12, and Friday 9.30-12.30. Let me know what they say. I have a feeling that those characters are redundant in Danish. I can't find any references of use in the Danish corpus http://ordnet.dk/ods/.

Goran Soderstrom's picture

I have never believed in the Aringacute, and I have actually talked to a couple of Danish people. They all say the same – it is NOT needed in the Danish language. They have never seen it.

blank's picture

I’m getting ready to kick 1 – 6 out of the fonts I am working on right now and I don’t have a clue about dotlessj. But I do think that some of these oddball characters raise an important issue that keeps coming up in conversations: font bloat. Why are font designers creating so many fonts with huge character sets to support languages we know nothing about? Are native speakers in northern, central, and eastern Europe using fonts from the US and Western Europe, or do they laugh at our wacky diacriticals and odd spacing?

charles ellertson's picture

I've had to set a j with an accent several times. From a comp's point of view, we can make up anything needed (in setting the job) as long as we have the pieces -- so retaining the dotless j would be a plus. Otherwise, it's FontLab & remake the font.

Sindre's picture

AEacute/aeacute and Oslashacute/oslashacute are used for modern scholarly transcription of Old Norse and Old Icelandic, marking distinctive opposition by quantity. The latter pair replaces the earlier OEacute/oeacute usage. Old Norse is still read and written by scholars and enthusiasts (myself included), and speakers of Icelandic and Faeroese (to some extent) can read it untranslated.

Edit: This shows a dictionary lookup for an Old Norse ("norrønt") dictionary hosted by the University of Oslo, with the "weird" characters available by input buttons.

clauses's picture

Aha, I had a hunch this was the case. A bit of extra research points to the characters being used for transliteration from other Nordic languages into Danish. It's described in Inter-Nordic group on Information Technology Standardization (INSTA/IT): Nordic Cultural Requirements on Information Technology. INSTA technical report STRÍ TS3. Reykjavik: Icelandic Council for Standardization (STRÍ) 1992, ISBN 9979-9004-3-1, pp. 33-37

paragraph's picture

Sorry for the distraction: Charles, are the base glyphs (such as dotlessi and dotlessj) and diacritics which need to be kerned/moved over them useful to typesetters/compositors at all? What would you like to see included in fonts?

blank's picture

So should these characters be supported in anything other than faces intended for text use? Does anyone ever buy display faces to use for Old Norse or transliterating Nordic languages into Danish? Or are people who use these languages doing small projects using system fonts?

.00's picture

Aw, come on, don't be so lazy!! The glyphs mentioned are about 2 minutes' worth of work once you have your basic characters done.

Sindre's picture

Aw, come on, don’t be so lazy!

My thoughts exactly.

dan_reynolds's picture

I try not to be lazy, James! But my colleagues and I are producing hundreds of fonts a year. So cutting out a few superfluous glyphs [if they are in fact not really necessary for our customers…] from our character set would save some total testing time.

Plus, we sometimes have to use some wacky ascender and line gap settings because of the Aringacute and its effect on a font's bounding box. So, I would not cry to see it go.

John Hudson's picture

You can leave out any character until a customer asks for it. :)

Sindre's picture

I think it's pretty safe to let Aringacute go. I think it can be regarded as a hypothetical glyph, unlike Oslashacute and AEacute, which are both standard letters in Old Norse. All Danish vowels can in theory be emphasised by an acute, but I have never, ever seen Å with an acute. In Norwegian and Swedish, only a, e and o can take acutes.

John Hudson's picture

are the base glyphs (such as dotlessi and dotlessj) and diacritics which need to be kerned/moved over them useful to typesetters/compositors at all?

Note: positioning of combining diacritics over bases is a GPOS anchor attachment lookup, not kerning. Kerning affects the position of all subsequent glyphs, while anchor attachment is mark-specific.

The value of supporting dotless variants of soft-dotted letters and combining marks depends on the target languages. There are lots of languages in the world that rely on such mechanisms, because Unicode only includes precomposed diacritic letters for backwards compatibility. In effect, this means that only major languages that were supported in pre-Unicode charsets get precomposed diacritics: everyone else relies on combining marks.
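
To make the precomposed-versus-combining distinction concrete, here is a small Python sketch using the standard library's `unicodedata.normalize`. The helper name `nfc_length` is made up for illustration. It counts how many code points remain after canonical composition (NFC): a sequence that has a precomposed character collapses to one code point, while a sequence without one stays as base plus combining mark.

```python
import unicodedata

def nfc_length(text):
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))

# a + combining ring above + combining acute: a precomposed form exists (U+01FB).
print(nfc_length("a\u030A\u0301"))   # 1

# j + combining acute: no precomposed form, so it stays as two code points.
print(nfc_length("j\u0301"))         # 2
```

In the second case the font has to do the work itself: swap in the dotless j and position the acute over it.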

My clients request such things, but then they tend to be either involved in internationalisation or in scholarly publishing.

One thing I do strongly recommend is to include Unicode combining mark character support for any accents used in precomposed glyphs in your font, not just the subset spacing accents (which are also backwards compatibility characters) and use the combining mark glyphs as components in composites. This has two benefits: it means that all your diacritic components can have shared x,y coordinates, rather than having different offsets with some components based on spacing diacritics and some on non-spacing, and it means that if you do decide to add GPOS anchor attachments you can use the component offsets to give you a head start.
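
The difference between anchor attachment and kerning can be sketched in a few lines of Python. This is a deliberately simplified model, not how any shaping engine is actually implemented: the `Glyph` class, the anchor coordinates and the advance widths are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    name: str
    advance: int     # advance width in font units (hypothetical)
    anchor: tuple    # (x, y) of a "top" anchor (hypothetical)

def attach_mark(base, mark):
    """Offset that moves the mark so its anchor lands on the base's anchor."""
    return (base.anchor[0] - mark.anchor[0], base.anchor[1] - mark.anchor[1])

def pen_after(base, kern=0):
    """Pen position after the base glyph; kerning shifts everything that follows."""
    return base.advance + kern

dotlessj = Glyph("dotlessj", advance=280, anchor=(140, 500))
acutecomb = Glyph("acutecomb", advance=0, anchor=(-120, 500))

print(attach_mark(dotlessj, acutecomb))   # (260, 0): moves only the mark
print(pen_after(dotlessj))                # 280: unchanged by the attachment
print(pen_after(dotlessj, kern=-20))      # 260: kerning moves all later glyphs
```

If the composites in a font are already built from combining-mark components, the component offsets play the same role as the tuple returned by `attach_mark`, which is the head start mentioned above.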

Sindre's picture

By the way, oogonek and oogonekacute (not oogonekmacron, as Unicode erroneously states) are also necessary to write Old Norse, so I guess it doesn't make sense to include oslashacute and aeacute unless your font supports these glyphs.

k.l.'s picture

John -- There are lots of languages in the world that rely on such mechanisms, because Unicode only includes precomposed diacritic letters for backwards compatibility. In effect, this means that only major languages that were supported in pre-Unicode charsets get precomposed diacritics: everyone else relies on combining marks.

Which brings me back to my wish to avoid all precomposed diacritic letters (those that allow design-wise), play out the character/glyph distinction (rather than serving exactly one glyph per character and naming it so), and employ ccmp/mark/mkmk radically (rather than consider them as add-on). If only reality -- operating systems and applications -- were there yet ...

Good to know about a/Aringacute.

satya's picture

In addition to Dan's list, I wanted to know if anyone has ever seen these signs in use too?

8. ¤ - Currency Sign (U+00A4)
9. ‰ - Permille / Per thousand (U+2030)

Plus, a silly question: How do we access the Multiply Sign (×) with a normal keyboard? :P

blank's picture

I have seen the currency sign used in financial newspapers and The Economist a few times, but those publications are working with custom fonts. I can’t find any evidence that Permille is ever used. There was a conversation about these two on Facebook where Jonathan Hoefler pointed out that H&FJ had examined thousands of annual reports and never seen these characters used, which is why they dropped them from their character set.

DTY's picture

The per mille sign shows up a lot in scientific typesetting, for instance when talking about isotope fractionation.

charles ellertson's picture

Sorry for the distraction: Charles, are the base glyphs (such as dotlessi and dotlessj) and diacritics which need to be kerned/moved over them useful to typesetters/compositors at all? What would you like to see included in fonts?

If you're a "normal" typesetter, yes, you need the base glyphs. All a typesetter can do is to laboriously make up the needed characters -- much as had to be done with a macron accent in the type 1 days.

At our shop, unless the occurrence of such characters is only once or twice in the job, we'll decline it if the designer insists on using a font where the EULA forbids modification. For fonts we can modify, our rule of thumb is the comps (both of them) are expected to make up such accented characters if they occur once or twice, otherwise, I'll make up the needed characters in the font, and write a ccmp feature if needed. It's all about what is fastest without leaving any little encoding or character lies in the final files.

Sindre's picture

Blood alcohol content is always measured in per mille in Norway, and the per mille sign is quite often used in tables, statistics and news graphics about drunk driving. I think the same goes for Sweden and Denmark, but I'm not quite sure about that.

Sindre's picture

Concerning the general currency symbol: On French, Danish, Norwegian and Swedish/Finnish keyboards, ¤ is accessible from any keyboard as shift-4, so it would be weird if that keystroke produced no character. Its supposed use is as a placeholder symbol when ₤ or ¥ or ₡ or even weirder currency symbols are unavailable, as a sort of fallback symbol, right?

John Hudson's picture

On the subject of units used in measuring small quantities, one good reason why the micro character µ might sometimes benefit from being a distinct glyph from the Greek mu μ is that it is combined with Latin g to indicate micrograms µg. Less commonly, it occurs with uppercase G in physics as an abbreviation for microgravity µG.
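
At the encoding level the two are separate characters that are only compatibility-equivalent, which a short Python check with the standard `unicodedata` module shows. The constant names `MICRO` and `MU` are just labels for this sketch.

```python
import unicodedata

MICRO = "\u00B5"   # MICRO SIGN
MU = "\u03BC"      # GREEK SMALL LETTER MU

print(unicodedata.name(MICRO), "|", unicodedata.name(MU))
print(MICRO == MU)                                   # False: distinct code points
print(unicodedata.normalize("NFC", MICRO) == MU)     # False: NFC keeps them apart
print(unicodedata.normalize("NFKC", MICRO) == MU)    # True: NFKC folds micro into mu
```

So text that has been through NFKC normalisation will reach the font as Greek mu even where the author typed the micro sign, which is worth bearing in mind if the two glyphs are drawn differently.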

clauses's picture

I think the same goes for Sweden and Denmark, but I’m not quite sure about that.
Yes, you most definitely need the 'promille' sign (as it is called in Denmark). It's used in statistics, physics, chemistry, economics, medicine and so on. That one can't be deleted.

blank's picture

Its supposed use is as a placeholder symbol when ₤ or ¥ or ₡ or even weirder currency symbols are unavailable, as a sort of fallback symbol, right?

Yes, but how many people actually typeset that sort of thing? As I noted above, the only times I’ve seen it used it was in newspapers that have custom fonts at their disposal.

John Hudson's picture

The currency symbol (U+00A4) is a standard character in the Latin 1 character set and others; for that reason alone it should not be omitted from any font that claims to support that character set. Since the ANSI Latin 1 -- or a related codepage such as Windows CP 1252 -- is the first building block of pretty much any Latin script font, this character should be included in most fonts.

If you want to be discretionary in your support of characters that are not part of 8-bit codepages, and disagree with e.g. some inclusions in the WGL4 list (which deliberately errs on the side of inclusion), that's fine. But I recommend against cutting characters that are part of 8-bit codepages, if your font is going to claim support of those codepages in the OS/2 table.
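
One quick way to see which characters a given 8-bit codepage actually contains is to decode every byte value with Python's built-in codecs. The sketch below does that for Windows CP 1252 and ISO Latin 1; the function name `codepage_repertoire` is an illustrative label, and Python's codec tables are assumed to match the codepage definitions a font's OS/2 table refers to.

```python
def codepage_repertoire(codec):
    """Set of characters reachable from single bytes 0x20-0xFF in an 8-bit codec."""
    chars = set()
    for byte in range(0x20, 0x100):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # byte is unassigned in this codepage
    return chars

CP1252 = codepage_repertoire("cp1252")
LATIN1 = codepage_repertoire("latin-1")

checks = [("currency sign", "\u00A4"), ("per mille", "\u2030"), ("Aringacute", "\u01FA")]
for label, ch in checks:
    print(f"{label}: cp1252={ch in CP1252}, latin-1={ch in LATIN1}")
```

By this check the currency sign is in both codepages, the per mille sign is in CP 1252 only, and Aringacute is in neither, which matches the distinction drawn above between codepage characters and discretionary ones.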

paragraph's picture

The currency symbol (U+00A4):

I just love what Bringhurst (EOTS, p. 313) has to say about it: Since the symbol is parasitic (it takes up space on the font but offers nothing in return), louse might be a better name. Having no true function, it has no authentic form.

dan_reynolds's picture

Hmm... I'm going to have to be convinced that it is necessary to support Old Norse in 21st century fonts. Where is this language used, aside from scholarly publications? Are books still printed in it? How many websites out there are written in the language? Are contemporary typefaces even appropriate for something like this?

paragraph's picture

I agree, even if only for bad reasons: I am lazy and always in a hurry. Wouldn't it be great if there was some reasonably authoritative table telling the lazier among us which glyphs can be skipped, say for faces that are not meant to be used for scholarly materials and such. For example, I have stopped putting the lozenge, product, integral and such into fonts that I myself would not want to use for a maths book, in spite of them being in MacOS Roman. Could someone more experienced and trustworthy provide such list? Is it worth starting a thread for this, and if so would someone please do it?

mili's picture

Satyagraha, in a lot of fonts with a Finnish keyboard shift-4 brings out €, that weird currency ;^)

Sindre's picture

Old Norse never really died out, as long as the written language is still understood by speakers of Icelandic and to a great extent Faeroese (even Norwegians and I think Swedes and Danes get the general idea when reading it, but the complex grammar is largely incomprehensible without some education). Written Icelandic is virtually indistinguishable from Old Norse, though the pronunciation is quite different and the vocabulary is vastly increased, almost exclusively building on Norse roots, not by importing loanwords. There is a huge body of literature written in Old Norse, the Kings' sagas (including the great works of Snorri Sturluson) and the Icelanders' sagas probably the most read today. These are still printed in their original language or in bilingual versions. I own several of them.
Old Norse is still taught in the Norwegian equivalent of high school (at least, it was when I attended).
There are several web pages containing Old Norse texts; most of them suffer from a lack of proper glyph support.

Just some very quick links

A Google search for "hljoðs bið ek allar", the opening stanza of Voluspá in normalised Old Norse orthography, returns 927 hits, while a Google search for "Mieleni minun tekevi aivoni ajattelevi", the opening stanza of the Finnish national epos Kalevala, written in modern Finnish, returns 1200 hits.

Sindre's picture

@mili: Yes, the placement of the glyph only refers to the Norwegian keyboard. My bad. But can you write the ¤ with your Finnish keyboard?

mili's picture

Satyagraha, I can't find it straight on my mac, only through character or glyph palettes.

dan_reynolds's picture

OK, the dotlessj is going to stay in, but I think that the other six glyphs are coming out. Thanks for the input everybody!

Sindre's picture

Mili, a little research shows that Finland now has a new keyboard standard with € on Shift+4, whereas Finland and Sweden before the new standard used identical keyboards with ¤ on Shift+4.

mili's picture

Thanks Satyagraha, there's always something new to learn!

Nick Shinn's picture

But do Old Norse scholars, and Old Norse publishers, actually buy commercial fonts?

Scholars and scholastic publishers, it seems to me, want every obscure character/glyph ever used, and if a character is not in a system font will make hacks or acquire freeware fonts made by other scholars, rather than buy fonts.

I beg to differ with James: it can consume significant time to add a character to a typeface: upper case, lower case, small caps, all the weights, roman and italic--it adds up.

Sindre's picture

There's a lot of truth in that, Nick. Twenty years of no choice has made and hardened that habit. I think the situation would have been quite different if scholars had ever had a choice. Mediævalists today still don't: the only typefaces I know of that fully support Old Norse are Gentium and Junicode, both made by scholars, I think, and both freeware.

How do we choose what parts of Unicode not to support in our typefaces? Are all those weird and ugly African glyphs in Latin Additional 2 really needed? What about the archaic Greek and Cyrillic characters? Why do we still support Polytonic Greek, when that orthography is deprecated?

blank's picture

How do we choose what parts of Unicode not to support in our typefaces?

By focusing on users who actually buy our fonts and supporting languages we actually have a clue about. I really don’t know why I keep worrying about Central and Eastern European character support, because I’m not selling fonts there and I don’t have the inclination to build a library of publications from those regions to compare my work to. As I said above, for all I know I’m just wasting time building characters for people who are just laughing at the ignorant efforts of some guy whose knowledge of their alphabet was picked up entirely from four web sites.

Si_Daniels's picture

I would be inclined to map a $ sign to all of those code-points. Then if customers want them it's pretty clear as to what they need to do. You could go a step further and draw a [$50] glyph, although the fee might change subject to inflation.

.00's picture

I really don’t know why I keep worrying about Central and Eastern European character support, because I’m not selling fonts there

If you build it they will come!

dan_reynolds's picture

Si, in our case, that would be a Euro sign… but I quite like the idea :-)

James (Puckett), I hear your pain! But CE and Eastern European language support is essential. Really, Cyrillic is too. These are languages and countries that cannot be ignored. The comparison with Old Norse is not apt. Of course, official Norwegian, Icelandic, Danish, Swedish, etc., are 100% essential to support. As are the official languages of all other European countries. But archaic, unofficial languages are another matter. Of course, we'd be happy to build in special support to any of our fonts if a customer would commission that. But I don't think that Old Norse (sorry, I don't want to pick on this language too much!) publishers represent a significant portion of our customer base.

It isn't just licensing to customers in Poland, Hungary, Romania, Latvia, Bulgaria, Russia, etc. It is licensing to companies down the street in Frankfurt who sell their products to consumers in those countries or do business there otherwise. I know that it might be a little different in New York, but we are long past the discussion-phase as far as these languages go. One can still decide not to support them at their peril, though.

paul d hunt's picture

I’m just wasting time building characters for people who are just laughing at the ignorant efforts of some guy...

welcome to the world of typeface design

Dan, have you had a look at the Adobe Latin Extended character sets?
http://blogs.adobe.com/typblography/2008/08/extended_latin.html

In these the first six of your glyphs are relegated to the AL5 glyph set. The dotless j belongs to AL4 set, which is used for families with extensive language support on the scale of Arno & Hypatia.

Nick Shinn's picture

...the only typefaces I know of to fully support Old Norse are Gentium and Junicode...

What should be added to those characters listed in the wiki entry?

http://en.wikipedia.org/wiki/Old_Norse_alphabet

Sindre's picture

Oogonekacute/oogonekacute and AEacute/aeacute.

Nick Shinn's picture

AEacute is in Latin Extended-B, but not Oogonekacute.

So, Oogonekacute would be more useful than Aringacute....

Nick Shinn's picture

re. Adobe character sets.

Looks like there is no standard for fonts, just each foundry coming up with its own encodings, and the users having to cope with that.

Perhaps Unicode could have done a better job of making its code pages practical.

Sindre's picture

Definitely. Aringacute is hypothetical, oogonekacute is not. In fact, Oogonekacute/oogonekacute is not in Unicode at all; they are erroneously rendered as Oogonekmacron/oogonekmacron (U+01EC/U+01ED). Those characters have no use in any language I've ever heard of. Therefore, Private Use Area U+E20C/U+E60C have become unofficial standard encodings for Oogonekacute/oogonekacute amongst mediævalist-hackers.
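
The code points mentioned here can be inspected with Python's standard `unicodedata` module. The sketch only reports how the characters are defined; it does not settle whether the macron form is an error. The last check uses a combining sequence (o-ogonek followed by combining acute), which is an alternative not mentioned in the post and is shown purely as an assumption about how the letter could be encoded without the Private Use Area.

```python
import unicodedata

# U+01EC is defined with a macron, not an acute.
print(unicodedata.name("\u01EC"))        # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON

# U+E60C sits in the Private Use Area, so Unicode assigns it no meaning.
print(unicodedata.category("\uE60C"))    # Co

# o-ogonek + combining acute: no precomposed form, so NFC leaves two code points.
seq = "\u01EB\u0301"
print(len(unicodedata.normalize("NFC", seq)))   # 2
```

Because private-use code points carry no agreed meaning, text encoded with U+E20C/U+E60C only displays correctly in fonts that follow the same convention.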

blank's picture

These are languages and countries that cannot be ignored.

Why not? Why does it make more sense for me to try and support languages with no clue whether or not I’m doing it right than to leave that level of language support to companies like Linotype that can employ designers trained in such matters? Why aren’t I better off to spend my time doing a better job on languages I know about instead of burning up hours on stuff about which I’m clueless?
