Unicode programmer font

Tomek Peszor's picture

Unicode is now standard, I suppose.

So every coder (typesetter too) needs a fixed width font
with Unicode support.

Can you share your favorites?

Tim Ahrens's picture

> Unicode is now standard, I suppose.

Standard – what for?

Coding languages are far behind proper typesetting; most of them accept nothing but good old ASCII.
There are tendencies, however, to support Unicode in the source code.

Tomek Peszor's picture

I should have written “standard” in quotes.
For example, for newsgroups, e-mail, web pages, TeX, TTX and other source code.
Simply put: everything that can be edited (i.e. commented)
in a text editor with Unicode support.

I used Andale Mono for quite a long time,
but now I need something smarter,
for example with Hebrew and Japanese character support.

--
regards,
Tom

Si_Daniels's picture

Tom,

Can you provide a programming scenario where having all of those writing systems in a single font would be useful?

In the case of localizing a program, strings would normally be kept in a resource file, and not hard-coded.

Cheers, Si

Tomek Peszor's picture

Yes. For example:
a) writing an article for Wikipedia in my favorite text editor.
b) preparing a multilingual web page

This screenshot shows what Andale Mono and DejaVu look like in jEdit:
http://img299.imageshack.us/my.php?image=compareandaledejavulv8.png

I’m just curious which font you use :)

--
regards,
Tom

Si_Daniels's picture

I see, I was taking a more purist view of 'coding'. I would pick the font that you like best for your primary language and rely on font fallback in your OS or text editor for the secondary languages.

Hopefully we'll see more apps and OSes using 'composite fonts', where the user can specify which font (along with scaling and baseline adjustment) to use for each Unicode range, rather than pick a one-size-fits-all solution.
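
The composite-font idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the range table, the fallback choice and the function names are hypothetical, the font names are simply ones mentioned in this thread (no claim is made about their actual coverage), and scaling and baseline adjustment are left out. It shows how text could be split into runs, each assigned to the font configured for its Unicode range.

```python
# Hypothetical per-range font table: (first code point, last code point, font).
# The ranges are real Unicode blocks; the font assignments are assumptions.
FONT_RANGES = [
    (0x0000, 0x024F, "Consolas"),          # Basic Latin through Latin Extended-B
    (0x0370, 0x03FF, "Consolas"),          # Greek and Coptic
    (0x0400, 0x04FF, "Consolas"),          # Cyrillic
    (0x0590, 0x05FF, "FreeMono"),          # Hebrew (assumed assignment)
    (0x3040, 0x30FF, "Andale Mono WT J"),  # Hiragana and Katakana
]
FALLBACK_FONT = "Ascender Uni"  # assumed catch-all for everything else


def font_for(ch):
    """Return the configured font for a single character."""
    cp = ord(ch)
    for first, last, font in FONT_RANGES:
        if first <= cp <= last:
            return font
    return FALLBACK_FONT


def split_into_runs(text):
    """Group consecutive characters that share a font into (font, run) pairs."""
    runs = []
    for ch in text:
        font = font_for(ch)
        if runs and runs[-1][0] == font:
            # Same font as the previous character: extend the current run.
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

Calling `split_into_runs` on a line of source that mixes ASCII and katakana would yield alternating runs, which an editor could then draw with different fonts.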

twardoch's picture

Simon,

for coding purposes (a monospaced linear font), a pan-Unicode font would actually be useful.

Monotype Imaging offers Andale Mono WT, a pan-Unicode monospaced font with the same design as the free Andale Mono font. There are four versions that cover the entire Unicode 3.0 range: Andale Mono WT J (Japanese), Andale Mono WT K (Korean), Andale Mono WT S (Simplified Chinese), Andale Mono WT T (Traditional Chinese). There is also a set of two fonts: Andale Mono WTG and Andale Mono WTG Surrogate, with the entire Unicode 3.2 range. Those fonts are also distributed e.g. by Ricoh (google for Andale Mono WT for more details).

Ascender Corp. offers Ascender Uni, a pan-Unicode monospaced font with the entire character set of Unicode 5.0. The design is similar to Andale Mono.

Microsoft offers Consolas, a great monospaced design by Luc(as) de Groot that covers extended Latin, Cyrillic and Greek and has regular, italic, bold and bold italic variants (Andale Mono and Ascender Uni only have one regular weight).

Regards,
A.

Spire's picture

> This screenshot illustrates how AndaleMono and DejaVu look like in jEdit:
> http://img299.imageshack.us/my.php?image=compareandaledejavulv8.png

That's not Andale Mono on the left. Looks like Courier New, actually.

twardoch's picture

Indeed, it is not Andale Mono on that image.

A.

Tomek Peszor's picture

Eeekh…
Of course, that’s not Andale Mono. It’s not Courier either.
That’s Dialog Input, shipped with the Java Development Kit, which has worked best for me so far.
Unfortunately, it’s not a masterpiece of legibility.
But it replaces missing characters.

I have found some pan-Unicode fonts listed here:
http://en.wikipedia.org/wiki/Unicode_fonts

I have not tested Ascender Uni yet.

You can test your fonts on this source code:
http://taat.pl/typografia/typografia_wiele_jezykow.html

Let me know if it renders all characters correctly.

--
regards,
Tom

Si_Daniels's picture

>for coding purposes (a monospaced linear font), a pan-Unicode font would actually be useful.

Can you explain why? I can see that fonts like this are useful for other purposes. But coding? Why?

Cheers, Si

Tomek Peszor's picture

When you get database files exported from a multilingual forum,
you are not able to edit them,
because you are not able to distinguish the characters…

But that’s not the point of this discussion :)
Should we change the discussion topic to: “Should coding go for Unicode”?

--
regards,
Tom

Spire's picture

> When you get database files exported from multilingual forum, you are not able to edit it, because you are not able to distinguish characters…

True, but that's not coding. HTML authoring (in your earlier example) is not coding either.

Coding is programming; that is, it's the process of writing executable computer instructions (i.e., code) in a computer programming language.

For what it's worth, I've been coding for over two decades, and I've never had the need for a "Unicode" font in my IDE. However, I do appreciate the desire for a pan-Unicode font for general-purpose use, which includes HTML authoring and database editing.

But let's not call that a "coding" font.

Si_Daniels's picture

Thanks Spire, that was what I was getting at. I wasn't trying to split hairs, just trying to work out if there's a business case/justification for us extending a monospaced font like Consolas beyond the confines of European writing systems.

Tomek Peszor's picture

Suppose I write a PHP application for “HTML authoring”.
Then I have to insert comments:

// prints "typography" in Ukrainian
echo 'Типографія';
// prints "typography" in Japanese
echo 'タイポグラフィ';

You would probably call this a script rather than a program.
But I might do the same in C++.

--
regards,
Tom

Spire's picture

PHP counts as coding, so your example certainly would benefit from having a pan-Unicode font that supports all the varied characters that you might want to use in your literal strings.

However, as Simon said earlier, it's generally a bad idea to hard-code literal strings in code. The preferred way to handle all language-specific strings is to define them in a separate resource file and then write code to select the appropriate strings to use depending on context. That way, the code itself is always straight ASCII, and internationalization becomes much easier.
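
A minimal sketch of the resource-file approach described above, in Python. The JSON layout, the keys and the function name `tr` are hypothetical and chosen only for illustration. In a real project the table would live in a separate UTF-8 file; it is embedded as a string here purely to keep the sketch self-contained, and the sample strings are the ones from Tomek's PHP example.

```python
import json

# Stand-in for a separate resource file (for example strings.json).
# Embedded here only so the sketch runs on its own; layout is hypothetical.
RESOURCES_JSON = """
{
  "en": {"typography": "Typography"},
  "uk": {"typography": "Типографія"},
  "ja": {"typography": "タイポグラフィ"}
}
"""

STRINGS = json.loads(RESOURCES_JSON)


def tr(key, lang, default_lang="en"):
    """Look up a string by key, falling back to the default language."""
    table = STRINGS.get(lang, {})
    if key in table:
        return table[key]
    return STRINGS[default_lang][key]
```

With the strings moved out, the code that uses them (`tr("typography", "ja")`) stays plain ASCII, but whoever edits the resource file still needs a font that covers every script in it.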

Tomek Peszor's picture

Yes, for big projects these techniques are called i18n and L10n.
Then you store all language strings in separate files, as in a library.

But even then, when you want to take a look at those strings,
you have to switch between fonts if you don't have a pan-Unicode one.

Anyway, I still haven't found what I'm looking for :)

--
regards,
Tom

twardoch's picture

Tomek,

any problems with the links I have given?

Simon, Spire,

I don't understand why you're trying to prove to Tomek that there's something wrong with his concept.

Much of modern "HTML authoring" is synonymous with "development of web applications" -- that of course *is* coding because it often includes AJAX/JavaScript on the client side plus PHP or Java on the server side.

Not all development projects involve "localization" or "internationalization". If you operate under the presumption that the default language is English and anything else is "international", this may be true. But just as easily, people develop web or desktop applications in a language that is not English, but they don't plan to localize or internationalize either. If I develop a website in Hebrew or Russian or Arabic or Japanese only, I simply want to type texts in that language in the source code. And if I'm writing and debugging a console application, or a Python script, I need a Unicode monospaced font, be it just for the console debugging output.

In fact, I have been using Andale Mono WT for quite some time for the purpose of developing Python scripts within FontLab Studio. The Output panel (a text-only console) works with monospaced fonts and I often need to output Unicode text to it.

I must admit that I find the notion that "code should be ASCII only" very antiquated. I've been using UTF-8 for my source code in Python for several years now.
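
As a small illustration of UTF-8 source code in Python: Python 2 required an encoding declaration for non-ASCII source, while Python 3 assumes UTF-8 by default. The dictionary contents and the helper name `utf8_size` below are made up for this sketch and are not taken from any real script.

```python
# -*- coding: utf-8 -*-
# The declaration above is what Python 2 needed for non-ASCII source files;
# Python 3 assumes UTF-8 by default, so there it is optional.

# String literals typed directly in their own scripts (sample data only).
greetings = {
    "Hebrew": "שלום",
    "Russian": "привет",
    "Japanese": "こんにちは",
}


def utf8_size(text):
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))
```

Reading or debugging a file like this in a monospaced font without Hebrew, Cyrillic and kana coverage is exactly the situation where fallback glyphs or empty boxes show up.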

A.

Spire's picture

Adam, I'm not trying to prove to Tomek that there's something wrong with his concept. In fact, I've repeatedly been saying that I appreciate his desire for a pan-Unicode font.

I've merely been trying to explain why there isn't already a wide range of pan-Unicode fonts available for use in coding: it's because most programmers have no use for non-ASCII characters in their actual source code.

You say that you find the notion that "code should be ASCII only" very antiquated. Conversely, I find the notion that code should contain hard-coded literal strings very antiquated. We are both right, when taken in different contexts.

Your earlier post was very helpful -- to me as well. In particular, I hadn't heard of Ascender Uni, and in fact I'm pretty excited to learn of it. I've been wanting a good pan-Unicode monospaced font for a long time (albeit for another purpose).

Edward

twardoch's picture

Spire,

The concept that some plain-text files should use different encoding than others *is* antiquated, no matter what. In future, it only makes sense that plain-text files (even programming source code) are encoded using Unicode — even if a certain document format or programming language restricts the character set used for certain aspects, e.g. object names.

But whether string literals should be intermixed with programming code or externalized depends on the type of application. If the application is text-intensive and only contains comparatively little programming code (i.e. it is more like a document with elements of a program, e.g. a dynamic web page) *and* it will only be deployed in a single language, then I think externalizing strings makes little sense. It becomes useful only in scenarios where you have a large overhead of programming code over text, or when you're developing for i18n scenarios.

As you well know, XML allows a wide range of Unicode letters, not just ASCII ones, in element and attribute names. So you may use an XML document structure that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<authors>
<author type="main">
<firstname>Юрий</firstname>
<lastname>Ярмола</lastname>
</author>
<author>
<firstname>Adam</firstname>
<lastname>Twardoch</lastname>
</author>
</authors>

But others may prefer to define their document structure as follows:

<?xml version="1.0" encoding="UTF-8"?>
<авторы>
<автор тип="главный">
<имя>Юрий</имя>
<фамилия>Ярмола</фамилия>
</автор>
<автор>
<имя>Adam</имя>
<фамилия>Twardoch</фамилия>
</автор>
</авторы>

Both are valid XML structures, and it depends purely on the linguistic context which one will be chosen. The English language still holds some privileged position in electronic information processing, but this is changing.
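
A short sketch of how both structures can be processed by the same code, using Python's standard `xml.etree.ElementTree` module. The two documents are the ones shown above, compacted and without the XML declaration; the helper name `full_names` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two documents from the post above, compacted for brevity.
ENGLISH_DOC = """<authors>
<author type="main"><firstname>Юрий</firstname><lastname>Ярмола</lastname></author>
<author><firstname>Adam</firstname><lastname>Twardoch</lastname></author>
</authors>"""

RUSSIAN_DOC = """<авторы>
<автор тип="главный"><имя>Юрий</имя><фамилия>Ярмола</фамилия></автор>
<автор><имя>Adam</имя><фамилия>Twardoch</фамилия></автор>
</авторы>"""


def full_names(xml_text, author_tag, first_tag, last_tag):
    """Return "first last" for every author element in the document."""
    root = ET.fromstring(xml_text)
    return [
        f"{author.findtext(first_tag)} {author.findtext(last_tag)}"
        for author in root.iter(author_tag)
    ]
```

Only the tag names passed in differ: `full_names(ENGLISH_DOC, "author", "firstname", "lastname")` and `full_names(RUSSIAN_DOC, "автор", "имя", "фамилия")` walk the same structure, so the code that handles the second document necessarily contains Cyrillic identifiers.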

A.

miha's picture

There is also a great benefit for non-English-speaking programmers in using Unicode, because you can program in your native language with all its characters, not only the ASCII ones. Well, at least if the compiler supports that :)

Spire's picture

Adam, I'm not sure what the argument is still about. I did acknowledge in my earlier post that you were correct. I think we each understand what the other is saying at this point.

In response to your last post, I'll just add that document encoding and actual character use are two separate issues. I never said that ASCII encoding was still a good thing; in fact, I definitely agree that it's antiquated and really needs to be phased out already. But even if the entire world magically switched overnight to, say, UTF-8 encoding for all text files (including all source code), I think most programmers would still have little use for non-ASCII characters in their actual code.

I think the whole reason this thread got sidetracked is that Simon and I were both surprised and taken aback by Tomek's categorical claim that "every coder... needs a fixed width font with Unicode support".

In any case, the bottom line is that pan-Unicode fonts are a Good Thing, regardless of what they might be used for.

Edward

ebensorkin's picture

I have a Unicode programmer's font I am developing. It has been called "Software Developer" but its name is going to change. It supports all the Latin glyphs including Latin Extended-A & B but it lacks Cyrillic, Greek, Japanese and Chinese. Maybe I will add these over time. Okay, maybe not the Japanese, apart from maybe Hiragana & Katakana. And almost certainly not the Chinese unless I get lots of help somehow.

Tomek, Adam, Si ( & whoever else cares to comment) what glyph coverage would you suggest is going to add the most utility beyond the Latin? And in what order?

Spire's picture

I'm very happy to hear that you're continuing to make progress on Software Developer!

My preference would be to have Cyrillic and Greek first, and then CJK.

ebensorkin's picture

Thanks!

I should be done by now but other projects come up & displace it...

I am going to get it out though! Probably as Latin at first, and then with time I will make 1.5 & 2.0 versions with additional glyphs for Cyrillic and Greek.

While I am asking questions, I may as well ask which environment and rendering scheme you work in.

Si_Daniels's picture

For Consolas we shipped with Latin, Greek and Cyrillic. We're expanding this to cover extended Latin, polytonic Greek and extended Cyrillic in the next update. We want to avoid adding Hebrew, Arabic and Thai if we possibly can, but that's the general direction we'd be pulled in for all core fonts.

With respect to CJK, these are predominantly fixed-pitch to start with, so rather than add these to a monster font, I'd be inclined to add good developer-centric Latin, Greek and Cyrillic coverage to good CHS, CHT, Korean and Japanese base fonts.

ebensorkin's picture

Si, where should I look for a good Greek, Cyrillic or Hebrew mono as a reference?

Si_Daniels's picture

Hmm, I'd have to ask around.

Tomek Peszor's picture

Nice to hear that while I was recovering from the flu,
the discussion continued and all controversial points were cleared up.

Thank you, Adam, for your tips.
I have Consolas installed with Vista. It looks clear and nice, but the range is too small.
I tested Andale Mono WT J and WT S some time ago, but they have the same problem.

Ascender Uni is very promising.

Eben,
maybe you’ll take a look at those glyphs for a reference?
http://www.fontmarketplace.com/font/ascender-uni.aspx/301

Speaking of ranges: I’d like to have all Slavic, Cyrillic, Greek, Hebrew, Arabic, simplified and extended Chinese, Korean and Japanese, Unicode ligatures and typographic symbols.
That’s a lot of work to do…

Now I use an open source editor named jEdit, written in Java, on both Linux and Windows. It works great on Mac too. So far, the only font which displayed all the characters I needed was DialogInput.

--
regards,
Tom

Si_Daniels's picture

Have you considered asking the "jEdit" folks to support user-definable fonts for each Unicode range?

mcswell's picture

I realize this is a defunct discussion, but I feel compelled.

Defining coding as creating something that gets compiled into an executable is a rather narrow definition. As a computational linguist, I do things which I consider to be coding, but which do not get converted into executables per se. One of these things is to code finite state transducers for various human languages, using tools like the Xerox and Stuttgart Finite State Transducers (xfst and sfst respectively). I have also done coding in XML, where the XML is intended to be converted into the programming language of one of these FSTs. Once in a while I've programmed things in Python (and co-workers have done so in Perl or even C) to modify the XML code for these FSTs or for dictionaries, and occasionally it's necessary to embed Unicode characters in these programs--not as strings to appear in dialog boxes or such like, but as parts of strings or regular expressions that the Python etc. code needs to find in the XML and modify.

Apart from C, all these programming languages use interpreters.

The human languages we work with have included Bengali, Arabic and Urdu (Urdu uses the Arabic code range of Unicode). I wouldn't be surprised if we worked with other Indic languages in the future, and Ethiopic languages are not out of the question.

I regularly use jEdit or Visual Slickedit for writing these programs; some of my co-workers use emacs. We also use XMLmind, although that is mostly for DocBook-type XML, which I suppose you don't consider to be coding.

So I think I feel safe in calling what I and my co-workers do "coding", even though it does not always result in executables. Hard-coding literal strings is really the only option for these programs, especially for the FSTs where the whole point is to deal with dozens or even hundreds of such strings in the morphology and phonology of these languages. We certainly use programmers' editors, and at least jEdit does not "support user-definable fonts for each Unicode range". (I asked a couple months ago, and it's probably at the bottom of their priority list.)
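
The kind of task described in this post can be sketched as follows, in Python. Everything specific here is an assumption made for illustration: the `<form>` element name, the dictionary-entry layout and the job itself (removing Arabic vowel marks). The point is only that the regular expression has to name non-ASCII characters. They are written as `\u` escapes so the sketch survives any font; with a suitable font the literal characters could be typed directly.

```python
import re

# Arabic vowel and related marks, fathatan (U+064B) through sukun (U+0652).
ARABIC_MARKS = re.compile("[\u064B-\u0652]")

# Hypothetical element that holds the word form in a dictionary entry.
FORM_ELEMENT = re.compile(r"(<form>)(.*?)(</form>)")


def strip_marks(xml_text):
    """Remove Arabic vowel marks from the text of <form> elements only."""
    def clean(match):
        open_tag, body, close_tag = match.groups()
        return open_tag + ARABIC_MARKS.sub("", body) + close_tag
    return FORM_ELEMENT.sub(clean, xml_text)
```

Text outside `<form>` elements is left untouched, so glosses or comments that contain the same marks would not be modified.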

Mike Maxwell

ebensorkin's picture

Eventually I will probably make my mono support Ethiopic, but it will be another 2-3 years I think. I am not sure about the others, but maaaybe. Some glyph sets will not be good in mono no matter what you do. Urdu probably falls into that camp.

_vim's picture

I'm amazed how many people here are saying that a unicode font is unnecessary for "coding."

Some of us out here actually have to deal with multiple languages IN CODE (not HTML) on a daily basis. For example, parsing Asian text using a regex. Or Asian developers who might actually comment their code in their native languages -- and on and on. Just because _you_ haven't had to deal with this stuff doesn't mean that others don't -- more than a billion of us, in fact.

FYI, I came here looking for a better-looking fixed-width unicode font for gvim because I often deal with source code that includes Japanese for various reasons.

ebensorkin's picture

Surely there are some... didn't MS build one that's half decent? But what OS do you want it for? If it's Linux then you must be out of luck. No proper rendering engines!

Si_Daniels's picture

Many Japanese fonts are mostly fixed pitch, but no, we've not made a "coding" font for Japanese. If there were a market, I’m sure someone would have or already has.

ebensorkin's picture

So then I suppose that if you wanted to, you could take a coding font and merge it with an MS Japanese font (you would want to scale the Latin to match the Japanese, I think...) - as long as it didn't violate the EULA. If it did, you might be able to look on the Adobe side of the aisle. Now I wish I knew more about rendering for Japanese. It strikes me that the Adobe fonts are probably less screen and more print oriented. Maybe that's wrong as well. But what about local makers? When I was in Japan in the late 80's there was no shortage (seemingly) of Japanese fonts in a variety of styles, and especially ones which were screen oriented. But maybe they are NEC specific or something. Then again, I wasn't a font nut yet. Maybe they are not great... I look forward to hearing more about this from somebody with more of an insider's view. Perhaps increasing screen resolution will also make the problem less acute.

roger_pearse's picture

I stumbled on this discussion with much interest.

I'm currently trying to work with Jim Tauber's MorphGNT text file (a standard in Greek New Testament coding).

This file consists of all the words of the Greek New Testament, one per line, with the part of speech (noun/verb) and other grammatical info. At the end of the line, in polytonic Greek unicode characters, is the word, and the 'dictionary' form of that word.

I think he must be editing this file in Unix, because I can't find a unicode fixed width font that will display this file correctly on Windows. Courier New just gives square boxes against the characters that have accents over them.

Consolas doesn't do the job, sadly.

Is there any chance that it can be enhanced to support polytonic Greek characters?
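
One way to narrow down the square boxes described above is to check which characters in the file sit in the Greek Extended block (U+1F00 to U+1FFF), where the precomposed polytonic letters live. The sketch below uses Python's standard `unicodedata` module; the function names are hypothetical, and whether a given editor or font can render the decomposed form is a separate question that this does not answer.

```python
import unicodedata

# Code points of the Unicode block "Greek Extended" (precomposed polytonic letters).
GREEK_EXTENDED = range(0x1F00, 0x2000)


def polytonic_chars(text):
    """Return the characters of text that fall in the Greek Extended block."""
    return [ch for ch in text if ord(ch) in GREEK_EXTENDED]


def decompose(text):
    """Return the Unicode names of the characters in the NFD form of text."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]
```

For a word such as "ἐν", `polytonic_chars` reports the initial epsilon with psili, and `decompose` shows it as a plain epsilon followed by a combining mark, which is the alternative encoding a font would have to position correctly if the precomposed glyph is missing.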

Si_Daniels's picture

> Courier New

Are you using the 5.x version of the font? The one that comes with Vista. It should contain all of the pre-composed characters for Latin and Greek.

Cheers, Si

roger_pearse's picture

Thanks for the suggestion. I'm afraid that we're still on XP, but good to know that you're already on this problem.

roger_pearse's picture

Further to this, I posted about my experiences at my blog here. One of the comments suggested that a number of free fixed-width unicode fonts exist:

I was told:

I've not had a chance to try any of these, but any thoughts would be interesting.

roger_pearse's picture

DejaVu Sans Mono, FreeMono and Unifont did not work with SuperEdi, the text editor I was using. Everson Mono worked, but looked awful. It was about tolerable at 12pt.

Si_Daniels's picture

I don't think lurker is the right definition for me, but I've been called worse, so I'll let it slide.

ebensorkin's picture

Lurker.... indeed. It's not quite on the money.
