Character set for modern Hebrew

John Hudson's picture

Because most of my experience with Hebrew is with the Biblical text and related religious content, I remain somewhat uncertain about what constitutes a recommended character set—in terms of Unicode codepoints—for modern Hebrew. Some things are obvious, of course: the cantillation accents and signs such as the nun hafuch that occur only in the Bible text are clearly not needed. But there are some characters, related to vocalised text, that seem to me to exist in a grey area, due to their use in religious publications such as prayer books and perhaps, I wonder, elsewhere in restricted modern use?

A question for modern Hebrew readers and typographers: what characters would you expect to be included in a well made font for modern Hebrew, and which would you expect to be excluded?

JCSalomon's picture

 For, e.g., books, newspapers, etc., letters Alef through Tav, punctuation marks Geresh & Gershayim; possibly niqqud for some loan words. Children’s books need niqqud and Shin & Sin dots. Latin punctuation: commas, full-stops, etc. Not sure if there’s a standard for which set of quotation marks to use, though; I’ve seen both “high-high” and “low-high„—with the low quotes used as open quotes (on the right). Latin numerals.
 Siddurim, etc., need niqqud and punctuation marks Paseq & Sof Pasuq (for embedded Biblical verses). Some siddurim include cantillation for the Shema & Shirat HaYam, but nothing complicated.
 That what you had in mind?
—Joel
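
Joel's list can be written down as a rough set of Unicode code points. This is only a sketch: the grouping, the constant names and the helper `missing_from` are illustrative assumptions rather than any standard, and exactly which niqqud to include is a judgment call.

```python
import unicodedata

# Letters alef..tav, including the five final forms (U+05D0..U+05EA).
LETTERS = set(range(0x05D0, 0x05EA + 1))

# Geresh and gershayim punctuation.
PUNCTUATION = {0x05F3, 0x05F4}

# Basic niqqud: sheva through dagesh (U+05B0..U+05BC, skipping U+05BA),
# plus shin dot and sin dot.
NIQQUD = (set(range(0x05B0, 0x05BC + 1)) - {0x05BA}) | {0x05C1, 0x05C2}

# Extra marks mentioned for siddurim: paseq and sof pasuq.
SIDDUR_EXTRAS = {0x05C0, 0x05C3}

MODERN_BASIC = LETTERS | PUNCTUATION | NIQQUD


def missing_from(supported, wanted=MODERN_BASIC):
    """Return the wanted code points absent from `supported`, sorted."""
    return sorted(set(wanted) - set(supported))


if __name__ == "__main__":
    # Hypothetical font that only covers the consonants.
    for cp in missing_from(LETTERS):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Passing `MODERN_BASIC | SIDDUR_EXTRAS` as `wanted` checks the siddur-oriented superset instead.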

John Hudson's picture

Thanks, Joel. Three characters I am wondering about are technically part of the niqqud system, but their use seems limited:

U+05BA, HOLAM HASER FOR VAV (distinguishes vav+vowel 'VO' from vocal vav 'O')

U+05BD, METEG

U+05C7, QAMATS QATAN
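
For reference, the formal names and general categories of these three code points can be checked with Python's standard `unicodedata` module. This is a small sketch; the helper name `describe` is just an illustration.

```python
import unicodedata

# The three code points under discussion.
QUESTIONABLE = [0x05BA, 0x05BD, 0x05C7]


def describe(cp):
    """Format a code point as 'U+XXXX NAME (category)'."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


for cp in QUESTIONABLE:
    print(describe(cp))
```

All three are non-spacing marks (category `Mn`), so a font that includes them also needs mark positioning for them.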

William Berkson's picture

I'm no expert, but I would guess the dividing line on this is whether you want to include glyphs for the prayer book.

If not, I wouldn't think you'd need the meteg or qamatz qatan. I didn't know there was a special Holam Haser for the vav, so I can't say. You definitely need the regular Holam Haser.

Typograph's picture

Hi John
And I wonder why Unicode has a character for Qamats Qatan and not for Sheva-Na and Dagesh Hazaq.

The word חוֹלָם has a holam male.
A word like מִצְוֹתָיו has a holam haser.

In holam male we position the holam on top of the vav, in the center, whereas in holam haser we position the holam to the left of the vav.

Qamats qatan is not qamats rachav:
the qatan is pronounced O and the rachav is pronounced A.
A word like למדה, read as LOMDA, is written לָמְדָה, but LAMDA is also written לָמְדָה, only with a meteg after the first qamats.

William Berkson's picture

Typograph, isn't this just for religious texts? I think John's question is whether these refinements are necessary for a font that will be used just for modern Hebrew. My impression is that Israelis writing modern Hebrew only put in nikkud when there is some ambiguity that they will resolve. And then the refinements—qamats qatan, meteg, schva na, etc.—are not needed, right?

Does your code have to include specific words as exceptions, so that it is like a dictionary look up?

Typograph's picture

In modern Hebrew, qamats qatan, sheva na and holam haser are not used.
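
If a font leaves these marks out, text that carries them could be folded to ordinary pointing before display. A minimal sketch, assuming that approach is acceptable; the name `fold_to_modern` and the decision to drop the meteg entirely are illustrative assumptions, not an established practice.

```python
# Map the specialised points onto their ordinary counterparts.
FOLD = {
    0x05C7: 0x05B8,  # QAMATS QATAN -> QAMATS
    0x05BA: 0x05B9,  # HOLAM HASER FOR VAV -> HOLAM
    0x05BD: None,    # METEG -> removed
}


def fold_to_modern(text):
    """Replace or drop the specialised points listed in FOLD."""
    return text.translate(FOLD)
```

The folding is lossy: once applied, the O/A and VO/O distinctions can no longer be recovered from the text.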

My OT project substitutes the glyphs mainly according to Hebrew grammar, but there are exceptions which are built in as a dictionary.

William Berkson's picture

>are exceptions which are built in as a dictionary.

Ah, that's what I was arguing here on Typophile with Israel S. would be necessary for this kind of project to work. Also there seem to be different views on when the sheva na is needed. For example do you say "schva" or "sheva" (schva nach or sheva na)? In modern Hebrew it is the first, right?

By the way, in your "holam" and "mitzvotav" in Hebrew characters above, the vav and the nikkud don't show in my Typophile window (I'm on a Mac, Firefox). When I copied and pasted them into TextEdit, they showed up properly.

Typograph's picture

Look, things are not so simple,
and this is not for Typophile.

If you wish, you can mail me
studiofried@gmail.com
or call

02-9999325

I would gladly answer your questions.

My name is Eli Fried.

paul d hunt's picture

*i'm listening*

david h's picture

John,

> HOLAM HASER FOR VAV

is that a new mark? you said something about Vav Haluma...

John Hudson's picture

HOLAM HASER FOR VAV (U+05BA) was added in Unicode 5.0, so not so new any more. This character enables an encoding and display distinction to be made between vav haluma (U+05D5 U+05BA) and holam male (U+05D5 U+05B9). For a full explanation and illustration, see page 16 of the SBL Hebrew font manual [PDF].

When this mark was encoded, it was decided to treat the existing HOLAM (U+05B9) character as holam male on the basis that this is much more common than vav haluma. So the new character performs the same role for vav as the existing holam performs for all the letters other than vav. A bit confusing, but the best approach from a backwards compatibility perspective.
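
The two encodings described above can be spelled out as character sequences. This is a small sketch; the constant names and the helper `classify_vav` are illustrative only.

```python
VAV = "\u05D5"
HOLAM = "\u05B9"            # HEBREW POINT HOLAM
HOLAM_HASER_VAV = "\u05BA"  # HEBREW POINT HOLAM HASER FOR VAV

HOLAM_MALE = VAV + HOLAM            # vav as vowel carrier: 'O'
VAV_HALUMA = VAV + HOLAM_HASER_VAV  # consonantal vav plus holam: 'VO'


def classify_vav(pair):
    """Name a two-character vav sequence, or return None if it is neither."""
    return {HOLAM_MALE: "holam male", VAV_HALUMA: "vav haluma"}.get(pair)
```

A font that lacks U+05BA cannot show the second sequence correctly, which is the practical cost of omitting it.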

david h's picture

Tiberian/Biblical Hebrew -- there is no Holam Haser; so are we talking about modern Hebrew?

John Hudson's picture

David, I'm not sure about the correct terminology -- what does 'haser' mean exactly? -- but there is a distinction in holam position for vav haluma and holam male in editions of the masoretic text, which I presume reflects the manuscript tradition. See the Genesis 4:13 example I cite in the SBL Hebrew manual:

I've checked this against Professor Dotan's edition of L, the BHK and BHS editions, and also the JPS Tanakh. I don't have a copy of the L facsimile, but presume that this distinction exists there also. [The JPS typesetting follows what I think of as a modern convention in which the holam sits much further to the left, between the letters rather than over the left side of the letter.]

If I recall correctly, there are about 300 occurrences of vav haluma in the Bible text, which would use the U+05BA mark, as contrasted with some 25,000 instances of holam male, which would use the U+05B9 mark.
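
A count of this kind could be reproduced on any text that is actually encoded with the distinction. A rough sketch, assuming the holam immediately follows the vav in the character stream; the function name and the two sample words are illustrative, and text encoded before U+05BA existed will report every case as holam male.

```python
def count_vav_holam(text):
    """Return (vav haluma count, holam male count) for `text`."""
    vav_haluma = text.count("\u05D5\u05BA")  # vav + HOLAM HASER FOR VAV
    holam_male = text.count("\u05D5\u05B9")  # vav + HOLAM
    return vav_haluma, holam_male


# One word with vav haluma and one with holam male, written with escapes
# so the code points are unambiguous.
sample = (
    "\u05DE\u05B4\u05E6\u05B0\u05D5\u05BA\u05EA "       # mem, tsadi, vav haluma, tav
    "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"        # shin, lamed, holam male, final mem
)
print(count_vav_holam(sample))
```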

What I'm wondering is whether U+05BA is something that should be included in fonts that support vocalised modern Hebrew. It seems to me a useful sort of thing to be able to distinguish two different pronunciations in terms of encoding and appearance, even if this character is not (yet) widely used.

david h's picture

John,

If the Holam Haser for Vav is based on Biblical Hebrew/the Tiberian system then the name is wrong.
In the Tiberian, Biblical Hebrew they didn't distinguish between Holam Male (1— Psalms 118,12) or Haser (2— Exodus 22,5)

The Tiberian vocalization system is known as a quality system, not a quantity system. The main reason the Tiberian vocalization gives us only a partial indication of relative vowel quantity is that vowel quantity was not phonemic. The only indication of vowel quantity was the addition of the sheva mark to form the hatafim; that is to say, they were short. Vowel length in Tiberian Hebrew was based on the syllable structure!

BTW, because the Tiberian system was marking the qualitative distinctions between the vowels they didn't have any distinct graphic mark for the kamats katan!

As we know, they could not change the text; they could not add letters e.g. matres lectionis. They just added the nikkud!

Prof. Dotan, in one of his books, published in 2005, stated very clearly that the vowel was pronounced the same whether there was a mater lectionis or not.
Prof. Blau, who published his book in the 1970s, held the same view.

In the treatise Hidayat Al-Qari (the correct reading of the Torah) in the description of the vowels, it's said about the positioning of the Holam: The holam is one dot placed above and between two letters.

The whole holam male, tsere male etc. distinction is a much later innovation, and definitely not a Tiberian one.

John Hudson's picture

It is entirely possible that the Unicode character name is wrong or misleading; you'd have to ask Mark Shoulson why that name was proposed. It doesn't really matter, I think, because although Unicode character names are immutable they're not functional. What matters is documentation that describes the use and purpose of a character.

Remember that Unicode encodes what is on the page, not what it signifies, so things like vowel quantity and quality aren't really relevant. There are two dots that occur above vav: one above the letter and one to the left of the letter, and this distinction is systematically applied in at least some texts, and needs to be encodable on computers and displayable with appropriate fonts.

> The holam is one dot placed above and between two letters.

Yet this is not the case in L, nor in typographic editions such as Professor Dotan's that follow the manuscript convention: the holam is placed above the left side of the letter, not between the letters, excepting when holam male (positioned centered above vav or a little to the right side), or when followed by an unvocalised aleph (shifted to right side of aleph), or, obviously, when on lamed where the ascender obliges the dot to be positioned between the letters.

david h's picture

> What I'm wondering is whether U+05BA... even if this character is not (yet) widely used.

John,

We write (Modern Hebrew) — with or without vowel signs (ktiv male or haser):
עוון/עווֹן, עווֹנותי, מצוות/מצווֹת

John Hudson's picture

Yes, I know that Hebrew can be written with or without vowel signs. What I am wondering is whether, when writing with vowel signs, modern users might want to make a visual distinction between vav+holam as VO and vav+holam as O?

david h's picture

Meteg: Meteg is not part of the Modern Hebrew vocalization. We will not find it along with the other vowels in children's books, for example. However, we do find it in dictionaries. The only recent use of the meteg was in the book Hebrew in Jeans, which was published several years ago. But this book is about Hebrew slang and serves as a dictionary. Grammar books of Modern Hebrew use other marks instead of meteg. What is missing in Hebrew fonts, and what I use regularly, are these marks:


Vav + Holam: Historically and technically we can blame the early printers for positioning the Holam above the vav. In books where the subject matter is not Biblical we will not see the distinction of the positioning, and for the modern Hebrew speaker the pronunciation would be the same. Moreover, "male" or "haser" refers to the presence or absence of the vav, and does not indicate anything about the pronunciation. For example, the words בוֹ and בֹ are pronounced the same.

Here are samples from two different books:



In sample 1 we can see this distinction. The subject matter is the Masorah notes, nikkud etc., and the scholar wanted to make this distinction. In sample 2 we can't find this distinction, though the subject matter is Biblical Hebrew. But again, a speaker of Modern Hebrew won't pronounce it differently.
There is nothing wrong with having this mark, or with making this distinction if you want to. But in Modern Hebrew books and publications that require nikkud (and have nothing to do with Biblical Hebrew or its language and grammar) we won't find this distinction.
Moreover, in the MSS we will find the patach genuva to the right. But in modern Hebrew we don't follow that positioning (e.g. children's books, young-adult books with nikkud etc.).

I don't know if we can compare the MSS with the printed Bible. The scribe, for example, added hundreds of marks to the text every day, and he was just a human being. So we might find some differences in the positioning even within the same page. In this regard the L codex is no different from the A codex.


Kamats katan: a siddur/prayer book has a very specific audience, which is based on a specific tradition. The aim of the publisher when publishing a book for kids (with nikkud, modern Hebrew) is to reach as many kids as possible regardless of their tradition. But let's say that the editor wants to add a whole bunch of marks. The word צָהֳרַיִם can be pronounced tsohorayim or tsahorayim; that is, an o vs. an a sound. The same goes for the word מָחֳרָתַיִם: mohoratayim or mahoratayim. Again, kamats katan vs. kamats.
Should we add an extra page which will explain to the readers the differences?
When Dr. Shelomo Tal worked on the siddur Rinat Yisrael he introduced not just the kamats katan, but also marks to indicate milel and milra; he added commas and periods to help readers avoid running words together and giving new/wrong meanings to their prayers; he introduced new and clear divisions between paragraphs, etc. What is less known is that he also published a book which could be regarded as 'behind the scenes', where he explained his decisions and told the history of the prayer books.
The publishers and editors (mostly here in the US) are facing more acute problems with their siddurim (which Dr. Tal tried to solve back in the 1970s!).
Let's go back to the patach gnuva to have something to think about... As we know, the siddurim indicate it by positioning it to the right of the letter. For a person who goes to the synagogue, pronunciation is very important. It is also important to pronounce the name of God correctly, e.g. אֱלוֹהַּ
We could ask as many people as possible how to pronounce it; almost 80% of the time (and sometimes more) we will hear אֱלוֹהַה. That is, doubling the letter he! But this is not the way to pronounce it. We call it patach gnuva because the letter he (in our sample) "stole" the patach (from the alef).
We need to say אֱלוֹאַ/אֱלוֹאַה, to hear the sound "a" very clearly, and the letter he is silent! So the mark is there. Why is there a problem to say it correctly? :)

Angus R Shamal's picture

As far as I know, modern Hebrew is based on "memory": the reader recognizes the word and its pronunciation without the help of the Nikud, and some of the vowel letters like Yod and Vav are silent or dropped in some cases (but those should be included in the alphabet). For the rest, the alphabet is the same, including the closing/end letters.

The Nikud is not there because it's Biblical Hebrew; it's there to make sure people pronounce/sing it correctly, so it's there when really needed in some cases.

Also, there is a difference between the classic alphabet and the written/hand-written alphabet: they look different.

my 2 cents

John Hudson's picture

Thanks for the very detailed response, David. Will I see you at TypeCon next week?

William Berkson's picture

On the meteg, in the new Koren Siddur, the milel words (accent on penultimate syllable instead of final) have the accented syllable marked with a meteg, something that is very helpful for those of us who are not native speakers of Hebrew. This is an innovative use of the meteg, I believe. This, again, is not modern Hebrew.

david h's picture

> Will I see you at TypeCon next week?

Yes, of course.

JCSalomon's picture

William,
 The use of the meteg to indicate milel words is useful, but did not originate with the new Koren Siddur. The ArtScroll siddur uses that method, as does the new Kol Yaakov; it may even be older.
—Joel

William Berkson's picture

Thanks for the correction Joel.

Michel Boyer's picture

My Even-Shoshan Hebrew dictionary in 4 volumes from 1988 uses the meteg for that purpose.

raphaelfreeman's picture

if you want a set that is useful for siddurim (and by the way this is a HUGE market), then you need:

  • kamatz katan
  • chataf kamatz katan
  • shva na
  • shva nach (yes this is different from a regular shva)
  • shva merachef
  • dagesh chazak
  • dagesh kal
  • cholam chaser
  • the "meteg" of the sof pasuk (usually looks like a meteg)
  • meteg
  • gaaya (usually looks like a meteg)
  • mileil (so in Koren this glyph might look like a meteg below the letter, but in Rinat it will be an ole above the letter)
  • milra
  • the two diamonds of the sof pasuk (as opposed to a colon)
  • patach ganuv
  • in general the mistakes in the taamey mikra (if I recall there are 3) need to be fixed. It seems that there was a misunderstanding that the taamim of the 21 books are different from those of the "sifrey emet", i.e. Job, Proverbs and Psalms. I'm not really an expert in this area, but if necessary I can put you in contact with a number of people who can explain it, although there is an excellent book in Hebrew by Rabbi Broyer z"l called "Taamey Mikra".
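
Only some of the marks in this list have a character of their own in Unicode. The table below is a partial sketch covering the items I am reasonably sure about; the `None` entries follow the earlier remark in this thread that sheva na and dagesh hazaq have no separate code points, and the dictionary keys and helper name are illustrative.

```python
import unicodedata

# Value is the dedicated code point, or None where the mark shares a
# code point with its ordinary counterpart (per the discussion above).
DEDICATED = {
    "kamatz katan": 0x05C7,
    "cholam chaser (on vav)": 0x05BA,
    "meteg": 0x05BD,
    "ole (Rinat-style mileil mark)": 0x05AB,
    "sof pasuk": 0x05C3,
    "shva na": None,
    "dagesh chazak": None,
}


def split_by_encoding(table=DEDICATED):
    """Return (encoded, unencoded) lists of mark names."""
    encoded = [name for name, cp in table.items() if cp is not None]
    unencoded = [name for name, cp in table.items() if cp is None]
    return encoded, unencoded


for name, cp in DEDICATED.items():
    if cp is None:
        print(f"{name}: no dedicated code point")
    else:
        print(f"{name}: U+{cp:04X} {unicodedata.name(chr(cp))}")
```

For the unencoded items a font would have to rely on something other than the character code, such as glyph substitution, to show a distinct shape.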

    quadibloc's picture

    It's interesting to me that the responses in this thread were more helpful because they were not based on the understanding I would have had of the initial question.

    What is the recommended character set for a font used with modern Hebrew? I would have thought the answer to that question could be obtained trivially: look at the character set of nearly any Hebrew font, since most fonts won't include the extra typographical refinements needed for Scriptural text.

    But the responses here dealt with the "ideal" character set instead of the "typical" one, thereby pointing out that vowel points are used with modern Hebrew texts for children's books and in the pronunciation keys of dictionaries.

    So far, the advice you're getting here seems to be that about the only things you can safely omit are cantillation marks - and the extra-wide versions of the letters used to allow vowel points to be applied to a "missing" letter while not putting that letter in the body of a Scriptural text! And some of the things you can't omit aren't even in Unicode yet.

    david h's picture

    > But the responses here dealt with the "ideal" character set instead of the "typical" one...

    What is the recommended character set for a font used with English/Arabic/Amharic?
    :)

    quadibloc's picture

    Amharic is tricky; I've encountered a web site that notes that most Amharic fonts have been dumbed down to correspond with a drastic simplification of that script so that it can work with a typewriter.

    For Hebrew, though, I would have been inclined to simply point out Code Page 1255:

    http://msdn.microsoft.com/en-us/goglobal/cc305148.aspx

    as representing the character set that would actually be available to users of a Hebrew font unless they were using specialized software that accessed extra characters. (Of course, you wouldn't want to exclude anything available in Macintosh Hebrew - but I haven't found a reference for that.)
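
One way to see what that code page covers is to test each character of the Unicode Hebrew block against a `cp1255` codec. A minimal sketch using Python's built-in codec, which may differ in detail from Microsoft's current table; the helper name `in_cp1255` is illustrative.

```python
def in_cp1255(cp):
    """True if the code point can be encoded with Python's cp1255 codec."""
    try:
        chr(cp).encode("cp1255")
        return True
    except UnicodeEncodeError:
        return False


# Scan the Unicode Hebrew block (U+0590..U+05FF).
covered = [cp for cp in range(0x0590, 0x0600) if in_cp1255(cp)]
print(", ".join(f"U+{cp:04X}" for cp in covered))
```

The letters, the basic points and the meteg are encodable, while the cantillation accents and qamats qatan (U+05C7) are not.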

    Of course, one would still have to apply a knowledge of Hebrew to that chart, to realize that one would want, as glyphs, shin plus shin dot and shin plus sin dot, rather than trying to compose the characters - although vowel points, which are included on that code page, would normally be added by composing characters because of the number of possible combinations. (In Amharic, of course, distinctive glyphs are inescapable.)
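
On the encoding side, Unicode does have precomposed shin forms (U+FB2A and U+FB2B), but normalization breaks them back into letter plus dot, so a font should expect the decomposed sequences and map them to a combined glyph itself. A short sketch of that behaviour with the standard `unicodedata` module; the helper name `decompose` is illustrative.

```python
import unicodedata

SHIN = "\u05E9"
SHIN_DOT = "\u05C1"
SIN_DOT = "\u05C2"
SHIN_WITH_SHIN_DOT = "\uFB2A"  # precomposed presentation form
SHIN_WITH_SIN_DOT = "\uFB2B"   # precomposed presentation form


def decompose(text):
    """Apply canonical decomposition (NFD) to `text`."""
    return unicodedata.normalize("NFD", text)


print(decompose(SHIN_WITH_SHIN_DOT) == SHIN + SHIN_DOT)
print(decompose(SHIN_WITH_SIN_DOT) == SHIN + SIN_DOT)
# NFC does not recompose the pair into the presentation form.
print(unicodedata.normalize("NFC", SHIN + SHIN_DOT) == SHIN + SHIN_DOT)
```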

    gohebrew's picture

    Raphael listed the Hebrew grammar glyphs used in Modern Hebrew.

    They are:

    kamatz katan
    chataf kamatz katan
    shva na
    shva nach (yes this is different from a regular shva)
    shva merachef
    dagesh chazak
    dagesh kal
    cholam chaser
    the "meteg" of the sof pasuk (usually looks like a meteg)
    meteg
    gaaya (usually looks like a meteg)
    mileil (so in Koren this glyph might look like a meteg below the letter, but in Rinat it will be an ole above the letter)
    milra
    the two diamonds of the sof pasuk (as opposed to a colon)
    patach ganuv

    As I understand it, there are no differences between Biblical Hebrew, post-Biblical Hebrew such as Mishnaic Hebrew, and Modern Hebrew.

    Hebrew Grammar is studied daily in Israel, and in most Hebrew schools in the diaspora. A group of fonts supporting these glyphs will certainly be used regularly, and will promote increased knowledge of Hebrew Grammar.

    I agree with John Hudson that the teaching of the rules of Hebrew Grammar is outside the realm of Unicode. But the glyphs used to describe Hebrew Grammar are within the scope of Unicode, and should be implemented as soon as possible.
