Devanagari script - vowel letters

João Afonso Belloc's picture
Joined: 12 Apr 2011 - 6:06am

This Unicode document contains the following paragraph:

"Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 9-1 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used."

What do they mean by "can be analyzed visually" and "sequence of code points resulting from analysis that should not be used"?

João Afonso Belloc's picture
Joined: 12 Apr 2011 - 6:06am

I think I've got the answer already: the paragraph basically says that, for the vowel आ (AA), one should use the single code point U+0906 rather than the letter अ (A) U+0905 followed by the vowel sign ा U+093E. For instance, if you try the latter in MS Word, the two characters do not combine into आ. They are just printed one after the other.
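To make the difference concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The constant names and the `describe` helper are my own, chosen for illustration. As far as I can tell, U+0906 has no canonical decomposition, so normalization should not turn the two-character sequence into the single letter.

```python
import unicodedata

ATOMIC_AA = "\u0906"          # आ as one code point (the recommended encoding)
ANALYZED_AA = "\u0905\u093e"  # अ followed by ा (the sequence that should not be used)

def describe(text):
    """Return a (code point, character name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

print(describe(ATOMIC_AA))    # one entry: DEVANAGARI LETTER AA
print(describe(ANALYZED_AA))  # two entries: LETTER A, then VOWEL SIGN AA

# The two encodings are different strings.
print(ATOMIC_AA == ANALYZED_AA)  # False

# U+0906 carries no decomposition mapping, so NFC does not merge the pair.
print(unicodedata.decomposition(ATOMIC_AA) == "")                 # True
print(unicodedata.normalize("NFC", ANALYZED_AA) == ATOMIC_AA)     # False
```

So software that follows the standard has no reason to treat the two-character sequence as आ, which matches what Word does.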

What I don't understand is why Unicode has to bother highlighting this. Isn't this an issue that should be handled by a specific font?

Uli Stiehl's picture
Joined: 1 Feb 2006 - 8:02am

I suggest that you buy a book about Sanskrit and read the chapter about the Devanagari script. This will answer all your questions. There is no harm in reading a book.

John Hudson's picture
Joined: 21 Dec 2002 - 11:00am

Belloc: What I don't understand is why Unicode has to bother highlighting this. Isn't this an issue that should be handled by a specific font?

No, because it is a text encoding issue, not a display issue. Unicode defines how text should be encoded.
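A small Python sketch of why this matters at the encoding level, independent of any font. Even if a font drew both spellings identically, the stored text would still differ, so searching and comparing would fail. The sample word आम and the helpers `find_word` and `to_atomic` are made up for illustration and are not part of any standard API.

```python
ATOMIC_AA = "\u0906"          # आ as one code point
ANALYZED_AA = "\u0905\u093e"  # अ + ा, the discouraged sequence
MA = "\u092e"                 # म, DEVANAGARI LETTER MA

word_atomic = ATOMIC_AA + MA      # आम spelled with U+0906
word_analyzed = ANALYZED_AA + MA  # same intended word, spelled with the sequence

def find_word(needle, haystack):
    """Plain substring search, comparing code points as stored."""
    return needle in haystack

def to_atomic(text):
    """Hypothetical cleanup: replace the discouraged sequence with U+0906."""
    return text.replace(ANALYZED_AA, ATOMIC_AA)

# The stored data differs in length and in bytes, whatever the font shows.
print(len(word_atomic), len(word_analyzed))
print(word_atomic.encode("utf-8") == word_analyzed.encode("utf-8"))  # False

# A search for the correctly encoded word misses the other spelling...
print(find_word(word_atomic, word_analyzed))             # False
# ...until the text is re-encoded with the atomic letter.
print(find_word(word_atomic, to_atomic(word_analyzed)))  # True
```

The cleanup here only covers the आ case from this thread. The other rows of Table 9-1 would need their own entries.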

Uli Stiehl's picture
Joined: 1 Feb 2006 - 8:02am

Let's assume you read this:

The Latin letter "w" is encoded atomically by Unicode, even if it can be analyzed visually as consisting of "v" + "v".

An Indian will answer:

"w" is an akhand letter unsplittable into any components (e.g. "v" + "v"), just like "आ" is an akhand letter unsplittable into any components.

Another example:

"&" is not split by Unicode into any components (e.g. "e" + "t"). "&" is "akhand", i.e. unsplittable. The same is true for "ई", which is not split into any components. Anyone who learns the alphabet at school learns all this.

João Afonso Belloc's picture
Joined: 12 Apr 2011 - 6:06am

John, Uli:

Thanks for both replies.

Uli Stiehl's picture
Joined: 1 Feb 2006 - 8:02am

"... can be analyzed visually"

Quotation from www.unicode.org/versions/Unicode6.0.0/ch09.pdf
(PDF page 5, printed page 270)

Looking at "ई" and the repha hook of e.g. "Adobe Devanagari"

see http://www.sanskritweb.net/temporary/I0.jpg

it seems to me that modern Devanagari font designers, and even the authors of the Unicode manual, think that the hook above "ई" is identical with the repha hook. Hence the expression "can be analyzed visually", and hence the identical design of the hook above "ई" and the repha hook in e.g. the Adobe Devanagari font.

However, looking at the historical development of the Devanagari script from the ancient Brahmi script to the modern Devanagari script, it seems apparent that e.g. the long vowel "ई" cannot be analysed visually as consisting of the short vowel "इ" plus the repha hook:

http://www.sanskritweb.net/temporary/I1.jpg
http://www.sanskritweb.net/temporary/I2.jpg
http://www.sanskritweb.net/temporary/I3.jpg

The scans were drawn from IndoSkript (http://userpage.fu-berlin.de/falk)

João Afonso Belloc's picture
Joined: 12 Apr 2011 - 6:06am

Uli

I don't see the connection between the .jpg files and your assertion that "it seems apparent that e.g. the long vowel "ई" cannot be analysed visually as consisting of the short vowel "इ" plus the repha hook".