Devanagari script - vowel letters

Belloc

This Unicode document contains the following paragraph:

"Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 9-1 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used."

What do they mean by "can be analyzed visually" and "sequence of code points resulting from analysis that should not be used"?

Belloc

I think I've got the answer already: the paragraph basically says that, in the case of the vowel आ (AA), one should use the single code point U+0906 instead of combining अ (A, U+0905) with the vowel sign ा (AA, U+093E). For instance, if you try the latter in MS Word, the two characters do not combine; they are simply displayed one after the other.
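The difference described above is visible at the code-point level. A minimal Python sketch, using only the standard `unicodedata` module, shows the two encodings side by side and checks that Unicode normalization does not treat them as equivalent, since U+0906 has no decomposition mapping. The variable names and the `describe` helper are illustrative only.

```python
import unicodedata

AA_ATOMIC = "\u0906"          # आ  DEVANAGARI LETTER AA (recommended)
AA_ANALYZED = "\u0905\u093E"  # अ + ा  LETTER A + VOWEL SIGN AA (discouraged)

def describe(s: str) -> list[str]:
    """List each code point in s with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

for s in (AA_ATOMIC, AA_ANALYZED):
    print(s, describe(s))

# U+0906 has no decomposition mapping, so normalization never splits it...
print(repr(unicodedata.decomposition(AA_ATOMIC)))  # ''
# ...and NFC therefore never composes the two-code-point sequence into it.
print(unicodedata.normalize("NFC", AA_ANALYZED) == AA_ATOMIC)  # False
```

Because the two spellings are not canonically equivalent, a normalization pass alone will not repair text that was typed the discouraged way.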

What I don't understand is why Unicode has to bother highlighting this. Isn't this an issue that should be handled by a specific font?

Uli

I suggest that you buy a book about Sanskrit and read the chapter on the Devanagari script. It will answer all your questions. There is no harm in reading a book.

John Hudson

Belloc: What I don't understand is why Unicode has to bother highlighting this. Isn't this an issue that should be handled by a specific font?

No, because it is a text encoding issue, not a display issue. Unicode defines how text should be encoded.
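To illustrate why this is an encoding matter rather than a font matter: software compares and searches code points, not glyphs, so two strings that a font might render identically still fail to match. The sketch below assumes a hypothetical helper, `fold_discouraged`, whose table holds only the अ + ा pair discussed in this thread; a real cleanup table would need the full list from Table 9-1, which is not reproduced here.

```python
import unicodedata

# Hypothetical cleanup table: only the pair discussed in this thread.
# The complete list of discouraged sequences is in Table 9-1 of the standard.
DISCOURAGED = {
    "\u0905\u093E": "\u0906",  # अ + ा  ->  आ
}

def fold_discouraged(text: str) -> str:
    """Replace discouraged vowel sequences with the atomic vowel letter."""
    # NFC makes other canonically equivalent sequences consistent;
    # it does not touch the AA pair, which has no canonical decomposition.
    text = unicodedata.normalize("NFC", text)
    for seq, atomic in DISCOURAGED.items():
        text = text.replace(seq, atomic)
    return text

word_ok = "\u0906\u092E"         # आम  encoded as recommended (AA + MA)
word_bad = "\u0905\u093E\u092E"  # same intended word, discouraged encoding

print(word_ok == word_bad)                    # False
print("\u0906" in word_bad)                   # False: a search for आ misses it
print(fold_discouraged(word_bad) == word_ok)  # True
```

A font could make both strings look the same on screen, but only a consistent encoding makes them compare, sort, and search as the same word.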

Uli

Let's assume you read this:

The Latin letter "w" is encoded atomically by Unicode, even if it can be analyzed visually as consisting of "v" + "v".

An Indian will answer:

"w" is an akhand letter unsplittable into any components (e.g. "v" + "v"), just like "आ" is an akhand letter unsplittable into any components.

Another example:

"&" is not split by Unicode into any components (e.g. "e" + "t"). "&" is "akhand", i.e. unsplittable. The same is true for "ई", which is not split into any components. Anyone who learns the alphabet at school learns all this.

Belloc

John, Uli:

Thanks for both replies.

Uli

"... can be analyzed visually"

Quotation from www.unicode.org/versions/Unicode6.0.0/ch09.pdf
(PDF page 5, printed page 270)

Looking at "ई" and the repha hook of, e.g., "Adobe Devanagari" (see http://www.sanskritweb.net/temporary/I0.jpg), it seems to me that modern Devanagari font designers, and even the authors of the Unicode manual, think that the hook above "ई" is identical to the repha hook. Hence the expression "can be analyzed visually", and hence the identical design of the hook above "ई" and the repha hook in, e.g., the Adobe Devanagari font.

However, looking at the historical development of the Devanagari script from the ancient Brahmi script to the modern Devanagari script, it seems apparent that e.g. the long vowel "ई" cannot be analysed visually as consisting of the short vowel "इ" plus the repha hook:

http://www.sanskritweb.net/temporary/I1.jpg
http://www.sanskritweb.net/temporary/I2.jpg
http://www.sanskritweb.net/temporary/I3.jpg

The scans were drawn from IndoSkript (http://userpage.fu-berlin.de/falk)

Belloc

Uli

I don't see the connection between your assertion that "it seems apparent that e.g. the long vowel 'ई' cannot be analysed visually as consisting of the short vowel 'इ' plus the repha hook" and the .jpg files.
