About Unicode and UTF-8 Encoding
for Devanagari (Marathi)
Unicode is a character set. UTF-8 is an encoding of that character set.
The first 128 characters of Unicode (which correspond one-to-one with ASCII) are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well.
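One way to check this is to encode an ASCII-only string both ways and compare the bytes. This is only a minimal Python sketch, and the sample string is an arbitrary illustration:

```python
# ASCII-only text produces identical bytes under ASCII and UTF-8.
text = "Marathi"  # sample string, chosen only for illustration

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

same_bytes = ascii_bytes == utf8_bytes
one_octet_each = len(utf8_bytes) == len(text)  # one octet per character

print(same_bytes)      # True
print(one_octet_each)  # True
```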
The Devanagari Unicode block runs from U+0900 to U+097F, a total of 128 characters (the last two hexadecimal digits run from 00 to 7F, that is, from 0900 to 097F).
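The count of 128 follows directly from the two end points. A small Python sketch of that arithmetic (the constant names are my own, chosen for illustration):

```python
# Devanagari block bounds as stated in the text.
BLOCK_START = 0x0900
BLOCK_END = 0x097F

# Both end points are included, hence the +1.
block_size = BLOCK_END - BLOCK_START + 1
print(block_size)  # 128

# The low byte of the code point runs from 0x00 to 0x7F.
low_bytes = (BLOCK_START & 0xFF, BLOCK_END & 0xFF)
print(low_bytes)  # (0, 127)
```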
HTML 4 supports UTF-8. HTML 5 supports both UTF-8 and UTF-16! The Unicode code point U+0915, if written as &#x0915; in an HTML page that starts with the declaration <!DOCTYPE html> at the very top,
will display as क.
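The way a browser turns such a numeric character reference into a character can be imitated outside the browser. The sketch below uses `html.unescape` from Python's standard library. It is only an illustration of the mapping, not of how any particular browser is implemented:

```python
import html

# Hexadecimal and decimal numeric character references for U+0915.
hex_ref = "&#x0915;"
dec_ref = "&#2325;"

decoded_hex = html.unescape(hex_ref)
decoded_dec = html.unescape(dec_ref)

# Both references decode to the same single character, DEVANAGARI LETTER KA.
print(decoded_hex, decoded_dec)
print(decoded_hex == decoded_dec == "\u0915")  # True
```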

A Devanagari character can be represented in a web page by
Unicode code point (from U+0900 to U+097F)
OR
UTF-8 in literal format (from \xe0\xa4\x80 to \xe0\xa5\xbf)
OR
Numerical equivalent decimal values (from 2304 to 2431)
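The three forms can be derived from one another. Here is a minimal Python sketch that prints all three for the first and last characters of the block and for क. The helper name `describe` is hypothetical, introduced only for this example:

```python
def describe(ch):
    """Return (code point label, UTF-8 literal, decimal value) for one character."""
    code_point = "U+%04X" % ord(ch)
    # Each UTF-8 byte written in \xNN form.
    utf8_literal = "".join("\\x%02x" % b for b in ch.encode("utf-8"))
    return code_point, utf8_literal, ord(ch)

first = describe("\u0900")  # first code point of the Devanagari block
last = describe("\u097f")   # last code point of the Devanagari block
ka = describe("\u0915")     # the letter KA

for row in (first, last, ka):
    print(row)
```

For क this gives U+0915, \xe0\xa4\x95 and 2325.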
As shown in my earlier blog post, each Unicode character for Marathi uses a single code point from U+0900 to U+097F to express a full character.
However, a Marathi character is formed by adding a vowel to a consonant. To display only the consonant, we have to add the half-character sign ( ् ), known as the halant or virama (U+094D), to the Marathi character.
Thus we require two Unicode characters in sequence to display a bare consonant.
Normal method
Consonant + Vowel = Character
क् + अ = क
Unicode method
Character - क represented by U+0915
Consonant - क + ् = क् represented by U+0915 U+094D
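The difference between the full character and the bare consonant shows up when the code points are listed. A short Python sketch follows. The variable names are my own, and the character names come from the standard `unicodedata` module:

```python
import unicodedata

KA = "\u0915"      # full character (consonant with inherent vowel)
VIRAMA = "\u094d"  # half-character sign (halant)

full = KA
bare = KA + VIRAMA  # consonant only

full_points = ["U+%04X" % ord(c) for c in full]
bare_points = ["U+%04X" % ord(c) for c in bare]
names = [unicodedata.name(c) for c in bare]

print(full, full_points)  # one code point
print(bare, bare_points)  # two code points
print(names)
```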

We can write all characters of बाराखडी (Barakhadi) in this form. But that is not needed, because Unicode lets us append the respective vowel sign directly to the full consonant character.
Normal method          Unicode method
क् + आ = का          क + ा = का
क् + इ = कि          क + ि = कि
क् + ई = की          क + ी = की
क् + उ = कु          क + ु = कु
क् + ऊ = कू          क + ू = कू
क् + ऋ = कृ          क + ृ = कृ
क् + ए = के          क + े = के
क् + ऐ = कै          क + ै = कै
क् + ओ = को          क + ो = को
क् + औ = कौ          क + ौ = कौ
क् + अं = कं          क + ं = कं
क् + अः = कः          क + ः = कः
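The Unicode method in the table amounts to plain string concatenation: the consonant followed by one vowel sign. A minimal Python sketch that builds the Barakhadi row for क this way (the list name `SIGNS` is my own, and the code points are those of the dependent vowel signs, anusvara and visarga in the Devanagari block):

```python
KA = "\u0915"  # क

# Dependent vowel signs, then anusvara and visarga.
SIGNS = [
    "\u093e",  # AA
    "\u093f",  # I
    "\u0940",  # II
    "\u0941",  # U
    "\u0942",  # UU
    "\u0943",  # VOCALIC R
    "\u0947",  # E
    "\u0948",  # AI
    "\u094b",  # O
    "\u094c",  # AU
    "\u0902",  # ANUSVARA
    "\u0903",  # VISARGA
]

# The bare letter already carries the inherent vowel, so it comes first.
barakhadi = [KA] + [KA + sign for sign in SIGNS]
print(" ".join(barakhadi))
```

Every combined form is exactly two code points long, so no virama is needed when a vowel sign follows the consonant.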
To write a complex (conjunct) character in the Unicode method, the first character is converted to a bare consonant by adding the half-character sign, and then the next character is added.
क + ् + क = क्क

 
The same method is used when the complex character is formed from more than two consonants: every consonant except the last is followed by the half-character sign.
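In code, this means joining the consonants with the half-character sign between each pair. A small Python sketch follows. The function name `conjunct` is hypothetical, and the three-consonant example स्त्र is my own illustration rather than one from the post:

```python
VIRAMA = "\u094d"  # half-character sign

def conjunct(*consonants):
    """Join consonants with a virama between each pair."""
    return VIRAMA.join(consonants)

# Two consonants: KA + VIRAMA + KA
kka = conjunct("\u0915", "\u0915")

# Three consonants: SA + VIRAMA + TA + VIRAMA + RA
stra = conjunct("\u0938", "\u0924", "\u0930")

print(kka, ["U+%04X" % ord(c) for c in kka])
print(stra, ["U+%04X" % ord(c) for c in stra])
```

How the resulting sequence is drawn (as a ligature or as a half form followed by a full form) depends on the font, not on the code points.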

 
 