IDN Policy  – LK Domain Registry

1. Introduction

This document sets out the policy for Internationalized Domain Name (IDN) registrations under .ලංකා and .இலங்கை at the LK Domain Registry. The policy and procedures are designed to ensure reliable and reasonable assignment of IDNs to registrants.

2. Abbreviations

Character: A character is a vowel, a consonant or a composite (a consonant with a vowel modifier) in the Sinhala or Tamil script, a digit, a Latin letter or a hyphen (-).

Domain name: A domain name is a unique address that can be used to identify a resource on the Internet. It may consist of one or more domain labels.

Domain Label: A domain label is a string bounded by period(s) “.”.

Ex: In the domain name nic.lk, “nic” is a domain label.

3. Policy

3.1. General Policy

3.1.1.      Registration is made on a first-come, first-served basis.

3.1.2.      The domain labels should not be offensive to any accepted race, region, culture or tradition of Sri Lanka.

3.1.3.      The domain should not indicate the name of a country or government (without specific authorization).

3.1.4.      The domain name should follow the DNS label rules defined in (RFC 4343).

  • A DNS label should not exceed 63 bytes.
  • A domain name should not start or end with a hyphen (-) and should not contain consecutive hyphens.
  • A domain name can be up to 255 characters long.
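
The three checks above can be sketched in Python as follows. This is a minimal illustration, not the registry's implementation: the helper names (ace_form, check_label, check_name) are hypothetical, and the sketch assumes that the 63-byte and 255-character limits are measured on the ASCII-compatible (Punycode) form of each label.

```python
MAX_LABEL_BYTES = 63
MAX_NAME_LENGTH = 255


def ace_form(label: str) -> bytes:
    """Return the ASCII-compatible form of a label (assumption: Punycode with 'xn--')."""
    if label.isascii():
        return label.lower().encode("ascii")
    return b"xn--" + label.encode("punycode")


def check_label(label: str) -> list[str]:
    """Return a list of problems found in a single label (empty list means OK)."""
    problems = []
    if not label:
        problems.append("empty label")
    if len(ace_form(label)) > MAX_LABEL_BYTES:
        problems.append("label exceeds 63 bytes")
    if label.startswith("-") or label.endswith("-"):
        problems.append("label starts or ends with a hyphen")
    if "--" in label:
        problems.append("label contains consecutive hyphens")
    return problems


def check_name(name: str) -> list[str]:
    """Check every label of a dotted name, then the overall length."""
    labels = name.rstrip(".").split(".")
    problems = [f"{lab!r}: {p}" for lab in labels for p in check_label(lab)]
    # Total length counts the labels plus the dots between them.
    total = sum(len(ace_form(lab)) for lab in labels) + len(labels) - 1
    if total > MAX_NAME_LENGTH:
        problems.append("name exceeds 255 characters")
    return problems


print(check_name("nic.lk"))
print(check_label("-nic"))
```

An empty list means no problem was found. The hyphen checks are applied to the label as typed, before conversion.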

3.2. IDN Policy

3.2.1.       Registrants can request IDN domains under .ලංකා and .இலங்கை at the LK Domain Registry.

3.2.2.      The requested domain name should consist only of the characters given in the relevant IDN language table. The language tables for .ලංකා and .இலங்கை are attached in Appendix A.

E.g.: If you are registering a .ලංකා domain name, you should use only characters that are included in the .ලංකා language table.
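
A membership check against the language table might look like the sketch below. The code points are transcribed from the .ලංකා table in Appendix A; the names SINHALA_TABLE and disallowed are illustrative only. Note that zwj (U+200D) is not listed in that table, so this sketch reports it as disallowed; how the registry treats it at this stage is not stated here.

```python
def _span(first: int, last: int) -> set[int]:
    """Inclusive range of code points as a set."""
    return set(range(first, last + 1))


# Code points transcribed from the .ලංකා table in Appendix A.
SINHALA_TABLE = (
    {0x002D}                    # hyphen-minus
    | _span(0x0030, 0x0039)     # digits 0-9
    | _span(0x0061, 0x007A)     # Latin small letters a-z
    | {0x0D82, 0x0D83}          # anusvaraya, visargaya
    | _span(0x0D85, 0x0D8E)     # vowels
    | _span(0x0D91, 0x0D96)
    | _span(0x0D9A, 0x0DA5)     # consonants
    | _span(0x0DA7, 0x0DAB)
    | _span(0x0DAD, 0x0DB1)
    | _span(0x0DB3, 0x0DBB)
    | {0x0DBD}
    | _span(0x0DC1, 0x0DC6)
    | {0x0DCA}                  # al-lakuna
    | _span(0x0DCF, 0x0DD4)     # vowel signs
    | {0x0DD6}
    | _span(0x0DD8, 0x0DDE)
    | {0x0DF2}
)


def disallowed(label: str, table: set[int] = SINHALA_TABLE) -> list[str]:
    """Return the code points of the label that are not in the table."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in table]


print(disallowed("\u0dbd\u0d82\u0d9a\u0dcf"))  # ලංකා
print(disallowed("\u0b95"))                    # a Tamil letter
```

A label passes this check when the returned list is empty.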

3.2.3.      The domain string must contain at least two letters.

3.2.4.      The string should consist of valid Unicode code points and should comply with the linguistic rules of the respective language. However, it does not need to comply with the spelling rules.

E.g.:

ෙඅ is not a valid string.

නය and ණය are valid strings, and they are considered two different strings.
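
To see why the two strings above are different, it helps to list their code points. The short Python sketch below does this; the function name code_points is illustrative.

```python
def code_points(text: str) -> list[str]:
    """List the Unicode code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


dantaja = "\u0db1\u0dba"    # නය (dantaja nayanna + yayanna)
muurdhaja = "\u0dab\u0dba"  # ණය (muurdhaja nayanna + yayanna)

print(code_points(dantaja))
print(code_points(muurdhaja))
print(dantaja == muurdhaja)
```

The first code point differs (U+0DB1 versus U+0DAB), so the strings compare as unequal even though they may be pronounced alike.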

3.2.5.      The string should not contain the pattern “xn--”.

3.2.6.      If you request a domain name containing zwj (U+200D), we register two domain names as a bundle: the domain name that contains zwj and the same domain name without zwj.

  • Even though we register two domain names, the domain registration charges are not affected.

E.g.:

If you request a .ලංකා domain name containing ් + ර or ් + ය, we also register an extra domain name containing rakaransaya (්‍ර) or yansaya (්‍ය) accordingly.

E.g.:  If you are requesting සත්‍ය.ලංකා we are registering සත්ය.ලංකා domain as well.

සත්‍ය.ලංකා        සත්ය.ලංකා
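
The bundling in 3.2.6 can be sketched as follows. This is an illustration under assumptions, not the registry's procedure: the function name bundle is hypothetical, and the sketch assumes that the zwj form is obtained by placing zwj after every al-lakuna that is directly followed by ර or ය.

```python
import re

ZWJ = "\u200d"
AL_LAKUNA = "\u0dca"
RA, YA = "\u0dbb", "\u0dba"


def bundle(label: str) -> tuple[str, str]:
    """Return (label with zwj, label without zwj) for a requested label."""
    # The form without zwj: drop every zwj from the request.
    plain = label.replace(ZWJ, "")
    # The form with zwj: insert zwj between al-lakuna and a following RA or YA.
    joined = re.sub(f"{AL_LAKUNA}(?=[{RA}{YA}])", AL_LAKUNA + ZWJ, plain)
    return joined, plain


with_zwj, without_zwj = bundle("\u0dc3\u0dad\u0dca\u200d\u0dba")  # සත්‍ය
print(len(with_zwj), len(without_zwj))
```

Requesting either form yields the same pair, which matches the idea that both names are registered together. When the two returned strings are equal, the label has no zwj variant and no bundle is needed.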

3.3. Policies - During the Sunrise period

3.3.1.       During the sunrise period, .lk domain registrant(s) can request the IDN domain(s) corresponding to their existing .lk domain(s) through the online request form available on the official LK Domain Registry website.

3.3.2.      The requested domain name should be derived from the existing .lk domain name by transliteration or pronunciation.

3.3.3.      Only one Sinhala and one Tamil domain name can be registered for each .lk registration, even if multiple transliterations are possible.

4. IDN Label Rules

This set of rules (Appendix-B) guides you in creating a valid domain name for .ලංකා and .இலங்கை domains.

5. Appendix

5.1. Appendix-A

5.1.1. Permitted String Table for .ලංකා domains

The following table specifies the IDN (Internationalized Domain Names) Language Table used by the LK Domain Registry for the registration of Sinhala language domain labels in the .lk and .ලංකා domains. These are based on the recommendation of the ICTA IDN working group.

Other restrictions on the allowable character sequences exist, which are not documented in this table.

 

 

 

 

Code point    Name    Character

Latin
U+002D    HYPHEN-MINUS    -
U+0030..U+0039    DIGIT ZERO - DIGIT NINE    0-9
U+0061..U+007A    LATIN SMALL LETTER A - LATIN SMALL LETTER Z    A-Z a-z

Sinhala
U+0D82    Sinhala sign anusvaraya    (ං)
U+0D83    Sinhala sign visargaya    (ඃ)
U+0D85    Sinhala letter ayanna    (අ)
U+0D86    Sinhala letter aayanna    (ආ)
U+0D87    Sinhala letter aeyanna    (ඇ)
U+0D88    Sinhala letter aeeyanna    (ඈ)
U+0D89    Sinhala letter iyanna    (ඉ)
U+0D8A    Sinhala letter iiyanna    (ඊ)
U+0D8B    Sinhala letter uyanna    (උ)
U+0D8C    Sinhala letter uuyanna    (ඌ)
U+0D8D    Sinhala letter iruyanna    (ඍ)
U+0D8E    Sinhala letter iruuyanna    (ඎ)
U+0D91    Sinhala letter eyanna    (එ)
U+0D92    Sinhala letter eeyanna    (ඒ)
U+0D93    Sinhala letter aiyanna    (ඓ)
U+0D94    Sinhala letter oyanna    (ඔ)
U+0D95    Sinhala letter ooyanna    (ඕ)
U+0D96    Sinhala letter auyanna    (ඖ)
U+0D9A    Sinhala letter alpapraana kayanna    (ක)
U+0D9B    Sinhala letter mahaapraana kayanna    (ඛ)
U+0D9C    Sinhala letter alpapraana gayanna    (ග)
U+0D9D    Sinhala letter mahaapraana gayanna    (ඝ)
U+0D9E    Sinhala letter kantaja naasikyaya    (ඞ)
U+0D9F    Sinhala letter sanyaka gayanna    (ඟ)
U+0DA0    Sinhala letter alpapraana cayanna    (ච)
U+0DA1    Sinhala letter mahaapraana cayanna    (ඡ)
U+0DA2    Sinhala letter alpapraana jayanna    (ජ)
U+0DA3    Sinhala letter mahaapraana jayanna    (ඣ)
U+0DA4    Sinhala letter taaluja naasikyaya    (ඤ)
U+0DA5    Sinhala letter taaluja sanyooga naasikyaya    (ඥ)
U+0DA7    Sinhala letter alpapraana ttayanna    (ට)
U+0DA8    Sinhala letter mahaapraana ttayanna    (ඨ)
U+0DA9    Sinhala letter alpapraana ddayanna    (ඩ)
U+0DAA    Sinhala letter mahaapraana ddayanna    (ඪ)
U+0DAB    Sinhala letter muurdhaja nayanna    (ණ)
U+0DAD    Sinhala letter alpapraana tayanna    (ත)
U+0DAE    Sinhala letter mahaapraana tayanna    (ථ)
U+0DAF    Sinhala letter alpapraana dayanna    (ද)
U+0DB0    Sinhala letter mahaapraana dayanna    (ධ)
U+0DB1    Sinhala letter dantaja nayanna    (න)
U+0DB3    Sinhala letter sanyaka dayanna    (ඳ)
U+0DB4    Sinhala letter alpapraana payanna    (ප)
U+0DB5    Sinhala letter mahaapraana payanna    (ඵ)
U+0DB6    Sinhala letter alpapraana bayanna    (බ)
U+0DB7    Sinhala letter mahaapraana bayanna    (භ)
U+0DB8    Sinhala letter mayanna    (ම)
U+0DB9    Sinhala letter amba bayanna    (ඹ)
U+0DBA    Sinhala letter yayanna    (ය)
U+0DBB    Sinhala letter rayanna    (ර)
U+0DBD    Sinhala letter dantaja layanna    (ල)
U+0DC1    Sinhala letter taaluja sayanna    (ශ)
U+0DC2    Sinhala letter muurdhaja sayanna    (ෂ)
U+0DC3    Sinhala letter dantaja sayanna    (ස)
U+0DC4    Sinhala letter hayanna    (හ)
U+0DC5    Sinhala letter muurdhaja layanna    (ළ)
U+0DC6    Sinhala letter fayanna    (ෆ)
U+0DCA    Sinhala sign al-lakuna    (්)
U+0DCF    Sinhala vowel sign aela-pilla    (ා)
U+0DD0    Sinhala vowel sign ketti aeda-pilla    (ැ)
U+0DD1    Sinhala vowel sign diga aeda-pilla    (ෑ)
U+0DD2    Sinhala vowel sign ketti is-pilla    (ි)
U+0DD3    Sinhala vowel sign diga is-pilla    (ී)
U+0DD4    Sinhala vowel sign ketti paa-pilla    (ු)
U+0DD6    Sinhala vowel sign diga paa-pilla    (ූ)
U+0DD8    Sinhala vowel sign gaetta-pilla    (ෘ)
U+0DD9    Sinhala vowel sign kombuva    (ෙ)
U+0DDA    Sinhala vowel sign diga kombuva    (ේ)
U+0DDB    Sinhala vowel sign kombu deka    (ෛ)
U+0DDC    Sinhala vowel sign kombuva haa aela-pilla    (ො)
U+0DDD    Sinhala vowel sign kombuva haa diga aela-pilla    (ෝ)
U+0DDE    Sinhala vowel sign kombuva haa gayanukitta    (ෞ)
U+0DF2    Sinhala vowel sign diga gaetta-pilla    (ෲ)

 

5.1.2. Permitted String Table for .இலங்கை domains

This document specifies the IDN (Internationalized Domain Names) Language Table used by the LK Domain Registry for the registration of Tamil language labels in the .lk and .இலங்கை domains. These are based on the recommendation of the ICTA IDN working group.

 

Code point    Name    Character

Latin
U+002D    HYPHEN-MINUS    -
U+0030..U+0039    DIGIT ZERO - DIGIT NINE    0-9
U+0061..U+007A    LATIN SMALL LETTER A - LATIN SMALL LETTER Z    A-Z a-z

Tamil
U+0B83    TAMIL SIGN VISARGA = aytham
U+0B85    TAMIL LETTER A
U+0B86    TAMIL LETTER AA
U+0B87    TAMIL LETTER I
U+0B88    TAMIL LETTER II
U+0B89    TAMIL LETTER U
U+0B8A    TAMIL LETTER UU
U+0B8E    TAMIL LETTER E
U+0B8F    TAMIL LETTER EE
U+0B90    TAMIL LETTER AI
U+0B92    TAMIL LETTER O
U+0B93    TAMIL LETTER OO
U+0B94    TAMIL LETTER AU
U+0B95    TAMIL LETTER KA
U+0B99    TAMIL LETTER NGA
U+0B9A    TAMIL LETTER CA
U+0B9C    TAMIL LETTER JA
U+0B9E    TAMIL LETTER NYA
U+0B9F    TAMIL LETTER TTA
U+0BA3    TAMIL LETTER NNA
U+0BA4    TAMIL LETTER TA
U+0BA8    TAMIL LETTER NA
U+0BA9    TAMIL LETTER NNNA
U+0BAA    TAMIL LETTER PA
U+0BAE    TAMIL LETTER MA
U+0BAF    TAMIL LETTER YA
U+0BB0    TAMIL LETTER RA
U+0BB1    TAMIL LETTER RRA
U+0BB2    TAMIL LETTER LA
U+0BB3    TAMIL LETTER LLA
U+0BB4    TAMIL LETTER LLLA
U+0BB5    TAMIL LETTER VA
U+0BB6    TAMIL LETTER SHA
U+0BB7    TAMIL LETTER SSA
U+0BB8    TAMIL LETTER SA
U+0BB9    TAMIL LETTER HA
U+0BBE    TAMIL VOWEL SIGN AA
U+0BBF    TAMIL VOWEL SIGN I    ி
U+0BC0    TAMIL VOWEL SIGN II
U+0BC1    TAMIL VOWEL SIGN U
U+0BC2    TAMIL VOWEL SIGN UU
U+0BC6    TAMIL VOWEL SIGN E
U+0BC7    TAMIL VOWEL SIGN EE
U+0BC8    TAMIL VOWEL SIGN AI
U+0BCA    TAMIL VOWEL SIGN O
U+0BCB    TAMIL VOWEL SIGN OO
U+0BCC    TAMIL VOWEL SIGN AU
U+0BCD    TAMIL SIGN VIRAMA

 

5.2. Appendix-B

5.2.1. IDN Label Rules for .ලංකා domains

 

  • IDN rules for Indic scripts are based on strings rather than individual Unicode characters, as Indic letters (akshara) are represented by strings of Unicode characters.
  • We define the sets (consonants, vowels, modifiers, semi consonants, zwj etc.) into which we group the letters.

SinhalaVowel = [
  Sinhala_Letter_A = U+0D85 # (අ)
  Sinhala_Letter_AA = U+0D86 # (ආ)
  Sinhala_Letter_AE = U+0D87 # (ඇ)
  Sinhala_Letter_AEE = U+0D88 # (ඈ)
  Sinhala_Letter_I = U+0D89 # (ඉ)
  Sinhala_Letter_II = U+0D8A # (ඊ)
  Sinhala_Letter_U = U+0D8B # (උ)
  Sinhala_Letter_UU = U+0D8C # (ඌ)
  Sinhala_Letter_vR = U+0D8D # (ඍ)
  Sinhala_Letter_vRR = U+0D8E # (ඎ)
  Sinhala_Letter_E = U+0D91 # (එ)
  Sinhala_Letter_EE = U+0D92 # (ඒ)
  Sinhala_Letter_AI = U+0D93 # (ඓ)
  Sinhala_Letter_O = U+0D94 # (ඔ)
  Sinhala_Letter_OO = U+0D95 # (ඕ)
  Sinhala_Letter_AU = U+0D96 # (ඖ)
]

SinhalaConsonant = [
  Sinhala_Letter_KHA = U+0D9A # (ක)
  Sinhala_Letter_GA = U+0D9B # (ඛ)
  Sinhala_Letter_GHA = U+0D9C # (ග)
  Sinhala_Letter_NGA = U+0D9D # (ඝ)
  Sinhala_Letter_NGGA = U+0D9E # (ඞ)
  Sinhala_Letter_CA = U+0D9F # (ඟ)
  Sinhala_Letter_CHA = U+0DA0 # (ච)
  Sinhala_Letter_JA = U+0DA1 # (ඡ)
  Sinhala_Letter_JHA = U+0DA2 # (ජ)
  Sinhala_Letter_NYA = U+0DA3 # (ඣ)
  Sinhala_Letter_JNYA = U+0DA4 # (ඤ)
  Sinhala_Letter_NYJA = U+0DA5 # (ඥ)
  Sinhala_Letter_NYJA = U+0DA6 # (ඦ)
  Sinhala_Letter_TTA = U+0DA7 # (ට)
  Sinhala_Letter_TTHA = U+0DA8 # (ඨ)
  Sinhala_Letter_DDA = U+0DA9 # (ඩ)
  Sinhala_Letter_DDHA = U+0DAA # (ඪ)
  Sinhala_Letter_NNA = U+0DAB # (ණ)
  Sinhala_Letter_NNDDA = U+0DAC # (ඬ)
  Sinhala_Letter_TA = U+0DAD # (ත)
  Sinhala_Letter_THA = U+0DAE # (ථ)
  Sinhala_Letter_DA = U+0DAF # (ද)
  Sinhala_Letter_DHA = U+0DB0 # (ධ)
  Sinhala_Letter_NA = U+0DB1 # (න)
  Sinhala_Letter_NDA = U+0DB3 # (ඳ)
  Sinhala_Letter_PA = U+0DB4 # (ප)
  Sinhala_Letter_PHA = U+0DB5 # (ඵ)
  Sinhala_Letter_BA = U+0DB6 # (බ)
  Sinhala_Letter_BHA = U+0DB7 # (භ)
  Sinhala_Letter_MA = U+0DB8 # (ම)
  Sinhala_Letter_MBA = U+0DB9 # (ඹ)
  Sinhala_Letter_YA = U+0DBA # (ය)
  Sinhala_Letter_RA = U+0DBB # (ර)
  Sinhala_Letter_LA = U+0DBD # (ල)
  Sinhala_Letter_VA = U+0DC0 # (ව)
  Sinhala_Letter_SHA = U+0DC1 # (ශ)
  Sinhala_Letter_SSA = U+0DC2 # (ෂ)
  Sinhala_Letter_SA = U+0DC3 # (ස)
  Sinhala_Letter_HA = U+0DC4 # (හ)
  Sinhala_Letter_LLA = U+0DC5 # (ළ)
  Sinhala_Letter_FA = U+0DC6 # (ෆ)
]

SinhalaModifiers = [
  Sinhala_Vowel_Sign_AA = U+0DCF # (ා)
  Sinhala_Vowel_Sign_AE = U+0DD0 # (ැ)
  Sinhala_Vowel_Sign_AEE = U+0DD1 # (ෑ)
  Sinhala_Vowel_Sign_I = U+0DD2 # (ි)
  Sinhala_Vowel_Sign_II = U+0DD3 # (ී)
  Sinhala_Vowel_Sign_U = U+0DD4 # (ු)
  Sinhala_Vowel_Sign_UU = U+0DD6 # (ූ)
  Sinhala_Vowel_Sign_VR = U+0DD8 # (ෘ)
  Sinhala_Vowel_Sign_VRR = U+0DF2 # (ෲ)
  Sinhala_Vowel_Sign_E = U+0DD9 # (ෙ)
  Sinhala_Vowel_Sign_EE = U+0DDA # (ේ)
  Sinhala_Vowel_Sign_AI = U+0DDB # (ෛ)
  Sinhala_Vowel_Sign_VI = U+0DDF # (ෟ)
  Sinhala_Vowel_Sign_O = U+0DDC # (ො)
  Sinhala_Vowel_Sign_OO = U+0DDD # (ෝ)
  Sinhala_Vowel_Sign_AU = U+0DDE # (ෞ)
  Sinhala_Sign_ALLAKUNA = U+0DCA # (්)
]

SinhalaSemiConsonants = [
  Sinhala_Sign_Anusvaraya = U+0D82 # (ං)
  Sinhala_Sign_Visargaya = U+0D83 # (ඃ)
]

ZWJ = [
  ZWJ = U+200D # (zwj)
]

English_Letters = [A-Z or a-z]

Digits = [0 to 9]

  • Rules

# Rules have the following format:

# <sequence>:<result>

# Key:

# <sequence> is the sequence of characters starting from the current position in the label where each element is either a named character or a member of a character set defined above.

# <result>  is either "fail" or "next"

# Logically, a label is processed by iterating through its character positions

# In each iteration, each rule is checked with the substring starting from the current character position.

# If the current substring matches then the result is applied as follows:

#   fail: stop, the label is invalid

#   next: move to the next character position

# If the processing reaches the end of the string, then the label is valid.

#

# Variants:

# A variant is defined by a rule of the form

# <sequence1> | <sequence2> : variant

# If the current substring matches either <sequence1> or <sequence2>, then note that

#    the label contains a variant, and then move to the next character position.

# The rules are defined as follows.

 

1. The first letter can be a vowel, a consonant, a digit or an English letter.

EX: Sinhala_Letter_A(0D85). . . . Sinhala_Letter_AU(0D96)

Sinhala_Letter_KHA(0D9A) ... Sinhala_Letter_FA(0DC6)

English_Letter (A – Z or a - z)

Digits (0 to 9)

 

2. A vowel can be followed by another vowel, a consonant, a semi consonant, an English letter or a digit.

Ex:

Sinhala_Letter_A  Sinhala_Letter_AA (අආ)

Sinhala_Letter_I Sinhala_Letter_RA (ඉර)

Sinhala_Letter_A Sinhala_Sign_Anusvaraya(අං),
Sinhala_Letter_A Sinhala_Sign_Visargaya(අඃ)

Sinhala_Letter_A English_Letter_C (අc)

Sinhala_Letter_A 1 (අ1)

 

3. A consonant can be followed by another consonant, a modifier, a vowel, al-lakuna, a semi consonant, a digit or an English letter.

Ex:

Sinhala_Letter_GHA Sinhala_Letter_MA (ගම)
Sinhala_Letter_GHA Sinhala_Vowel_Sign_AA Sinhala_Letter_LA Sinhala_Sign_ALLAKUNA Sinhala_Letter_LA (ගාල්ල)

Sinhala_Letter_KHA Sinhala_Sign_ALLAKUNA Sinhala_Letter_LA Sinhala_Vowel_Sign_I Sinhala_Letter_FA Sinhala_Letter_DDA Sinhala_Sign_ALLAKUNA (ක්ලිෆඩ්)

Sinhala_Letter_NA Sinhala_Sign_Anusvaraya Sinhala_Letter_GHA Sinhala_Vowel_Sign_II (නංගී)

Sinhala_Letter_GHA 1 (ග1)
Sinhala_Letter_GHA m(ගm)

 

4. A digit or an English letter can be followed by a vowel, a consonant, a digit or an English letter.

Ex:

English_Letter_A Sinhala_Letter_I Sinhala_Letter_RA (Aඉර)

English_Letter_B Sinhala_Letter_GHA Sinhala_Letter_MA (Bගම)

English_Letter_A English_Letter_B (AB)

 

5. A semi consonant can be followed by a vowel, a consonant, a digit or an English letter.

Ex:

Sinhala_Letter_A Sinhala_Sign_Visargaya Sinhala_Letter_RA(අඃර)

Sinhala_Letter_KHA Sinhala_Sign_Anusvaraya Sinhala_Letter_vR(කංඍ)

Sinhala_Letter_KHA Sinhala_Sign_Anusvaraya 1 (කං1)
Sinhala_Letter_KHA Sinhala_Sign_Anusvaraya English_Letter_a (කංa)

 

6. A modifier can be followed by a semi consonant, a vowel, a consonant, a digit or an English letter.

Ex:

Sinhala_Letter_KHA Sinhala_Vowel_Sign_II Sinhala_Sign_Anusvaraya (කීං)

Sinhala_Letter_NA Sinhala_Vowel_Sign_AA Sinhala_Letter_U Sinhala_Letter_LA (නාඋල)

Sinhala_Letter_NA Sinhala_Vowel_Sign_AA Sinhala_Letter_GHA Sinhala_Letter_SA (නාගස)

Sinhala_Letter_NA Sinhala_Vowel_Sign_AA 1 (නා1)

 

7. Sinhala_Sign_ALLAKUNA can be followed by a vowel, a consonant, zwj, a digit or an English letter.

Ex:

Sinhala_Letter_GHA Sinhala_Letter_LA Sinhala_Sign_ALLAKUNA Sinhala_Letter_A Sinhala_Letter_MA Sinhala_Vowel_Sign_U Sinhala_Letter_NNA(ගල්අමුණ)

Sinhala_Letter_A Sinhala_Letter_TA Sinhala_Sign_ALLAKUNA Sinhala_Letter_LA(අත්ල)

Sinhala_Letter_KHA Sinhala_Sign_ALLAKUNA 200D Sinhala_Letter_RA(ක + ් + zwj + ර) = ක්‍ර

Sinhala_Letter_BA Sinhala_Letter_SA Sinhala_Sign_ALLAKUNA 1 (බස්1)

 

8. A zwj can be followed only by Sinhala_Letter_YA or Sinhala_Letter_RA.

Ex:

ක්‍ර = Sinhala_Letter_KHA Sinhala_Sign_ALLAKUNA zwj(200D) Sinhala_Letter_RA (ක + ් + zwj + ර)

ක්‍ය = Sinhala_Letter_KHA Sinhala_Sign_ALLAKUNA zwj(200D) Sinhala_Letter_YA (ක + ් + zwj + ය)
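
The eight rules above amount to a table of which character class may start a label and which class may come next. The Python sketch below shows one possible reading of them. It is not the registry's validator: the names (char_class, FIRST, NEXT, is_valid) are illustrative, the code point ranges are taken from the set definitions in this appendix, digits and English letters are merged into a single "alnum" class, and the hyphen is left out because the rules above do not mention it. The rules also do not say how a label may end, so any ending is accepted.

```python
def _span(first, last):
    """Inclusive range of code points as a set of characters."""
    return {chr(cp) for cp in range(first, last + 1)}


VOWEL = _span(0x0D85, 0x0D8E) | _span(0x0D91, 0x0D96)
CONSONANT = (_span(0x0D9A, 0x0DB1) | _span(0x0DB3, 0x0DBB)
             | {"\u0dbd"} | _span(0x0DC0, 0x0DC6))
MODIFIER = _span(0x0DCF, 0x0DD4) | {"\u0dd6"} | _span(0x0DD8, 0x0DDF) | {"\u0df2"}
SEMI = {"\u0d82", "\u0d83"}
AL_LAKUNA, ZWJ = "\u0dca", "\u200d"
YA, RA = "\u0dba", "\u0dbb"


def char_class(ch):
    """Map a character to its class name, or None if it is in no class."""
    if ch in VOWEL:
        return "vowel"
    if ch in CONSONANT:
        return "consonant"
    if ch in MODIFIER:
        return "modifier"
    if ch in SEMI:
        return "semi"
    if ch == AL_LAKUNA:
        return "al-lakuna"
    if ch == ZWJ:
        return "zwj"
    if ch.isascii() and ch.isalnum():
        return "alnum"
    return None


FIRST = {"vowel", "consonant", "alnum"}                                   # rule 1
NEXT = {
    "vowel": {"vowel", "consonant", "semi", "alnum"},                     # rule 2
    "consonant": {"consonant", "modifier", "vowel", "al-lakuna",
                  "semi", "alnum"},                                       # rule 3
    "alnum": {"vowel", "consonant", "alnum"},                             # rule 4
    "semi": {"vowel", "consonant", "alnum"},                              # rule 5
    "modifier": {"semi", "vowel", "consonant", "alnum"},                  # rule 6
    "al-lakuna": {"vowel", "consonant", "zwj", "alnum"},                  # rule 7
}


def is_valid(label):
    classes = [char_class(ch) for ch in label]
    if not classes or None in classes or classes[0] not in FIRST:
        return False
    for i in range(1, len(label)):
        prev, cur = classes[i - 1], classes[i]
        if prev == "zwj":
            if label[i] not in (YA, RA):                                  # rule 8
                return False
        elif cur not in NEXT[prev]:
            return False
    return True


print(is_valid("\u0d9c\u0dcf\u0dbd\u0dca\u0dbd"))  # ගාල්ල
print(is_valid("\u0dd9\u0d85"))                    # modifier first, as in 3.2.4
```

Keeping the rules as a lookup table means a change to one rule is a change to one line of data rather than to the control flow.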

 

5.2.2. IDN Label Rules for .இலங்கை domains

 

  • IDN rules for Indic scripts are based on strings rather than individual Unicode characters, as Indic letters (akshara) are represented by strings of Unicode characters.
  • We define the sets (consonants, vowels, vowel signs, etc.) into which we group the letters.

TamilVowel = [

Tamil_Letter_A

Tamil_Letter_AA

Tamil_Letter_I

Tamil_Letter_II

Tamil_Letter_U

Tamil_Letter_UU

Tamil_Letter_E

Tamil_Letter_EE

Tamil_Letter_AI

Tamil_Letter_O

Tamil_Letter_OO

Tamil_Letter_AU

]

TamilConsonant = [

Tamil_Letter_KA

Tamil_Letter_NGA

Tamil_Letter_CA

Tamil_Letter_JA

Tamil_Letter_NYA

Tamil_Letter_TTA

Tamil_Letter_NNA

Tamil_Letter_TA

Tamil_Letter_NA

Tamil_Letter_NNNA

Tamil_Letter_PA

Tamil_Letter_MA

Tamil_Letter_YA

Tamil_Letter_RA

Tamil_Letter_RRA

Tamil_Letter_LA

Tamil_Letter_LLA

Tamil_Letter_LLLA

Tamil_Letter_VA

Tamil_Letter_SHA

Tamil_Letter_SSA

Tamil_Letter_SA

Tamil_Letter_HA

]

TamilVowelSign = [

Tamil_Vowel_Sign_AA

Tamil_Vowel_Sign_I

Tamil_Vowel_Sign_II

Tamil_Vowel_Sign_U

Tamil_Vowel_Sign_UU

Tamil_Vowel_Sign_E

Tamil_Vowel_Sign_EE

Tamil_Vowel_Sign_AI

Tamil_Vowel_Sign_O

Tamil_Vowel_Sign_OO

Tamil_Vowel_Sign_AU

]

TAMIL SIGN VISARGA - Aytham

ASCIIDigit = [0-9]

  • Rules

# Rules have the following format:

# <sequence> : <result>

# Key:

# <sequence> is the sequence of characters starting from the current position in the label

#    where each element is either a named character or a member of a character set defined above.

# <result> is either "fail" or "next"

# Logically, a label is processed by iterating through its character positions

# In each iteration, each rule is checked with the substring starting from the current character position.

# If the current substring matches then the result is applied as follows:

#   fail: stop, the label is invalid

#   next: move to the next character position after the end of the matched string

# If the processing reaches the end of the string, then the label is valid.

 

# Variants:

# A variant is defined by a rule of the form

# <sequence1> | <sequence2> : variant

# If the current substring matches either <sequence1> or <sequence2>, then note that

#    the label contains a variant, and then move to the next character position.

# we now define each of the special cases, and finally the general rules.

# allow ik + ssa as either a single glyph (க்ஷ) or separate glyphs (க்‌ஷ)

# these are variants of each other

# NOTE GD-20100416: we could also allow just one form, and make the other invalid

# This is the only place where ZWNJ is valid

Tamil_Letter_KA Tamil_Sign_Pulli Tamil_Letter_SSA | Tamil_Letter_KA Tamil_Sign_Pulli ZWNJ Tamil_Letter_SSA : variant

# the ZWNJ is not valid anywhere else except in the sequence2 above

ZWNJ : fail

# disallow old form of Shri (ஸ+்+ர+ீ)

Tamil_Letter_SA Tamil_Sign_Pulli Tamil_Letter_RA Tamil_Vowel_Sign_II : fail

# Note: the valid representation of Shri is

# Tamil_Letter_SHA Tamil_Sign_Pulli Tamil_Letter_RA Tamil_Vowel_Sign_II (ஶ+்+ர+ீ)

# we don't need a special rule for this

# disallow a LLA after a consonant with a Kombu (e.g. கெ ள) unless it is modified by a vowel sign or Pulli

#    to avoid confusion with TamilConsonant+Vowel Sign AU

# It is presumed that this sequence will never occur in a valid word

# the kombu should be preceded by a consonant

TamilConsonant Tamil_Vowel_Sign_E Tamil_Letter_LLA TamilVowelSign : next

TamilConsonant Tamil_Vowel_Sign_E Tamil_Letter_LLA Tamil_Sign_Pulli : next

TamilConsonant Tamil_Vowel_Sign_E Tamil_Letter_LLA : fail

# disallow a LLA after Letter O (ஒ ள) unless it is modified by a vowel sign or Pulli

#   to avoid confusion with Letter AU (ஔ)

# again, we assume that this sequence will never occur in a valid word

Tamil_Letter_O Tamil_Letter_LLA TamilVowelSign : next

Tamil_Letter_O Tamil_Letter_LLA Tamil_Sign_Pulli : next

Tamil_Letter_O Tamil_Letter_LLA : fail

 

# General Rules

# a vowel sign or a pulli (virama) can only follow a consonant and is not valid elsewhere

TamilConsonant TamilVowelSign : next

TamilConsonant Tamil_Sign_Pulli  : next

TamilVowelSign : fail

Tamil_Sign_Pulli : fail

 

# allow consonants, vowels, Aytham, European numerals anywhere (unless disallowed by previous rules)

TamilConsonant : next

TamilVowel : next

Tamil_Sign_Aytham : next

ASCIIDigit : next

Hyphen-Minus : next

# IDN rules, which are not implemented in this table, restrict the placement of hyphen-minus

# anything else is invalid

: fail
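
The rule list above is evaluated in order at each position, and the first matching rule decides the outcome. The Python sketch below is one way to read that procedure, not the registry's implementation. The names (RULES, matches, validate) are illustrative, the code points come from the Tamil table in Appendix A, and it assumes that both "next" and "variant" advance past the whole matched sequence.

```python
def _span(first, last):
    """Inclusive range of code points as a set of characters."""
    return {chr(cp) for cp in range(first, last + 1)}


VOWEL = _span(0x0B85, 0x0B8A) | _span(0x0B8E, 0x0B90) | _span(0x0B92, 0x0B94)
CONSONANT = {chr(cp) for cp in (0x0B95, 0x0B99, 0x0B9A, 0x0B9C, 0x0B9E, 0x0B9F,
                                0x0BA3, 0x0BA4, 0x0BA8, 0x0BA9, 0x0BAA)}
CONSONANT |= _span(0x0BAE, 0x0BB9)
VOWEL_SIGN = _span(0x0BBE, 0x0BC2) | _span(0x0BC6, 0x0BC8) | _span(0x0BCA, 0x0BCC)
DIGIT = set("0123456789")
KA, SA, SSA, RA = {"\u0b95"}, {"\u0bb8"}, {"\u0bb7"}, {"\u0bb0"}
LLA, LETTER_O = {"\u0bb3"}, {"\u0b92"}
SIGN_E, SIGN_II = {"\u0bc6"}, {"\u0bc0"}
PULLI, AYTHAM, ZWNJ, HYPHEN = {"\u0bcd"}, {"\u0b83"}, {"\u200c"}, {"-"}

# Each rule is (sequence of character sets, result), in the order given above.
RULES = [
    ((KA, PULLI, SSA), "variant"),
    ((KA, PULLI, ZWNJ, SSA), "variant"),
    ((ZWNJ,), "fail"),
    ((SA, PULLI, RA, SIGN_II), "fail"),
    ((CONSONANT, SIGN_E, LLA, VOWEL_SIGN), "next"),
    ((CONSONANT, SIGN_E, LLA, PULLI), "next"),
    ((CONSONANT, SIGN_E, LLA), "fail"),
    ((LETTER_O, LLA, VOWEL_SIGN), "next"),
    ((LETTER_O, LLA, PULLI), "next"),
    ((LETTER_O, LLA), "fail"),
    ((CONSONANT, VOWEL_SIGN), "next"),
    ((CONSONANT, PULLI), "next"),
    ((VOWEL_SIGN,), "fail"),
    ((PULLI,), "fail"),
    ((CONSONANT,), "next"),
    ((VOWEL,), "next"),
    ((AYTHAM,), "next"),
    ((DIGIT,), "next"),
    ((HYPHEN,), "next"),
]


def matches(label, pos, seq):
    """True if the characters starting at pos fit the sequence of sets."""
    if pos + len(seq) > len(label):
        return False
    return all(label[pos + i] in chars for i, chars in enumerate(seq))


def validate(label):
    """Return (is_valid, has_variant) for a label."""
    pos, has_variant = 0, False
    while pos < len(label):
        for seq, result in RULES:
            if matches(label, pos, seq):
                break
        else:
            return False, has_variant      # final catch-all ": fail"
        if result == "fail":
            return False, has_variant
        if result == "variant":
            has_variant = True
        pos += len(seq)                    # assumption: skip the whole match
    return True, has_variant


print(validate("\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8"))  # இலங்கை
print(validate("\u0b95\u0bc6\u0bb3"))                    # KA + sign E + LLA
```

Rule order matters: the special cases must come before the general rules, otherwise a consonant followed by a vowel sign would match the general rule before the LLA checks are reached.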