Persian/Farsi (lexicon size 450,000, selection October 2009) The Farsi or Persian language is written in the Arabic script, but because it is an Indo-European language, vowels are important.
Urdu (lexicon size 131,000, selection October 2009) The Urdu language is closely related to Hindi, but written in the Arabic script. Urdu and Hindi are Indo-European languages.
Breton (lexicon size 210,000, selection July 2007) The Breton language is spoken in Brittany (Bretagne), France. It is a Celtic language related to the extinct Cornish language of the UK.
Hindi (lexicon size 156,000, selection November 2011) The Hindi language is spoken in northern and central India. Written Hindi is relatively standardized over the whole Hindi language area. It is an Indo-Aryan language. Although related to Urdu, Hindi does not favour the use of Persian and Arabic loanwords. Hindi is written in the Devanagari script, which includes many complex characters composed of vowels, consonants, vowel signs (matras), numerals, and diacritical marks.
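To illustrate how a Devanagari word is built from consonants, vowel signs (matras), and diacritical marks, the following minimal Python sketch lists the Unicode code points of one word together with their general categories. It uses only the standard `unicodedata` module. The example word and the helper name `describe_codepoints` are chosen here for illustration and are not taken from the lexicon.

```python
import unicodedata

# The word "Hindi" in Devanagari, written with escapes so the code points
# are explicit: HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II.
WORD = "\u0939\u093f\u0928\u094d\u0926\u0940"


def describe_codepoints(word):
    """Return (code point, Unicode name, general category) for each character.

    Hypothetical helper for illustration only.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in word
    ]


def count_marks(word):
    """Count combining marks (general categories beginning with 'M')."""
    return sum(1 for ch in word if unicodedata.category(ch).startswith("M"))


if __name__ == "__main__":
    for codepoint, name, category in describe_codepoints(WORD):
        print(codepoint, category, name)
    print("code points:", len(WORD), "combining marks:", count_marks(WORD))
```

In this word, half of the code points are combining marks rather than base letters, which is one reason why counting or segmenting Devanagari text by code point differs from counting the characters a reader sees.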
Marathi (lexicon size 153,000, selection December 2009) The Marathi language is spoken in the Maharashtra state of India. It is an Indo-Aryan language written in the Devanagari script.
Nepalese (lexicon size 125,000, selection December 2009) The Nepalese language (Nepali) is spoken in the Himalayan state of Nepal between India and China. Nepalese is written in the Devanagari script.
Kurdish (Northern) (lexicon size 90,000, selection July 2009) The Kurdish language belongs to the Iranian group of languages. Kurdish is spoken in Turkey, Iraq, Iran, Armenia, Georgia and Azerbaijan. The Latin script is used for the Northern variety of Kurdish.
Malayalam (lexicon size 377,000, selection December 2009) The Malayalam language is spoken in Kerala, a state in the south of India. It is a Dravidian language written in the Malayalam script, a descendant of the Brahmi script.
Bengali (lexicon size 126,000, selection November 2009) The Bengali language is spoken in Bangladesh. It is an Indo-Aryan language written in the Bengali script, a descendant of the Brahmi script.
Gujarati (lexicon size 185,000, selection October 2009) The Gujarati language is spoken in the Indian state of Gujarat. It is an Indo-Aryan language written in the Gujarati script, a descendant of the Brahmi script.
Tamil (lexicon size 105,000, selection December 2009) The Tamil language is spoken in southern India (Tamil Nadu) and Sri Lanka. It is a Dravidian language written in the Tamil script, a descendant of the Brahmi script. Tamil has many Indo-Aryan loanwords. Tamil in Sri Lanka incorporates loanwords from Dutch, Portuguese, and English.
Sinhala (lexicon size 208,000, selection November 2009) The Sinhala language is spoken in Sri Lanka. It belongs to the Indo-Aryan branch of the Indo-European languages and is written in the Sinhala script, a descendant of the Indian Brahmi script. There is some affinity to neighbouring