TBRC Talks: A Rule Based Tagger for Classical Tibetan, Tuesday 11/4

October 29, 2014

TBRC Talks presents:

Tibetan in Digital Communication

A Rule Based Tagger for Classical Tibetan: Negation and Verb Stems Classification

With Dr. Nathan Hill, University of London, School of Oriental and African Studies (SOAS)

Demo and discussion with Dorji Wangchuk & Orna Almogi

Dr. Hill is engaged in building a 1,000,000-syllable part-of-speech tagged corpus of Tibetan texts spanning the language's entire history. In addition to the corpus, Dr. Hill is developing a number of digital tools that allow the corpus to be employed in many areas of humanities research and enable other researchers to more easily develop their own corpora or software tools. The corpus will itself be a powerful resource for scholars working with Tibetan language materials in a wide range of disciplines (including history, religion, literature and linguistics), since it offers ready access to, and comparison across, texts from different time periods, regions and genres. It will also provide an important foundation for subsequent work on a historically comprehensive, lexicographically rigorous dictionary of Tibetan, akin to the Oxford English Dictionary.

Building this corpus for Tibetan will reduce the cost of developing language technologies such as text messaging, spellcheckers and machine-aided translation. These technologies would give Tibetans the choice to use their language as they see fit in a world that is increasingly shaped by digital communication.

Tuesday, November 4th,

12:00 – 2:00

Tibetan Buddhist Resource Center
1430 Massachusetts Avenue, 5th Floor
Cambridge, MA 02138