Developing Asian language corpora: standards and practice

R. Xiao, T. McEnery, P. Baker, A. Hardie

Research output: Contribution to conference - Paper

Abstract

This paper first discusses standards for developing Asian language corpora so as to facilitate international data exchange. We then present two corpora of Asian languages developed at Lancaster University: the EMILLE Corpus, which contains 14 South Asian languages, and the Lancaster Corpus of Mandarin Chinese. Finally, we demonstrate how to explore these corpora using Xara and other corpus tools.
Original language: English
Publication status: Published - 2004
Event: 4th Workshop on Asian Language Resources - Sanya, China
Duration: 25 Mar 2004 → …

Conference

Conference: 4th Workshop on Asian Language Resources
Country: China
City: Sanya
Period: 25/03/04 → …

Cite this

Xiao, R., McEnery, T., Baker, P., & Hardie, A. (2004). Developing Asian language corpora: standards and practice. Paper presented at 4th Workshop on Asian Language Resources, Sanya, China.