Developing Asian language corpora: standards and practice

R. Xiao, T. McEnery, P. Baker, A. Hardie

    Research output: Contribution to conference › Paper

    Abstract

    This paper first discusses standards for developing Asian language corpora so as to facilitate international data exchange. We then present two corpora of Asian languages developed at Lancaster University: the EMILLE Corpus, which contains 14 South Asian languages, and the Lancaster Corpus of Mandarin Chinese. Finally, we demonstrate how to explore these corpora using Xara and other corpus tools.
    Original language: English
    Publication status: Published - 2004
    Event: 4th Workshop on Asian Language Resources - Sanya, China
    Duration: 25 Mar 2004 → …
