Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora

Georgios Kontonatsios, Yannis Korkontzelos, Jun'ichi Tsujii, Sophia Ananiadou

Research output: Contribution to conference (Paper)

Abstract

We describe a machine learning approach that uses a Random Forest (RF) classifier to automatically compile bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, namely context vectors, and report an improvement in translation accuracy. As an application, we combine the automatically extracted dictionary with a trained Statistical Machine Translation (SMT) system to translate unknown terms more accurately. The dictionary extraction method described in this paper is freely available.
Original language: English
Pages: 111-116
Publication status: Published - Apr 2014
Event: 14th Conference of the European Chapter of the Association for Computational Linguistics - Gothenburg, Sweden
Duration: 26 Apr 2014 to 30 Apr 2014

Conference

Conference: 14th Conference of the European Chapter of the Association for Computational Linguistics
Country: Sweden
City: Gothenburg
Period: 26/04/14 to 30/04/14

Cite this

Kontonatsios, G., Korkontzelos, Y., Tsujii, J., & Ananiadou, S. (2014). Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora. 111-116. Paper presented at 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden. http://www.aclweb.org/anthology/E14-4022