Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora

Georgios Kontonatsios, Yannis Korkontzelos, Jun'ichi Tsujii, Sophia Ananiadou

Research output: Contribution to conference (paper, peer-reviewed)


Abstract

We describe a machine learning approach, a Random Forest (RF) classifier, that is used to automatically compile bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, namely context vectors, and we report an improvement of the translation accuracy. As an application, we use the automatically extracted dictionary in combination with a trained Statistical Machine Translation (SMT) system to more accurately translate unknown terms. The dictionary extraction method described in this paper is freely available.
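
The context-vector baseline named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not cover the RF classifier; it only assumes the commonly described recipe for context vectors: count the words that co-occur with a term, map the source-language counts into the target language through a seed dictionary, and rank target candidates by cosine similarity. All function names, the window size, and the toy English/Spanish data are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def context_vector(term, sentences, window=2):
    """Count the words that appear within `window` tokens of `term`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], lo) if j != i
                )
    return counts


def translate_vector(vec, seed_dict):
    """Map source-language context words to the target language.

    Words missing from the seed dictionary are dropped (an assumption
    of this sketch, not something stated in the abstract).
    """
    out = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(src_term, src_sents, tgt_terms, tgt_sents, seed_dict, window=2):
    """Return target terms ordered from most to least similar context."""
    src_vec = translate_vector(context_vector(src_term, src_sents, window), seed_dict)
    scored = [(cosine(src_vec, context_vector(t, tgt_sents, window)), t) for t in tgt_terms]
    return [t for _, t in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Toy comparable corpora (hypothetical data, tokenized and ASCII-folded).
src_sents = [["patient", "with", "hypertension", "received", "treatment"]]
tgt_sents = [
    ["paciente", "con", "hipertension", "recibio", "tratamiento"],
    ["el", "higado", "filtra", "la", "sangre"],
]
seed_dict = {
    "patient": "paciente",
    "with": "con",
    "received": "recibio",
    "treatment": "tratamiento",
}

ranking = rank_candidates(
    "hypertension", src_sents, ["higado", "hipertension"], tgt_sents, seed_dict
)
print(ranking)
```

On this toy input the candidate that shares the translated context words is ranked first. Real comparable corpora would need term extraction, multi-word term handling, and a weighting scheme for the counts, none of which this sketch attempts.
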
Original language: English
Pages: 111-116
Publication status: Published - Apr 2014
Event: 14th Conference of the European Chapter of the Association for Computational Linguistics - Gothenburg, Sweden
Duration: 26 Apr 2014 to 30 Apr 2014

Conference

Conference: 14th Conference of the European Chapter of the Association for Computational Linguistics
Country/Territory: Sweden
City: Gothenburg
Period: 26/04/14 to 30/04/14
