We describe a machine learning approach that uses a Random Forest (RF) classifier to automatically compile bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, context vectors, and report improved translation accuracy. As an application, we combine the automatically extracted dictionary with a trained Statistical Machine Translation (SMT) system to translate unknown terms more accurately. The dictionary extraction method described in this paper is freely available.
|Publication status||Published - Apr 2014|
|Event||14th Conference of the European Chapter of the Association for Computational Linguistics - Gothenburg, Sweden|
|Duration||26 Apr 2014 → 30 Apr 2014|
Kontonatsios, G., Korkontzelos, Y., Tsujii, J., & Ananiadou, S. (2014). Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora. 111-116. Paper presented at 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden. http://www.aclweb.org/anthology/E14-4022