Abstract
We describe a machine learning approach, a Random Forest (RF) classifier, that automatically compiles bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, namely context vectors, and report an improvement in translation accuracy. As an application, we combine the automatically extracted dictionary with a trained Statistical Machine Translation (SMT) system to translate unknown terms more accurately. The dictionary extraction method described in this paper is freely available.
Original language | English |
---|---|
Pages | 111-116 |
Publication status | Published - Apr 2014 |
Event | 14th Conference of the European Chapter of the Association for Computational Linguistics - Gothenburg, Sweden. Duration: 26 Apr 2014 → 30 Apr 2014 |