Abstract
We describe a machine learning approach, a Random Forest (RF) classifier, that automatically compiles bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, context vectors, and report an improvement in translation accuracy. As an application, we combine the automatically extracted dictionary with a trained Statistical Machine Translation (SMT) system to translate unknown terms more accurately. The dictionary extraction method described in this paper is freely available.
| Original language | English |
|---|---|
| Pages | 111-116 |
| Publication status | Published - Apr 2014 |
| Event | 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden (26 Apr 2014 → 30 Apr 2014) |
Conference
| Conference | 14th Conference of the European Chapter of the Association for Computational Linguistics |
|---|---|
| Country/Territory | Sweden |
| City | Gothenburg |
| Period | 26/04/14 → 30/04/14 |
Title: 'Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora'