Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora

Georgios Kontonatsios, Yannis Korkontzelos, Jun'ichi Tsujii, Sophia Ananiadou

Research output: Contribution to conference (paper)


Abstract

We describe a machine learning approach that uses a Random Forest (RF) classifier to automatically compile bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, context vectors, and report an improvement in translation accuracy. As an application, we combine the automatically extracted dictionary with a trained Statistical Machine Translation (SMT) system to translate unknown terms more accurately. The dictionary extraction method described in this paper is freely available.
Original language: English
Pages: 111-116
Publication status: Published - Apr 2014
Event: 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden
Duration: 26 Apr 2014 to 30 Apr 2014

Conference

Conference: 14th Conference of the European Chapter of the Association for Computational Linguistics
Country: Sweden
City: Gothenburg
Period: 26/04/14 to 30/04/14

Fingerprint

Glossaries
Classifiers
Learning systems

Cite this

Kontonatsios, G., Korkontzelos, Y., Tsujii, J., & Ananiadou, S. (2014). Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora. 111-116. Paper presented at 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden.
@conference{b65defa5105248f3a8529ebc93b7d075,
  title = "Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora",
  abstract = "We describe a machine learning approach, a Random Forest (RF) classifier, that is used to automatically compile bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, namely context vectors, and we report an improvement of the translation accuracy. As an application, we use the automatically extracted dictionary in combination with a trained Statistical Machine Translation (SMT) system to more accurately translate unknown terms. The dictionary extraction method described in this paper is freely available.",
  author = "Georgios Kontonatsios and Yannis Korkontzelos and Jun'ichi Tsujii and Sophia Ananiadou",
  year = "2014",
  month = "4",
  language = "English",
  pages = "111--116",
  note = "14th Conference of the European Chapter of the Association for Computational Linguistics ; Conference date: 26-04-2014 Through 30-04-2014",
}


TY  - CONF
T1  - Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora
AU  - Kontonatsios, Georgios
AU  - Korkontzelos, Yannis
AU  - Tsujii, Jun'ichi
AU  - Ananiadou, Sophia
PY  - 2014/4
Y1  - 2014/4
AB  - We describe a machine learning approach, a Random Forest (RF) classifier, that is used to automatically compile bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, namely context vectors, and we report an improvement of the translation accuracy. As an application, we use the automatically extracted dictionary in combination with a trained Statistical Machine Translation (SMT) system to more accurately translate unknown terms. The dictionary extraction method described in this paper is freely available.
M3  - Paper
SP  - 111
EP  - 116
ER  -
