Very Low Resource Machine Translation...
MPhil ACS 2020/21
There are no machine translation models for most of the world's roughly 7,000 languages. One of the major challenges in machine translation is finding parallel sentences. The Bible has been translated into more than 1,500 languages, including many low-resource ones, which makes it a valuable resource for machine translation of low-resource languages.
The aim of this study is to build an MT system using the data set provided by the EMNLP WMT 2020 shared task. The data set contains parallel sentences between German and Upper Sorbian, in both translation directions. The task is to explore different methods, both supervised and unsupervised, for training an MT system. In addition, as an extension, the study will explore the feasibility of using the Bible as a pre-training resource.
Link to shared task: Unsupervised MT and Very Low Resource Supervised MT
Evaluation criteria: BLEU score, or other automated evaluation metrics provided by the shared task.
References (more to come):
Universal Language Model Fine-tuning for Text Classification - Howard and Ruder (2018)
Phrase-Based & Neural Unsupervised Machine Translation - Lample et al. (2018)
Meta-Learning for Low-Resource Neural Machine Translation - Gu et al. (2018)