multilingual NLP | George Mason NLP

Transliteration for Cross-Lingual Morphological Inflection

Cross-lingual transfer between typologically related languages has proven successful for the task of morphological inflection. However, if the languages do not share the same script, current methods yield more modest improvements. We explore the …

Predicting Performance for Natural Language Processing Tasks

This post discusses our paper, to be presented at ACL 2020. tl;dr: You can use previously published results to estimate performance on a new experiment before running it!

Should All Cross-Lingual Embeddings Speak English?

This post discusses our paper, accepted at ACL 2020. Word embeddings are ubiquitous in modern NLP, from static ones (like word2vec or fastText) to contextual representations obtained from ELMo, BERT, and other models.

A note on evaluating multilingual benchmarks

Antonis Anastasopoulos, December 2019. tl;dr: Be careful when reporting averages for multilingual benchmarks, especially when making claims about multilinguality. Averaging by language family can also provide additional insights.
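To illustrate why the choice of average matters, here is a minimal sketch, assuming two simple aggregation schemes: a plain average over languages, and an average taken first within each language family and then across families. The scores below are hypothetical numbers chosen for illustration, not results from any paper, and the function names are likewise hypothetical. When one family contributes most of the languages, the plain average mostly reflects that family.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-language scores (illustrative only, not published results).
# Each entry maps an ISO 639-3 code to (language family, score).
scores = {
    "spa": ("Indo-European", 90.0),
    "fra": ("Indo-European", 88.0),
    "deu": ("Indo-European", 86.0),
    "hin": ("Indo-European", 84.0),
    "fin": ("Uralic", 60.0),
    "swa": ("Niger-Congo", 50.0),
}


def average_over_languages(scores):
    """Plain macro-average: every language gets equal weight."""
    return mean(score for _, score in scores.values())


def average_by_family(scores):
    """Average within each family first, then across families,
    so every family gets equal weight regardless of how many languages it has."""
    by_family = defaultdict(list)
    for family, score in scores.values():
        by_family[family].append(score)
    family_means = {family: mean(vals) for family, vals in by_family.items()}
    return mean(family_means.values()), family_means


lang_avg = average_over_languages(scores)
fam_avg, per_family = average_by_family(scores)
print(f"average over languages: {lang_avg:.2f}")  # 76.33
print(f"average over families:  {fam_avg:.2f}")   # 65.67
print(per_family)  # {'Indo-European': 87.0, 'Uralic': 60.0, 'Niger-Congo': 50.0}
```

With four of the six hypothetical languages coming from one family, the language-level average sits about ten points above the family-level one, and the per-family breakdown shows where the gap comes from.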

AlloVera: A Multilingual Allophone Database

We introduce a new resource, AlloVera, which provides mappings from 218 allophones to phonemes for 14 languages. Phonemes are contrastive phonological units, and allophones are their various concrete realizations, which are predictable from …

Universal Phone Recognition with a Multilingual Allophone System

Investigating Meta-Learning Algorithms for Low-Resource Natural Language Understanding Tasks

Learning general representations of text is a fundamental problem for many natural language understanding (NLU) tasks. Previously, researchers have proposed to use language model pre-training and multi-task learning to learn robust representations. …

Pushing the Limits of Low-Resource Morphological Inflection

Recent years have seen exceptional strides in the task of automatic morphological inflection generation. However, for a long tail of languages the necessary resources are hard to come by, and state-of-the-art neural methods that work well under …

Choosing Transfer Languages for Cross-Lingual Learning

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages. …

Generalized Data Augmentation for Low-Resource Translation

Low-resource language pairs with a paucity of parallel data pose challenges for machine translation in terms of both adequacy and fluency. Data augmentation utilizing a large amount of monolingual data is regarded as an effective way to alleviate the …