Cross-lingual transfer between typologically related languages has proven successful for the task of morphological inflection. However, if the languages do not share the same script, current methods yield more modest improvements. We explore the …
This is a post regarding our paper that will be presented at ACL 2020. tl;dr: You can use previously published results to estimate the performance of a new experiment before running it!
This is a post regarding our paper that got accepted at ACL 2020.
Word embeddings are ubiquitous in modern NLP, from static ones (like word2vec or fasttext) to contextual representations obtained from ELMo, BERT, and other models.
A note on evaluating multilingual benchmarks. Antonis Anastasopoulos, December 2019. tl;dr: Be careful when reporting averages for multilingual benchmarks, especially if making claims about multilinguality. In addition, averaging by language family can provide additional insights.
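To make the averaging point concrete, here is a minimal sketch that contrasts a plain per-language average with an average taken over language families. The scores are invented purely for illustration (they are not results from any benchmark or from the post), and the function names `average_by_language` and `average_by_family` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-language scores, made up for illustration only.
scores = {"es": 80.0, "fr": 78.0, "it": 82.0, "fi": 60.0, "tr": 50.0}

# Language family of each language code above.
family = {
    "es": "Romance",
    "fr": "Romance",
    "it": "Romance",
    "fi": "Uralic",
    "tr": "Turkic",
}


def average_by_language(scores):
    """Plain average: every language counts once."""
    return mean(scores.values())


def average_by_family(scores, family):
    """Average within each family first, then across families."""
    by_family = defaultdict(list)
    for lang, score in scores.items():
        by_family[family[lang]].append(score)
    per_family = {fam: mean(vals) for fam, vals in by_family.items()}
    return per_family, mean(per_family.values())


per_language_avg = average_by_language(scores)
per_family, family_avg = average_by_family(scores, family)

print(f"average over languages: {per_language_avg:.2f}")
print(f"average over families:  {family_avg:.2f}")
for fam, value in sorted(per_family.items()):
    print(f"  {fam}: {value:.2f}")
```

In this toy setup three of the five languages belong to one family, so the per-language average is pulled toward that family's scores, while the per-family average weights each family equally and comes out lower.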
We introduce a new resource, AlloVera, which provides mappings from 218 allophones to phonemes for 14 languages. Phonemes are contrastive phonological units, and allophones are their various concrete realizations, which are predictable from …
Learning general representations of text is a fundamental problem for many natural language understanding (NLU) tasks. Previously, researchers have proposed to use language model pre-training and multi-task learning to learn robust representations. …
Recent years have seen exceptional strides in the task of automatic morphological inflection generation. However, for a long tail of languages the necessary resources are hard to come by, and state-of-the-art neural methods that work well under …
Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages. …
Low-resource language pairs with a paucity of parallel data pose challenges for machine translation in terms of both adequacy and fluency. Data augmentation utilizing a large amount of monolingual data is regarded as an effective way to alleviate the …