George Mason NLP

George Mason University

George Mason Natural Language Processing Group

Natural language processing (NLP) aims to enable computers to use human languages, so that people can, for example, interact with computers naturally, communicate with people who don’t share a common language, or manipulate speech or text data at scales not otherwise possible. The NLP group at George Mason Computer Science is interested in all aspects of NLP, with a focus on building tools for under-served languages.

We are currently working on multilingual models, on making machine translation robust to second-language (L2) speaker variations, and on NLP for the documentation of endangered languages.

I have open PhD positions and am looking for several students to start in Fall 2021. Do reach out if you have a passion for language and NLP!
Application Link

News

  • November 2020 - Congratulations to Mahfuz for winning one of the two best paper awards at the W-NUT workshop!
  • December 2020 - Antonis will present a tutorial on NLP for endangered languages at COLING 2020 with Hilaria Cruz, Christopher Cox, and Graham Neubig.
  • October 2020 - 1 paper accepted at COLING, preprint coming soon!
  • September 2020 - Mahfuz will present his first NLP paper on MT Robustness at the W-NUT workshop! Preprint coming soon!
  • September 2020 - 4 papers accepted at the main EMNLP conference and 1 paper accepted at the Findings of EMNLP! Preprints are available here!
  • August 2020 - Antonis is starting the NLP group at the George Mason Computer Science department!
  • April 2020 - 3 papers accepted at ACL 2020, and one at the SIGMORPHON Workshop!

Projects

Speech

Most languages of the world are “oral”: they are not traditionally written and even if an alphabet exists, the community doesn’t usually use it. Hence, building NLP systems that can directly operate on speech input is paramount.

Morphology

Human language is marked by considerable diversity around the world, and the surface form of languages varies substantially. Morphology describes the way in which different word forms arise from lexemes. Computational morphology attempts to reproduce this process across languages, or uses machine learning models to model or discover the morphophonological processes that exist in a language.

Robustness

NLP systems are typically trained and evaluated in “clean” settings, over data without significant noise. However, systems deployed in the real world need to deal with vast amounts of noise. At GMU NLP we work towards making NLP systems more robust to several types of noise (adversarial or naturally occurring).

Language Documentation

Language Documentation aims to produce a permanent record that describes a language as used by its language community, in the form of a formal grammatical description along with a lexicon. Our group works on integrating NLP systems into the documentation workflow, aiming to speed up the process and help the work of field linguists and language communities.

Machine Translation

Machine Translation is the task of translating between human languages using computers. Starting from simple word-for-word rule-based systems in the 1950s, we now have large multilingual neural models that can learn to translate between dozens of languages.

Multilingual NLP

An exciting research direction that we pursue at GMU NLP is building multilingual and polyglot systems. The languages of the world often share similar characteristics, and training systems cross-lingually allows us to leverage these similarities and overcome data scarcity issues.

Members

Antonios Anastasopoulos

Assistant Professor

Computational Linguistics, Machine Translation, Speech Recognition, NLP for Endangered Languages

Fahim Faisal

PhD Student

Computational linguistics, Natural language processing, Machine learning

Md Mahfuz Ibn Alam

PhD Student

Natural Language Processing, Machine Learning, Computer Vision, Common Sense Reasoning

Recent Publications

Browse all publications.

Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations. International Conference on Computational Linguistics (COLING), 2020.

Fine-Tuning MT systems for Robustness to Second-Language Speaker Variations. Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), 2020.

PDF Code Project

Automatic Extraction of Rules Governing Morphological Agreement. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

PDF Code

Dynamic Data Selection and Weighting for Iterative Back-Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

PDF Code

It's not a Non-Issue: Negation as a Source of Error in Machine Translation. Findings of EMNLP, 2020.

PDF Code