George Mason NLP

George Mason University

George Mason Natural Language Processing Group

Natural language processing (NLP) aims to enable computers to use human languages, so that people can, for example, interact with computers naturally, communicate with people who don’t speak a common language, or manipulate speech or text data at scales not otherwise possible. The NLP group at George Mason Computer Science is interested in all aspects of NLP, with a focus on building tools for under-served languages.

We are currently working on multilingual models, on building Machine Translation systems that are robust to L2 language variation, and on NLP for the documentation of endangered languages.

News

August 2021 - The GMNLP group is growing, as Ziyu Yao will be joining CS@GMU as an assistant professor!
May 2021 - One paper accepted at Interspeech 2021 and one at the NL4Prog Workshop. Stay tuned for preprints and code!
May 2021 - Antonis received a grant from the Virginia Research Investment Fund (with Hemant Purohit (PI), Huzefa Rangwala, and Tonya Reaves) to design tools for proactive counter-disinformation communication!
April 2021 - Two papers accepted at ACL 2021! One is about fairness and equity in Question Answering systems, and the other is on Machine Translation for dialectal language varieties. Preprints and code are available here!
March 2021 - One paper accepted at NAACL! Preprint and details here.
January 2021 - Antonis received a grant from the National Endowment for the Humanities to build Optical Character Recognition tools for under-served languages (especially Indigenous Latin American ones)!
January 2021 - Antonis spoke to the Global Podcast for the TICO-19 project.
November 2020 - Congratulations to Mahfuz for winning one of the two best paper awards at the W-NUT workshop!
September 2020 - Four papers accepted at the main EMNLP conference and one at Findings of EMNLP! Preprints are available below!

Courses

CS 695, Fall 2021: Special Topics in Natural Language Processing (Yao)
CS 499, Spring 2021: Natural Language Processing (Anastasopoulos)
CS 695, Fall 2020: Special Topics in Natural Language Processing (Anastasopoulos)

Projects

Our research is and has been supported by the following organizations and companies:

Speech

Most languages of the world are “oral”: they are not traditionally written, and even where an alphabet exists, the community usually does not use it. Hence, building NLP systems that can operate directly on speech input is paramount.

Morphology

Human language is marked by considerable diversity around the world, and the surface form of languages varies substantially. Morphology describes the ways in which different word forms arise from lexemes. Computational morphology attempts to reproduce this process across languages, or uses machine learning models to model and discover the morphophonological processes that exist in a language.

Robustness

NLP systems are typically trained and evaluated in “clean” settings, over data without significant noise. However, systems deployed in the real world need to deal with vast amounts of noise. At GMU NLP, we work towards making NLP systems more robust to several types of noise, both adversarial and naturally occurring.

Language Documentation

Language Documentation aims to produce a permanent record of a language as used by its language community, in the form of a formal grammatical description along with a lexicon. Our group works on integrating NLP systems into the documentation workflow, aiming to speed up the process and support the work of field linguists and language communities.

Machine Translation

Machine Translation is the task of translating between human languages using computers. From the simple word-for-word, rule-based systems of the 1950s, the field has progressed to large multilingual neural models that can learn to translate between dozens of languages.

Multilingual NLP

An exciting research direction that we pursue at GMU NLP is building multilingual and polyglot systems. The languages of the world often share similar characteristics, and training systems cross-lingually allows us to leverage these similarities and overcome data scarcity.

Members

Antonios Anastasopoulos

Assistant Professor

Computational Linguistics, Machine Translation, Speech Recognition, NLP for Endangered Languages

Fahim Faisal

PhD Student

Computational Linguistics, Natural Language Processing, Machine Learning

Md Mahfuz Ibn Alam

PhD Student

Natural Language Processing, Machine Learning, Computer Vision, Common Sense Reasoning

Sharlina Keshava

CS Master’s Student

Natural Language Processing, Fairness in AI, Multilingual NLP, Machine Learning, Deep Learning

Huayu Zhou

PhD Student

Natural Language Processing, Machine Translation, Machine Learning, Data Mining

Ruoyu (Roy) Xie

Undergraduate Student

Natural Language Processing, Machine Learning, Computer Vision

Vishwajeet Vijay Paradkar

Master’s Student

Natural Language Processing

Collaborators

Claytone Sikasote

MS@African Masters of Machine Intelligence and Lecturer@University of Zambia

Language Processing for Bemba

Recent Publications

Browse all publications.

Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov. Machine Translation into Low-resource Language Varieties. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021.

PDF Code

Arnab Debnath, Navid Rajabi, Fardina Fathmiul Alam, Antonios Anastasopoulos. Towards More Equitable Question Answering Systems: How Much More Data Do You Need? Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021.

PDF Code

Fahim Faisal, Sharlina Keshava, Md Mahfuz Ibn Alam, Antonios Anastasopoulos. SD-QA: Spoken Dialectal Question Answering for the Real World. preprint, 2021.

PDF Code Dataset Project

Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David R. Mortensen, Michael R. Marlo, Graham Neubig. Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties. Proceedings of Interspeech 2021, 2021.

PDF Project

Adithya Pratapa, Antonios Anastasopoulos, Shruti Rijhwani, Aditi Chaudhary, David R. Mortensen, Graham Neubig, Yulia Tsvetkov. Evaluating the Morphosyntactic Well-formedness of Generated Texts. arXiv, 2021.

PDF Code

Claytone Sikasote, Antonios Anastasopoulos. BembaSpeech: A Speech Recognition Corpus for the Bemba Language. Proceedings of the Africa NLP Workshop, 2021.

PDF Code Dataset Project

Benjamin Muller, Antonios Anastasopoulos, Benoît Sagot, Djamé Seddah. When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual LMs. 2021.

PDF

See all publications

Recent Posts

Predicting Performance for Natural Language Processing Tasks

Should All Cross-Lingual Embeddings Speak English?

A note on evaluating multilingual benchmarks
