Syllabus

CS 499

Natural Language Processing

Instructor

Antonios Anastasopoulos (antonis [at] gmu [dot] edu)
Office Hours: Online (check Blackboard for Zoom link), TBD. Email for additional appointments.

Teaching Assistant

Md Mahfuz Ibn Alam (malam21 [at] gmu [dot] edu)
Office Hours: Online (check Blackboard for Zoom link), TBD.

Meets

Tuesdays and Thursdays, 12:00 to 1:15 PM, Online (Check Blackboard for Zoom link and password).

Textbook

Speech and Language Processing (2nd Edition, 2007, Prentice-Hall), by Daniel Jurafsky and James Martin

Course Web Page

https://cs.gmu.edu/~antonis/course/cs499-spring21/ -- We will also communicate through Piazza (Piazza Link).

Course Description

Computers process massive amounts of information every day in the form of human language. Although they do not understand it, they can learn how to do things like answer questions about it, or translate it into other languages. This course will be about the variety of ways to represent human languages (like Swahili, English, or Chinese) as computational systems, and how to exploit those representations to write programs that do neat stuff with text and speech data, like
  • translation,
  • summarization,
  • information extraction,
  • question answering, or
  • conversational agents

This field is called Natural Language Processing (or Computational Linguistics), and it is extremely interdisciplinary. As a result, the course will include material that is central to Machine Learning and Linguistics.

We'll cover computational treatments of words, sounds, sentences, meanings, and conversations. We'll see how probabilities and real-world text data can help. We'll see how different levels interact in state-of-the-art approaches (including neural models) to applications like translation and information extraction.

From a software engineering perspective, there will be an emphasis on rapid prototyping, a useful skill in many other areas of Computer Science.

Class Format and Online Classroom Specifics

Readings are not required, but I highly recommend doing them before class -- they will help you get the most out of the recorded and synchronous lectures. In the synchronous sessions, I will go through the major components of the day's lecture, and we will go into more depth. I'll be using Zoom for video-conferencing. I won't require you to have your video on, but it will be greatly appreciated (no one likes to just talk to a screen).

Prerequisites

CS courses on data structures (CS310) and algorithms (CS330), and strong programming skills (we will mostly use Python). Please contact the instructor if you have questions about the necessary background.

Project

A major component of the class will be the project. Throughout the course, you will develop an application of NLP techniques to a topic of interest to you. You may work alone or in pairs, as long as you clearly define which person did which parts of the work. Each person should contribute equally. It’s not acceptable to do the same work for this project and another class’s project, but it’s acceptable (and encouraged) for this project to relate to another project as long as the boundary is clearly defined -- if you're unsure, check with the instructor.
The project will consist of 4 deliverables. Details on the project here.

Homeworks

There will be 5 homework assignments, scattered throughout the semester. Details here.

Grading

Students will be evaluated through homeworks (50%) and a project (50%).
Letter Grade    Points (out of 100)
A               97-100
A-              90-96
B+              86-89
B               83-85
B-              80-82
C+              76-79
C               73-75
C-              70-72
D               60-69
F               0-59

Late Submissions: In the case of a serious illness or other excused absence, as defined by university policies, coursework submissions will be accepted late by the same number of days as the excused absence. If unforeseen circumstances keep you from turning in an assignment on time, you may submit part of the assignment on time for full credit and part of it late with a penalty of 30% per week (that is, your score for that part will be $\lfloor 0.7^t s\rfloor$, where $s$ is your raw score and $t$ is the possibly fractional number of weeks late). No part of the assignment may be submitted more than once. No work may be submitted after the final project due date.
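
To make the late-penalty formula concrete, here is a minimal Python sketch. It is only an illustration, not the official grading script: the function name `late_score`, the conversion of days to weeks as `days / 7`, and the small rounding guard against floating-point artifacts are all assumptions made for this example.

```python
import math


def late_score(raw_score: float, weeks_late: float) -> int:
    """Illustrative only: compute floor(0.7**t * s) for the late part of an assignment.

    raw_score  -- s, the raw score earned on the late part
    weeks_late -- t, the (possibly fractional) number of weeks late
    """
    if weeks_late < 0:
        raise ValueError("weeks_late must be non-negative")
    penalized = (0.7 ** weeks_late) * raw_score
    # Assumption: round to 9 decimals before flooring so that binary
    # floating-point error (e.g. 48.99999999999999 instead of 49) does
    # not cost a point. The policy itself only states the formula.
    return math.floor(round(penalized, 9))


# A part worth 80 raw points, submitted 3 days late (t = 3/7 weeks).
print(late_score(80, 3 / 7))  # 68
# The same part submitted exactly one week late.
print(late_score(80, 1))      # 56
```

Because the penalty compounds, two full weeks late keeps 49% of the raw score rather than 40%.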

Tentative Schedule

# Date Topic Readings Assignments and Project Milestones
1 1/26 Course Overview Chap 1
slides
2 1/28 Working with Text, Edit Distance SLP2 Chap 2-2.1, 3.11
SLP3 Chap 2
slides
Extra text processing slides
Extra edit distance slides
Assignment 1 out
3 2/2 Words, morphology, and lexicons Chap 3.1-3.9
slides
4 2/4 Word embeddings (vector semantics) SLP3 Chap 6
slides
Assignment 1 due (2/5), Assignment 2 out
5 2/9 Neural Embeddings Chap 4.3-8
slides
6 2/11 n-gram language models and smoothing slides Project Initial Idea due 2/12
7 2/16 Neural language models (neural models, part 1) slides
8 2/18 BERT and Family (neural models, part 2) slides Assignment 2 due 2/23, Assignment 3 out
9 2/23 Part of speech tags Chap 5.0-3
slides
10 2/25 Classification 1 (statistical models) and Classification 2 (neural models, part 3) slides
11 3/2 Conditional Random Fields slides
12 3/4 Syntactic representations of natural language Chap 12.0-3
slides
13 3/9 Parsing Chap 12.7, Chap 13, Chap 14-14.2
slides
14 3/11 Treebanks and PCFGs Chap 12.4, 14.7
slides
Assignment 3 due 3/12, Assignment 4 out
15 3/16 Dependency parsing slides
16 3/18 Neural models for parsing (neural models, part 4) SLP3 Chap 14 Project Baseline due 3/19
17 3/23 Alignment Chap 25.5-7
slides
18 3/25 EM, Statistical MT Chap 25.0-2, 25.3-4, 25.8-9
19 3/30 Neural Machine Translation models I (neural models, part 5) The Annotated Transformer, Sasha Rush
Joey-NMT documentation
20 4/1 Neural Machine Translation models II (neural models, part 5) The Annotated Transformer, Sasha Rush
Joey-NMT documentation
Assignment 4 due 4/2, Assignment 5 out
21 4/6 Transformers, continued
22 4/8 NLP Beyond English
23 4/13 Semi-supervised learning
24 4/15 Project Presentations Project Presentations due 4/16
25 4/20 Machine Reading and Question Answering
26 4/22 Summarization Assignment 5 due 4/23
27 4/27 Biases, Fairness, and Interpretability
28 4/29 Conclusion Final project report due 4/30

Honor Code

The class enforces the GMU Honor Code, and the more specific honor code policy special to the Department of Computer Science. You will be expected to adhere to this code and policy.

Note to Students

Take care of yourself! As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, global pandemics, feeling down, difficulty concentrating and/or lack of motivation. All of us benefit from support during times of struggle. There are many helpful resources available on campus and an important part of having a healthy life is learning how to ask for help. Asking for support sooner rather than later is almost always helpful. GMU services are available, and treatment does work. You can learn more about confidential mental health services available on campus at: https://caps.gmu.edu/. Support is always available (24/7) from Counseling and Psychological Services: 703-527-4077.

Disabilities

If you have a documented learning disability or other condition which may affect academic performance, make sure this documentation is on file with the Office of Disability Services and come talk to me about accommodations. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Services, I encourage you to contact them at ods@gmu.edu.