Syllabus
CS 499
Natural Language Processing
Instructor
Antonios Anastasopoulos (antonis [at] gmu [dot] edu)
Office Hours: Online (check Blackboard for Zoom link), TBD. Email for additional appointments.
Teaching Assistant
Md Mahfuz Ibn Alam (malam21 [at] gmu [dot] edu)
Office Hours: Online (check Blackboard for Zoom link), TBD.
Meets
Tuesdays and Thursdays, 12:00 to 1:15 PM, Online (check Blackboard for Zoom link and password).
Textbook
Speech and Language Processing (2nd Edition, 2007, Prentice-Hall), by Daniel Jurafsky and James Martin
Course Web Page
https://cs.gmu.edu/~antonis/course/cs499-spring21/ -- We will also communicate through Piazza (Piazza Link).
Course Description
Computers process massive amounts of information every day in the form of human language. Although they do not understand it, they can learn how to do things like answer questions about it, or translate it into other languages. This course will be about the variety of ways to represent human languages (like Swahili, English, or Chinese) as computational systems, and how to exploit those representations to write programs that do neat stuff with text and speech data, like
- translation,
- summarization,
- information extraction,
- question answering, or
- conversational agents
This field is called Natural Language Processing (or Computational Linguistics), and it is extremely interdisciplinary. As a result, the course will include materials that are central to Machine Learning and Linguistics.
We'll cover computational treatments of words, sounds, sentences, meanings, and conversations. We'll see how probabilities and real-world text data can help. We'll see how different levels interact in state-of-the-art approaches (including neural models) to applications like translation and information extraction.
From a software engineering perspective, there will be an emphasis on rapid prototyping, a useful skill in many other areas of Computer Science.
Class Format and Online Classroom Specifics
Readings are not required, but you are strongly encouraged to do them before class -- they will help you get the most out of the recorded and synchronous lectures. In the synchronous sessions, I will go through the major components of the day's lecture, and we will go into more depth. I'll be using Zoom for video-conferencing. I won't require you to have your video on, but it will be greatly appreciated (no one likes to just talk to a screen).
Prerequisites
CS courses on data structures (CS310) and algorithms (CS330), and strong programming skills (we will mostly use Python). Please contact the instructor if you have questions about the necessary background.
Project
A major component of the class will be the project. Throughout the course, you will develop an application of NLP techniques to a topic of interest to you. You may work alone or in pairs, as long as you clearly define which person did which parts of the work. Each person should contribute equally. It’s not acceptable to do the same work for this project and another class’s project, but it’s acceptable (and encouraged) for this project to relate to another project as long as the boundary is clearly defined -- if you're unsure, check with the instructor.
The project will consist of 4 deliverables. Details on the project here.
Homeworks
There will be 5 homework assignments, scattered throughout the semester. Details here.
Grading
Students will be evaluated through homeworks (50%) and a project (50%).
Letter Grade | Points (out of 100) |
---|---|
A | 97-100 |
A- | 90-96 |
B+ | 86-89 |
B | 83-85 |
B- | 80-82 |
C+ | 76-79 |
C | 73-75 |
C- | 70-72 |
D | 60-69 |
F | 0-59 |
Late Submissions: In the case of a serious illness or other excused absence, as defined by university policies, coursework submissions will be accepted late by the same number of days as the excused absence. In case there are unforeseen circumstances that don’t let you turn in your assignments on time, you may submit part of an assignment on time for full credit and part of the assignment late with a penalty of 30% per week (that is, your score for that part will be $\lfloor 0.7^t s\rfloor$, where $s$ is your raw score and $t$ is the possibly fractional number of weeks late). No part of the assignment may be submitted more than once. No work may be submitted after the final project due date.
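To make the late-penalty formula concrete, here is a minimal Python sketch of $\lfloor 0.7^t s\rfloor$. The function name `late_score` and the example numbers are illustrative assumptions, not part of the course infrastructure, and actual grading may handle rounding differently.

```python
import math


def late_score(raw_score: float, weeks_late: float) -> int:
    """Apply the late penalty: floor(0.7 ** t * s).

    raw_score  -- s, the raw score earned on the late part
    weeks_late -- t, weeks late (may be fractional, e.g. 0.5)

    Hypothetical helper for illustration only.
    """
    if weeks_late < 0:
        raise ValueError("weeks_late must be non-negative")
    return math.floor(0.7 ** weeks_late * raw_score)


# A late part worth 85 raw points:
print(late_score(85, 0))    # 85 (on time, no penalty)
print(late_score(85, 0.5))  # 71 (half a week late)
print(late_score(85, 1))    # 59 (one week late)
print(late_score(85, 2))    # 41 (two weeks late)
```

Note that the penalty compounds: two weeks late keeps 0.7 × 0.7 = 49% of the raw score, not 40%.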
Tentative Schedule
# | Date | Topic | Readings | Assignments and Project Milestones |
---|---|---|---|---|
1 | 1/26 | Course Overview | Chap 1 slides | |
2 | 1/28 | Working with Text, Edit Distance | SLP2 Chap 2-2.1, 3.11; SLP3 Chap 2; extra text processing slides; extra edit distance slides | Assignment 1 out |
3 | 2/2 | Words, morphology, and lexicons | Chap 3.1-3.9 | |
4 | 2/4 | Statistical Language models and smoothing | Chap 4.3-8 | Assignment 1 due, Assignment 2 out |
5 | 2/9 | Neural language models (neural models, part 1) | TBD | |
6 | 2/11 | Word embeddings (vector semantics) | SLP3 Chap 6 | Project Initial Idea due 2/12 |
7 | 2/16 | BERT and Family (neural models, part 2) | ||
8 | 2/18 | Part of speech tags | Chap 5.0-3 | Assignment 2 due 2/19, Assignment 3 out |
9 | 2/23 | Classification 1 (statistical models) | ||
10 | 2/25 | Classification 2 (neural models, part 3) | ||
11 | 3/2 | Conditional Random Fields | ||
12 | 3/4 | Syntactic representations of natural language | Chap 12.0-3 | |
13 | 3/9 | Parsing | Chap 12.7, Chap 13, Chap 14-14.2 | |
14 | 3/11 | Treebanks and PCFGs | Chap 12.4, 14.7 | Assignment 3 due, Assignment 4 out |
15 | 3/16 | Neural models for parsing (neural models part 4) | TBD | |
16 | 3/18 | Lexical semantics | Chap 17.0-2, 19.0-3 | Project Baseline due 3/19 |
17 | 3/23 | Verb/sentence semantics | Chap 17.2-4, Chap 19.4-6 | |
18 | 3/25 | Project Presentations | ||
19 | 3/30 | Alignment | Chap 25.5-7 | |
20 | 4/1 | Word-Based Machine Translation | Chap 25.0-2 | Assignment 4 due, Assignment 5 out |
21 | 4/6 | Phrase-Based Machine Translation | Chap 25.3-4, 25.8-9 | |
22 | 4/8 | Project Presentations | | Project Presentations due 4/9 |
23 | 4/13 | Neural Machine Translation models (neural models, part 5) | The Annotated Transformer (Sasha Rush); Joey-NMT documentation | |
24 | 4/15 | NLP Beyond English | ||
25 | 4/20 | Discourse, entity linking, pragmatics | Chap 20.0-6, 20.8-11 | |
26 | 4/22 | Multimodality | | Assignment 5 due 4/23 |
27 | 4/27 | Reinforcement Learning | TBD | |
28 | 4/29 | Conclusion | | Final project report due, 11:59pm |