CS 6804 Spring 2013

SCHEDULE

 
Date Topics Notes and resources
Jan 23 Introduction. Course overview. Lecture notes
Jan 28 Markov Decision Processes. Lecture notes
Chapters 3 and 4 of Sutton and Barto
Jan 30 Value iteration. Bandit problems. Lecture notes
A useful survey by Mahajan and Teneketzis (2007)
Feb 4 Bandit problems, continued. Reinforcement learning. Lecture notes on bandits
Lecture notes on RL
Chapter 5 of Sutton and Barto
Feb 6 Reinforcement learning, continued. Lecture notes
Feb 11 Optimal stopping. Lecture notes
Chapter 2 of Tom Ferguson's textbook
Feb 13 POMDPs. Lecture notes
A great (and technical) survey paper (Kaelbling, Littman, and Cassandra, 1998)
Another great (and slightly less technical) survey paper (Littman, 2009)
Feb 18 Finishing up POMDPs. Intro to game theory. Lecture notes (POMDPs)
Lecture notes (game theory)
Feb 20 Game theory, continued. Lecture notes
Feb 25 Game theory, continued. Lecture notes
Feb 27 Game theory, continued. Auctions. Lecture notes
Mar 4 Auctions and mechanism design. Lecture notes
Mar 6 Mechanism design, continued.
Sanmay: The Challenge of Poker
Mar 18 Student paper presentations begin.
Liangzhe Chen: The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
Mithun Chakraborty: Censored Exploration and the Dark Pool Problem
Mar 20 Sally Hamouda: A Dynamic Mixture Model to Detect Student Motivation and Proficiency
Austin Bart: Comparing the Utility of State Features in Spoken Dialogue Using Reinforcement Learning
Mar 25 Vishwas Hebbur Venkata Subba Rao: A Polynomial-time Nash Equilibrium Algorithm for Repeated Games
Meenal Chhabra: Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand
Mar 27 Allen Lavoie: Apprenticeship learning via inverse reinforcement learning
Marjan Momtazpour: To Teach or not to Teach? Decision Making Under Uncertainty in Ad Hoc Teams
Apr 1 Boris Huacan: Least Squares Policy Iteration
Mohamed Magdy: Online Exploration in Least-Squares Policy Iteration
Apr 3 Prithwish Chakraborty: Partially observable Markov decision processes for spoken dialog systems
Chun-Yi Su: Bayes' Bluff: Opponent Modelling in Poker
Apr 8 Fang Liu: Modeling Billiards Games
Apr 10 Farzaneh Tabataba: Learning the Demand Curve in Posted-Price Digital Goods Auctions
K. Alnajar: Using Iterated Reasoning to Predict Opponent Strategies
Apr 15 Behrooz Kamali: Active Learning for Matching Problems
Fei Li: Multiagent learning using a variable learning rate
Apr 17 Huijuan Shao: Memory-bounded dynamic programming for DEC-POMDPs
Andrew Burkard: Allocative and Dynamic Efficiency in NBA Decision Making
Apr 22 Qianzhou Du: An Analytic Solution to Discrete Bayesian Reinforcement Learning
Jose Cadena: Hustling in Repeated Zero-Sum Games with Imperfect Execution
Apr 24 Scheduling, general discussion, and agenda setting. Paper for the experiment
Apr 29
May 1 Project Presentations 1: Sally; Mithun and Meenal; Mohamed; Chun-Yi, Fang, and Qianzhou
May 6 Project Presentations 2: Liangzhe; Jose and Allen; K and Austin; Fei; Huijuan, Prithwish, and Vishwas
May 8 Project Presentations 3: Marjan and Farzaneh; Behrooz; Andrew; Boris
 


READING LIST

This list is still evolving and may change through the first few weeks of class.

Game Theory and Applications in AI

Billings et al, AIJ 2002 The Challenge of Poker Sanmay
Sandholm, AI Magazine 2010 The State of Solving Large Incomplete-Information Games, and Application to Poker Available
Southey et al, UAI 2005 Bayes' Bluff: Opponent Modelling in Poker Chun-Yi
Littman and Stone, Decision Support Systems 2005 A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Vishwas
Greenwald et al, NIPS 2012 Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand Meenal
Wunder et al, AAMAS 2011 Using Iterated Reasoning to Predict Opponent Strategies K. Alnajar
Jordan et al, AAMAS 2007 Empirical Game-Theoretic Analysis of the TAC Supply Chain Game Available
Archibald and Shoham, AAMAS 2009 Modeling Billiards Games Fang Liu
Archibald and Shoham, IJCAI 2011 Hustling in Repeated Zero-Sum Games with Imperfect Execution Jose Cadena
Hu and Wellman, JMLR 2003 Nash Q-Learning for General-Sum Stochastic Games Available
Bowling and Veloso, AIJ 2002 Multiagent learning using a variable learning rate Fei Li
Conitzer and Sandholm, ICML 2003 AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response Against Stationary Opponents Available

Reinforcement / Online / Optimal Learning and Sequential Decision-Making

Lagoudakis and Parr, JMLR 2003 Least Squares Policy Iteration Boris
Poupart et al, ICML 2006 An Analytic Solution to Discrete Bayesian Reinforcement Learning Qianzhou
Ryzhov et al, Operations Research 2012 The Knowledge Gradient Algorithm for a General Class of Online Learning Problems Liangzhe
Charlin et al, ICML 2012 Active Learning for Matching Problems Behrooz
Chhabra and Das, AAMAS 2011 Learning the Demand Curve in Posted-Price Digital Goods Auctions Farzaneh
Stone and Kraus, AAMAS 2010 To Teach or not to Teach? Decision Making Under Uncertainty in Ad Hoc Teams Marjan
Vermorel and Mohri, ECML 2005 Multi-Armed Bandit Algorithms and Empirical Evaluation Available
Das and Tsitsiklis, JEBO 2010 When is it Important to Know You've Been Rejected? A Search Problem with Probabilistic Appearance of Offers Available
Seuken and Zilberstein, IJCAI 2007 Memory-bounded dynamic programming for DEC-POMDPs Huijuan
Williams and Young, Comp. Speech & Lang. 2007 Partially observable Markov decision processes for spoken dialog systems Prithwish
Li et al, AAMAS 2009 Online Exploration in Least-Squares Policy Iteration Mohamed
Abbeel and Ng, ICML 2004 Apprenticeship learning via inverse reinforcement learning Allen
Johns and Woolf, AAAI 2006 A Dynamic Mixture Model to Detect Student Motivation and Proficiency Sally
Ganchev et al, UAI 2009 Censored Exploration and the Dark Pool Problem Mithun
Tetreault and Litman, NAACL 2006 Comparing the Utility of State Features in Spoken Dialogue Using Reinforcement Learning Austin
Goldman and Rao, MIT Sloan Sports Analytics Conf 2011 Allocative and Dynamic Efficiency in NBA Decision Making Andrew