CS 6804 Spring 2013

SCHEDULE

Date	Topics	Notes and resources
Jan 23	Introduction. Course overview.	Lecture notes
Jan 28	Markov Decision Processes.	Lecture notes Chapters 3 and 4 of Sutton and Barto
Jan 30	Value iteration. Bandit problems.	Lecture notes A useful survey by Mahajan and Teneketzis (2007)
Feb 4	Bandit problems continued. Reinforcement learning.	Lecture notes on bandits Lecture notes on RL Chapter 5 of Sutton and Barto
Feb 6	Reinforcement learning, continued.	Lecture notes
Feb 11	Optimal stopping.	Lecture notes Chapter 2 of Tom Ferguson's textbook
Feb 13	POMDPs.	Lecture notes A great (and technical) survey paper (Kaelbling, Littman, and Cassandra, 1998) Another great (and slightly less technical) survey paper (Littman, 2009)
Feb 18	Finishing up POMDPs. Intro to game theory.	Lecture notes (POMDPs) Lecture notes (game theory)
Feb 20	Game theory continued.	Lecture notes
Feb 25	Game theory continued.	Lecture notes
Feb 27	Game theory continued. Auctions.	Lecture notes
Mar 4	Auctions and mechanism design.	Lecture notes
Mar 6	Mechanism design, continued. Sanmay: The Challenge of Poker
Mar 18	Student paper presentations begin. Liangzhe Chen: The Knowledge Gradient Algorithm for a General Class of Online Learning Problems Mithun Chakraborty: Censored Exploration and the Dark Pool Problem
Mar 20	Sally Hamouda: A Dynamic Mixture Model to Detect Student Motivation and Proficiency Austin Bart: Comparing the Utility of State Features in Spoken Dialogue Using Reinforcement Learning
Mar 25	Vishwas Hebbur Venkata Subba Rao: A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Meenal Chhabra: Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand
Mar 27	Allen Lavoie: Apprenticeship learning via inverse reinforcement learning Marjan Momtazpour: To Teach or not to Teach? Decision Making Under Uncertainty in Ad Hoc Teams
Apr 1	Boris Huacan: Least Squares Policy Iteration Mohamed Magdy: Online Exploration in Least-Squares Policy Iteration
Apr 3	Prithwish Chakraborty: Partially observable Markov decision processes for spoken dialog systems Chun-Yi Su: Bayes' Bluff: Opponent Modelling in Poker
Apr 8	Fang Liu: Modeling Billiards Games
Apr 10	Farzaneh Tabataba: Learning the Demand Curve in Posted-Price Digital Goods Auctions K. Alnajar: Using Iterated Reasoning to Predict Opponent Strategies
Apr 15	Behrooz Kamali: Active Learning for Matching Problems Fei Li: Multiagent learning using a variable learning rate
Apr 17	Huijuan Shao: Memory-bounded dynamic programming for DEC-POMDPs Andrew Burkard: Allocative and Dynamic Efficiency in NBA Decision Making
Apr 22	Qianzhou Du: An Analytic Solution to Discrete Bayesian Reinforcement Learning Jose Cadena: Hustling in Repeated Zero-Sum Games with Imperfect Execution
Apr 24	Scheduling/general discussion/agenda setting	Paper for the experiment
Apr 29
May 1	Project Presentations 1: Sally; Mithun and Meenal; Mohamed; Chun-Yi, Fang, and Qianzhou
May 6	Project Presentations 2: Liangzhe; Jose and Allen; K and Austin; Fei; Huijuan, Prithwish, and Vishwas
May 8	Project Presentations 3: Marjan and Farzaneh; Behrooz; Andrew; Boris

READING LIST

This list is still evolving and may change through the first few weeks of class.

Game Theory and Applications in AI
Billings et al, AIJ 2002	The Challenge of Poker	Sanmay
Sandholm, AI Magazine 2010	The State of Solving Large Incomplete-Information Games, and Application to Poker	Available
Southey et al, UAI 2005	Bayes' Bluff: Opponent Modelling in Poker	Chun-Yi
Littman and Stone, Decision Support Systems 2005	A Polynomial-time Nash Equilibrium Algorithm for Repeated Games	Vishwas
Greenwald et al, NIPS 2012	Approximating Equilibria in Sequential Auctions with Incomplete Information and Multi-Unit Demand	Meenal
Wunder et al, AAMAS 2011	Using Iterated Reasoning to Predict Opponent Strategies	K. Alnajar
Jordan et al, AAMAS 2007	Empirical Game-Theoretic Analysis of the TAC Supply Chain Game	Available
Archibald and Shoham, AAMAS 2009	Modeling Billiards Games	Fang Liu
Archibald and Shoham, IJCAI 2011	Hustling in Repeated Zero-Sum Games with Imperfect Execution	Jose Cadena
Hu and Wellman, JMLR 2003	Nash Q-Learning for General-Sum Stochastic Games	Available
Bowling and Veloso, AIJ 2002	Multiagent learning using a variable learning rate	Fei Li
Conitzer and Sandholm, ICML 2003	AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response Against Stationary Opponents	Available
Reinforcement / Online / Optimal Learning and Sequential Decision-Making
Lagoudakis and Parr, JMLR 2003	Least Squares Policy Iteration	Boris
Poupart et al, ICML 2006	An Analytic Solution to Discrete Bayesian Reinforcement Learning	Qianzhou
Ryzhov et al, Operations Research 2012	The Knowledge Gradient Algorithm for a General Class of Online Learning Problems	Liangzhe
Charlin et al, ICML 2012	Active Learning for Matching Problems	Behrooz
Chhabra and Das, AAMAS 2011	Learning the Demand Curve in Posted-Price Digital Goods Auctions	Farzaneh
Stone and Kraus, AAMAS 2010	To Teach or not to Teach? Decision Making Under Uncertainty in Ad Hoc Teams	Marjan
Vermorel and Mohri, ECML 2005	Multi-Armed Bandit Algorithms and Empirical Evaluation	Available
Das and Tsitsiklis, JEBO 2010	When is it Important to Know You've Been Rejected? A Search Problem with Probabilistic Appearance of Offers	Available
Seuken and Zilberstein, IJCAI 2007	Memory-bounded dynamic programming for DEC-POMDPs	Huijuan
Williams and Young, Comp. Speech & Lang. 2007	Partially observable Markov decision processes for spoken dialog systems	Prithwish
Li et al, AAMAS 2009	Online Exploration in Least-Squares Policy Iteration	Mohamed
Abbeel and Ng, ICML 2004	Apprenticeship learning via inverse reinforcement learning	Allen
Johns and Woolf, AAAI 2006	A Dynamic Mixture Model to Detect Student Motivation and Proficiency	Sally
Ganchev et al, UAI 2009	Censored Exploration and the Dark Pool Problem	Mithun
Tetreault and Litman, NAACL 2006	Comparing the Utility of State Features in Spoken Dialogue Using Reinforcement Learning	Austin
Goldman and Rao, MIT Sloan Sports Analytics Conf 2011	Allocative and Dynamic Efficiency in NBA Decision Making	Andrew
