CS 277: Control and Reinforcement Learning

Winter 2022

Course logistics

  • When: Tuesdays and Thursdays at 11am–12:20pm.
  • Where: Zoom.
  • Format:
    • Lectures: for most topics, lecture videos will be uploaded to this page ahead of the scheduled class for that topic. Access requires a uci.edu account.
    • Class discussions: most Tuesdays and Thursdays at 11am–12:20pm, we will hold a virtual class discussion that includes a recap of the topic, any questions or advanced aspects participants bring up, and solutions to any recently due assignment. Attendance is optional; the discussions will be recorded and uploaded to this page.
    • Quizzes: every week, there will be a quiz about that week’s topics, due by the end of the week. Week 1’s quiz is about background concepts in math, algorithms, and machine learning.
    • Assignments: there will be 5 assignments, due roughly every other week. Only the best 4 assignments count toward the final grade, but a bonus will be given for scoring at least 50% on each of the 5 assignments.
  • Announcements and discussion forum:
    • We will be using Ed Discussion for important course announcements and course-related discussions.
    • Please post on the forum, publicly or privately, all course-related questions.
    • Please note that the identity of anonymous posters is visible to the course staff.
    • To help us keep things in order, please do not email course staff, except for personal matters unrelated to the course. We often reply with “Please use the forum for course-related matters.”
  • Quizzes and assignments:
    • Quizzes and assignments will also be posted on this page and should be submitted on Gradescope.
  • We won’t be using the course’s Canvas page.
  • Instructor: Prof. Roy Fox
    • Office hours: Calendly
    • Enrolled students are welcome to:
      • Schedule 15-minute slots (more than once if needed), giving at least 4 hours' notice;
      • Attend individually or with classmates.
  • Teaching assistant: Tiancheng Xu

Grading policy

  • Assignments: 80% (+5% bonus)
    • Best 4 assignments: 20% each.
    • Score at least 50% on each of 5 assignments: 5% bonus.
    • Late submission policy: 5 grace days total for all assignments.
  • Quizzes: 16% (+2% bonus)
    • 6 quizzes: 3% each (18% in total, of which 2% counts as bonus).
    • Deadline on Fridays (end of day).
    • Late submission policy: up to 2 submissions allowed by Monday (end of day).
  • Participation: 4%
    • Forum participation: 2%.
      • Post on the forum at least a few on-topic (excluding administrative) questions, answers, thoughts, or useful links.
    • Course evaluations: 2%.
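
The arithmetic of the grading policy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official grade calculation: the function name `final_grade` and its parameters are hypothetical, all scores are given as fractions in [0, 1], each quiz is assumed to contribute 3% directly (so that a full quiz score yields 16% plus the 2% bonus), participation items are assumed to allow partial credit, and late-submission rules are ignored.

```python
def final_grade(assignments, quizzes, forum_participation, course_evaluation):
    """Return the course grade in percent (hypothetical helper, see assumptions above).

    assignments: 5 scores as fractions in [0, 1]
    quizzes: 6 scores as fractions in [0, 1]
    forum_participation, course_evaluation: fractions in [0, 1]
    """
    if len(assignments) != 5 or len(quizzes) != 6:
        raise ValueError("expected 5 assignment scores and 6 quiz scores")

    # Best 4 of the 5 assignments count, at 20% each.
    best_four = sorted(assignments, reverse=True)[:4]
    assignment_part = 20.0 * sum(best_four)

    # 5% bonus for scoring at least 50% on each of the 5 assignments.
    assignment_bonus = 5.0 if min(assignments) >= 0.5 else 0.0

    # Assumption: 6 quizzes at 3% each, i.e. up to 18% = 16% + 2% bonus.
    quiz_part = 3.0 * sum(quizzes)

    # Participation: forum 2% and course evaluations 2%.
    participation_part = 2.0 * forum_participation + 2.0 * course_evaluation

    return assignment_part + assignment_bonus + quiz_part + participation_part


if __name__ == "__main__":
    # Perfect scores everywhere: 80 + 5 + 18 + 4.
    print(final_grade([1, 1, 1, 1, 1], [1] * 6, 1, 1))  # 107.0
```

Under these assumptions, skipping one assignment entirely still allows the full 80% from assignments, but forfeits the 5% bonus.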

Schedule

| Week | Dates | Tuesday | Thursday | Friday |
|---|---|---|---|---|
| 1 | Jan 4, 6, 7 | Introduction (slides, live lecture) | Imitation Learning (slides, lecture videos: segments 1–3, class discussion) | Quiz 1 due |
| 2 | Jan 11, 13, 14 | Temporal-Difference Methods (slides, lecture videos: segments 1–3, class discussion) | Deep Q-Learning (slides, lecture videos: segments 1–3, class discussion) | Quiz 2 due 1/19 |
| 3 | Jan 18, 20 | Policy-Gradient Methods (slides, live lecture); Assignment 1 due | Advanced Model-Free Methods (slides, live lecture) | |
| 4 | Jan 25, 27, 28 | Model-Free Review (live lecture) | canceled | Quiz 3 due |
| 5 | Feb 1, 3 | Exploration (slides, live lecture) | Optimal Control (slides, lecture videos: segments 1–3, class discussion) | |
| 6 | Feb 8, 10, 11 | Stochastic Optimal Control (slides, live lecture); Assignment 2 due | Planning (slides, live lecture) | Quiz 4 due |
| 7 | Feb 15, 17 | Model-Based Methods (slides, live lecture) | Partial Observability (slides, live lecture) | |
| 8 | Feb 22, 24, 25 | Inverse RL (slides, live lecture); Assignment 3 due | Bounded RL (slides, live lecture) | Quiz 5 due |
| 9 | Mar 1, 3, 4 | Bounded RL (cont.) (slides, live lecture) | Structured Control (slides, live lecture) | Assignment 4 due; Quiz 6 due 3/7 |
| 10 | Mar 8, 10 | Multi-Task Learning (slides, live lecture) | Open Questions (slides, live lecture) | |
| 11 | Mar 15 | Assignment 5 due | | |

Note: the planned schedule is subject to change.

Compute Resources

RL Resources

Courses
Books
RL libraries
More resources

Further reading

Imitation Learning
Temporal-Difference Methods
Policy-Gradient Methods
Exploration
Model-Based Methods
Inverse Reinforcement Learning
Bounded Reinforcement Learning
Structured Control
Multi-Task Learning

Academic honesty

Don’t cheat. Academic honesty is a requirement for passing this class. Compromising the academic integrity of this course may result in a failing grade. The work you submit must be your own. Academic dishonesty includes, among other things, partially copying answers from other students or online resources, allowing other students to partially copy your answers, communicating information about exam answers to other students during an exam, or attempting to use disallowed notes or other aids during an exam. Doing any of these violates the UCI Policy on Academic Honesty and the ICS Policy on Academic Honesty. It is your responsibility to read and understand these policies, in light of UCI’s definitions and examples of academic misconduct. Note that any instance of academic dishonesty will be reported to the Academic Integrity Administrative Office for disciplinary action, and may result in a failing grade for the course.