DM887: Reinforcement learning
Entry requirements
Academic preconditions
A Bachelor’s degree in computer science, applied mathematics, or pure mathematics, including programming skills.
Course introduction
Expected learning outcome
The learning objective of the course is that the student demonstrates the ability to:
- Follow recent methodological, algorithmic, and theoretical advances in the reinforcement learning field from primary sources, such as academic publications, textbooks, and technical presentations by domain experts,
- Design and implement original reinforcement learning algorithms tailored to new problems,
- Conduct formal analysis to characterize the computational footprint and predictive performance of a given reinforcement learning algorithm.
The course builds on the knowledge acquired in basic machine learning or data modeling courses such as DM581 Introduction to Machine Learning, DM566/DS804 Data Mining and Machine Learning, DM568/DM873/DS809 Deep Learning, DS807 Applied Machine Learning, ST811 Multivariate Statistical Analysis, or ST819 Multivariate Regression Analysis. The course gives an academic basis for doing bachelor’s and master’s projects, as well as for other master’s level courses on advanced artificial intelligence subjects.
Content
The following main topics are contained in the course:
1. Markov decision processes: Basic concepts and properties
2. Dynamic programming and its approximations: Temporal difference methods and Q-learning
3. Policy-based and actor-critic methods: REINFORCE, MaxEnt, and deterministic policy gradient methods
4. Bandit algorithms: Multi-armed bandits, stochastic bandits, contextual bandits
5. Theoretical analysis of reinforcement learning algorithms
6. Bayesian methods for deep learning: Approximate Bayesian inference of neural networks
7. Modeling environment dynamics: Probabilistic deep state-space models
8. Model-based reinforcement learning: Model predictive control and Dyna algorithms
9. Offline reinforcement learning, imitation learning, and inverse reinforcement learning
10. Applications of reinforcement learning: Locomotion control, game AI, automated system configuration
Literature
See itslearning for syllabus lists and additional literature references.
Examination regulations
Exam element a)
Timing
Tests
Portfolio
EKA
Assessment
Grading
Identification
Language
Examination aids
ECTS value
Additional information
The portfolio exam consists of written assignments: two home assignments and one project. The home assignments must be done individually by each student. The project can be done either individually or in groups of at most two students. The home assignments and the project are assessed together.
Indicative number of lessons
Teaching Method
- Intro phase (42 hours) consists of theoretical lectures where concepts, theories, and models are introduced and put into perspective.
- Training phase (14 hours) consists of exercise sessions where students train their skills through programming exercises.
- Study phase consists of reading the course material, doing the assignments, and doing the project. In this phase, students gain academic, personal, and social experiences that consolidate and further develop their scientific proficiency. The focus is on immersion, understanding, and development of collaborative skills.