DM887: Reinforcement learning

Study Board of Science

Teaching language: English
EKA: N340108102
Assessment: Second examiner: External
Grading: 7-point grading scale
Offered in: Odense
Offered in: Spring
Level: Master

STADS ID (UVA): N340108101
ECTS value: 10

Date of Approval: 30-10-2023

Duration: 1 semester

Version: Approved - active

Entry requirements


Academic preconditions

A Bachelor’s degree in computer science, applied mathematics, or pure mathematics that includes programming skills.

Course introduction

Reinforcement learning is the algorithmic framework that enables a computer to learn a skill through rewarded interactions with its environment. Its theory lies at the core of many emerging technologies, such as robotic manipulation, autonomous driving, game development, portfolio management, and drug discovery.

This course introduces the basics of reinforcement learning from an artificial intelligence perspective. It starts by presenting the theoretical foundations of reinforcement learning, continues with applications of this theory to major artificial intelligence problems, and concludes with the current grand challenges of artificial intelligence research related to reinforcement learning.

The course content is designed to balance the theory required for self-driven specialization with the practice required to turn that theory into working knowledge. The term project gives students the opportunity to experience a guided version of the scientific research process.

Expected learning outcome

The learning objective of the course is that the student demonstrates the ability to:

  1. Follow recent methodological, algorithmic, and theoretical advances in the reinforcement learning field from their primary sources, such as academic publications, textbooks, and technical presentations by domain experts;
  2. Design and implement original reinforcement learning algorithms tailored to new problems;
  3. Conduct formal analyses that characterize the computational footprint and predictive performance of a given reinforcement learning algorithm.

The course builds on knowledge acquired in basic machine learning or data modeling courses such as DM581 Introduction to Machine Learning, DM566/DS804 Data Mining and Machine Learning, DM568/DM873/DS809 Deep Learning, DS807 Applied Machine Learning, ST811 Multivariate Statistical Analysis, or ST819 Multivariate Regression Analysis. The course provides an academic basis for bachelor’s and master’s projects, as well as for other master’s-level courses on advanced artificial intelligence topics.


The course covers the following main topics:

1. Markov decision processes: Basic concepts and properties

2. Dynamic programming and its approximations: Temporal difference methods and Q-learning

3. Policy-based and actor-critic methods: REINFORCE, MaxEnt, and deterministic policy gradient methods

4. Bandit algorithms: Multi-armed bandits, stochastic bandits, contextual bandits

5. Theoretical analysis of reinforcement learning algorithms

6. Bayesian methods for deep learning: Approximate Bayesian inference of neural networks

7. Modeling environment dynamics: Probabilistic deep state-space models

8. Model-based reinforcement learning: Model predictive control and Dyna algorithms

9. Offline reinforcement learning, imitation learning, and inverse reinforcement learning

10. Applications of reinforcement learning: Locomotion control, game AI, automated system configuration


Lecture notes and/or slides prepared by the instructor (provided weekly).
See itslearning for syllabus lists and additional literature references.

Examination regulations

Exam element a)

Assessment: Second examiner: External
Grading: 7-point grading scale
Identification: Full name and SDU username
Language: Normally, the same as teaching language

Examination aids

Allowed. A more detailed description of the exam rules will be posted on itslearning.

ECTS value


Additional information

The portfolio exam consists of three written assignments: two home assignments and one project. Each student must complete the home assignments individually. The project can be done either individually or in groups of at most two students. The home assignments and the project are assessed together as a whole.

Indicative number of lessons

56 hours per semester

Teaching Method

At the Faculty of Science, teaching is organized according to the three-phase model, i.e. the intro, training, and study phases.
  • Intro phase (42 hours) consists of theoretical lectures where concepts, theories and models are introduced and put into perspective.
  • Training phase (14 hours) consists of exercise sessions where students train their skills through programming exercises.
  • Study phase consists of reading the course material, completing the assignments, and working on the project. In this phase, students gain academic, personal, and social experiences that consolidate and further develop their scientific proficiency. The focus is on immersion, understanding, and the development of collaborative skills.

Teacher responsible

Name: Melih Kandemir
Department: Concurrency


Administrative Unit

Department of Mathematics and Computer Science (Institut for Matematik og Datalogi), Computer Science

Team at Educational Law & Registration


Offered in


Recommended course of study

Profile Education Semester Offer period

Transition rules

Transitional arrangements describe how a course replaces another course when changes are made to the course of study. 
If a transitional arrangement has been made for a course, it will be stated in the list. 
See transitional arrangements for all courses at the Faculty of Science.