DM870: Data mining and machine learning

The Study Board for Science

Teaching language: Danish or English depending on the teacher, but English if international students are enrolled
EKA: N340033102
Assessment: Second examiner: External
Grading: 7-point grading scale
Offered in: Odense
Offered in: Spring
Level: Master

STADS ID (UVA): N340033101
ECTS value: 10

Date of Approval: 05-10-2022


Duration: 1 semester

Version: Approved - active

Comment

The course is co-read with DM868 and DS804.

Entry requirements

The course cannot be chosen by students who have followed or passed DM555, DM855, DM859, DM566, DM868, or DS804.

Academic preconditions

Students taking the course are recommended to:

  • Have knowledge of the basic concepts of discrete methods for computer science
  • Have knowledge of the basic concepts of linear algebra
  • Have knowledge of basic algorithms and data structures
  • Be able to program

Course introduction

The aim of the course is to enable the student to choose and use techniques from Data Mining and Machine Learning, which are important for analyzing large datasets in many financial, medical, commercial, and scientific applications.

Data Mining and Machine Learning techniques enable computational systems to identify meaningful patterns in the data and to adaptively improve their performance with experience accumulated from the observed data.

This course introduces the most common techniques for performing basic data mining and machine learning tasks, and covers the basic theory, algorithms, and applications. This course balances theory and practice, and covers the mathematical as well as the heuristic aspects. Computational learning methods are introduced at a general level, with their basic ideas and intuition. Moreover, the students have the opportunity to experiment and apply data mining and machine learning techniques to selected problems.

The course gives an academic basis for conducting large-scale data analysis and for conducting bachelor's and master's thesis projects as well as other practically oriented study activities that are part of the degree.

In relation to the competence profile of the degree it is the explicit focus of the course to:
  • Give knowledge of common data mining and machine learning tasks and methods
  • Give skills to apply common data mining and machine learning methods to real world problems
  • Give the competence to design data mining and machine learning methods
  • Give knowledge to understand and reflect on theories, methods, and practices in the computer science field
  • Give skills to acquire new knowledge in an effective and independent manner and be able to apply this knowledge in a reflective way
  • Give skills to describe, analyze and solve computer science problems applying methods and modeling formalisms from the core area and its mathematical support disciplines
  • Give skills in analyzing the advantages and disadvantages of various algorithms, especially in terms of resource consumption
  • Give skills to make and justify professional decisions
  • Give skills to describe, formulate and communicate issues and results to peers, non-specialists, project partners and users.

Expected learning outcome

The learning objectives of the course are that the student demonstrates the ability to:
  • Describe the data mining and machine learning tasks presented during the course
  • Describe the algorithms and methods presented in the course
  • Describe the topics presented in the course in precise mathematical language
  • Explain the individual steps of the mathematical derivations presented in class
  • Apply the methods to simple problems
  • Apply the methods to situations different from the ones presented in class
  • Reflect on and assess design choices for data mining and machine learning systems
  • Undertake experimental evaluation of data mining and statistical learning methods and report the results

Content

The following main topics are contained in the course:
  • basic probability
  • theory of learning (feasibility of learning, generalization, overfitting)
  • error and noise
  • bias and variance
  • training vs. testing (cross-validation, bootstrap, model selection)
  • methods (for example rule learning, Bayes learning, nearest neighbor classification, decision trees, clustering)
  • frequent pattern mining (item set mining, association rules)

Literature

See itslearning for syllabus lists and additional literature references.

Examination regulations

Exam element a)

Timing

Spring and June

Tests

Portfolio and test

EKA

N340033102

Assessment

Second examiner: External

Grading

7-point grading scale

Identification

Full name and SDU username

Language

Normally, the same as teaching language

Duration

4 hours

Examination aids

Written examination:
All common aids are allowed, e.g. books, notes, and computer programmes that do not use the internet.

Internet access is not allowed during the exam. However, you may access the DE-Digital Exam system when answering the multiple-choice questions. If you wish to use course materials from itslearning, you must download the materials to your computer the day before the exam. itslearning may not be accessed during the exam.

ECTS value

10

Additional information

The portfolio exam consists of:
  • Presentations in exercise classes. Counts for 10% of the overall final assessment
  • Written exam during the exam period. Counts for 90% of the overall final assessment
The re-exam is not a portfolio exam but consists of the written exam only (the rules for the written exam are unchanged). If 24 or fewer students are signed up for the re-exam in DM868, DM870 and DS804 (co-read courses), the re-exam will be held as an oral exam.

Indicative number of lessons

70 hours per semester

Teaching Method

At the Faculty of Science, teaching is organized according to the three-phase model, i.e. intro, training and study phase.
  • Intro phase (lectures) - 40 hours
  • Training phase (tutorials) - 30 hours
In the intro phase, concepts, theories and models are introduced and put into perspective. In the training phase, students train their skills through exercises and dig deeper into the subject matter. In the study phase, students gain academic, personal and social experiences that consolidate and further develop their scientific proficiency. Focus is on immersion, understanding, and development of collaborative skills. 
 
Activities during the study phase:
  • Reading from textbooks
  • Solving homework
  • Applying acquired knowledge in practical projects

Teacher responsible

Name: Arthur Zimek
E-mail: zimek@imada.sdu.dk
Department: Data Science

Administrative Unit

Institut for Matematik og Datalogi (datalogi)

Team at Educational Law & Registration

NAT

Offered in

Odense

Transition rules

Transitional arrangements describe how a course replaces another course when changes are made to the course of study. 
If a transitional arrangement has been made for a course, it will be stated in the list. 
See transitional arrangements for all courses at the Faculty of Science.