DM882: Text Mining
Study Board of Science
Teaching language: Danish or English depending on the teacher, but English if international students are enrolled
EKA: N340090102
Assessment: Second examiner: External
Grading: 7-point grading scale
Offered in: Odense
Offered in: Spring
Level: Master
STADS ID (UVA): N340090101
ECTS value: 5
Date of Approval: 01-11-2022
Duration: 1 semester
Version: Approved - active
Comment
Entry requirements
Academic preconditions
Students taking the course are expected to:
- Have basic knowledge in probability theory, e.g. by having followed DM566 (Data Mining and Machine Learning)
- Have basic knowledge in algorithmics, obtained e.g. by having followed DM507 (Algorithms and data structures)
- Have proficiency in programming, preferably Python, e.g. by having followed DM561 (Linear Algebra)
Course introduction
The aim of the course is to provide an introduction to Text Mining of unstructured text in natural languages. The increasing amount of digitized text calls for formal frameworks that process such data, extract information from it and draw statistical conclusions based on its content. The course is designed to provide a sound theoretical basis for processing unstructured text and to present example applications. We will start with simple examples of unstructured text that demonstrate the abilities of current Text Mining methods and highlight their advantages and shortcomings. We will then apply these methods to more realistic datasets sourced from online news media and scientific publications. The content of the course is designed to give an application context for computer science and data science methods that handle real-world data.
In relation to the competence profile of the degree it is the explicit focus of the course to:
- Give knowledge of some of the main sources and representations of unstructured text.
- Give the competence to normalize unstructured text into suitable corpora for computational applications.
- Give understanding of methods such as Named Entity Recognition, Topic Detection or Sentiment Analysis.
- Give examples of applications of Text Mining methods, providing the ability to choose the right set of tools for a task.
- Provide a basis to plan and carry out Text Mining tasks starting from raw unstructured text and ending with a set of conclusions.
- Give understanding of the applications of theoretical computer science methods on real-world data.
Expected learning outcome
The learning objective of the course is that the student demonstrates the ability to:
- Understand some of the main types of unstructured text.
- Manipulate unstructured text.
- Transform unstructured text into suitable normalized representation.
- Train and execute Named Entity Recognition Models.
- Train and execute Topic Detection Models.
- Train and execute Sentiment Analysis Models.
- Understand Machine Translation Models.
- Understand the limitations of Text Mining methods that depend on the content, such as non-English text (e.g. Danish or Mandarin).
- Perform statistical analysis on unstructured text.
- Understand the limits and drawbacks of Text Mining methods.
- Form hypotheses regarding unstructured text and pick the tools to check them.
Content
The following main topics are contained in the course:
- Sources and formats of unstructured text.
- Normalization, representation and annotation of unstructured text into corpora.
- Named Entity Recognition Models.
- Topic Detection Models.
- Sentiment Analysis Models.
- Machine Translation Models.
- Supervised and unsupervised analysis of unstructured text.
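As an illustration of the normalization topic listed above, the following is a minimal sketch in Python of turning raw text into a small bag-of-words corpus. It is not course material: the function names `normalize` and `build_corpus`, the tiny stop-word list and the example sentences are all assumptions made for this example, and only the Python standard library is used.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def normalize(text):
    """Lowercase the text, keep runs of ASCII letters as tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def build_corpus(documents):
    """Represent each raw document as a bag of words (token -> count)."""
    return [Counter(normalize(document)) for document in documents]


# Made-up example sentences, not taken from any course dataset.
docs = [
    "The price of oil is rising.",
    "Oil and gas prices: rising in the EU!",
]
corpus = build_corpus(docs)
print(corpus[0])
```

The ASCII-only pattern `[a-z]+` is a simplification that also hints at the limitations with non-English text mentioned in the learning outcomes: a Danish word such as "købe" is split into the fragments "k" and "be".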
Literature
Examination regulations
Exam element a)
Timing
Spring
Tests
Project
EKA
N340090102
Assessment
Second examiner: External
Grading
7-point grading scale
Identification
Full name and SDU username
Language
Normally, the same as teaching language
Examination aids
To be announced during the course.
ECTS value
5
Indicative number of lessons
Teaching Method
The teaching method is based on the three-phase model.
- Intro phase: 20 hours
- Skills training phase: 15 hours, of which tutorials: 15 hours
Teacher responsible
Name | E-mail | Department |
---|---|---|
Konrad Krawczyk | konradk@imada.sdu.dk | Institut for Biokemi og Molekylær Biologi |
Timetable
Administrative Unit
Team at Educational Law & Registration
Offered in
Recommended course of study
Transition rules
Transitional arrangements describe how a course replaces another course when changes are made to the course of study.
If a transitional arrangement has been made for a course, it will be stated in the list.
See transitional arrangements for all courses at the Faculty of Science.