KE561: Chemical and pharmaceutical data science

Study Board for Natural Sciences

Teaching language: Danish or English depending on the teacher, but English if international students are enrolled
EKA: N530068102
Assessment: Second examiner: None
Grading: Pass/Fail
Offered in: Odense
Offered in: Autumn
Level: Bachelor

STADS ID (UVA): N530068101
ECTS value: 5

Date of Approval: 29-04-2026


Duration: 1 semester

Version: Approved - active

Internal Course Code

KE561

Comment

The course is co-taught with FA516.

Entry requirements

The course cannot be taken by students who have previously followed or passed FA516.

Academic preconditions

The course builds on knowledge acquired during the first part of the bachelor programme in chemistry.

Students taking the course are expected to:

  • have general knowledge of chemistry, pharmacy and chemical data obtained from experimental or computational studies
  • have familiarity with programming concepts and basic scripting (e.g., through prior experience with R or similar languages)
  • have knowledge of basic statistics such as descriptive statistics, simple regression, and model evaluation concepts
  • have familiarity with basic linear algebra concepts such as vectors and matrices
  • be able to work with quantitative datasets, including tabular data and graphical representations

Course introduction

This course introduces data science methods used in modern chemical and pharmaceutical research. Students learn how chemical data are structured, curated, visualized, and analyzed using computational tools. Through hands-on exercises and project work, students apply statistical and machine learning methods to molecular and experimental datasets. The course emphasizes reproducible workflows and practical data analysis relevant to chemical and pharmaceutical applications.

Expected learning outcome

Knowledge

The student has knowledge of:

  • common types of chemical and pharmaceutical data and how they are represented computationally
  • principles of data cleaning, preprocessing, and visualization for scientific datasets
  • molecular representations used in cheminformatics such as descriptors, fingerprints, and graph-based representations
  • basic statistical learning and machine learning approaches for prediction and classification
  • the role of reproducible computational workflows in scientific data analysis
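
As an illustration of the fingerprint idea mentioned above, the following minimal sketch hashes overlapping character pairs of a SMILES string into bit positions and compares molecules by Tanimoto similarity. The function `toy_fingerprint`, its parameters, and the example molecules are assumptions made for illustration only; a real analysis would rely on substructure-aware fingerprints from a cheminformatics toolkit rather than raw character n-grams.

```python
import zlib


def toy_fingerprint(smiles: str, n_bits: int = 1024, ngram: int = 2) -> set[int]:
    """Hash overlapping character n-grams of a SMILES string to bit positions.

    Toy stand-in for a real substructure fingerprint (hypothetical helper).
    zlib.crc32 is used because, unlike the built-in hash(), it gives the
    same value in every Python session.
    """
    bits = set()
    for i in range(len(smiles) - ngram + 1):
        fragment = smiles[i:i + ngram]
        bits.add(zlib.crc32(fragment.encode("utf-8")) % n_bits)
    return bits


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


ethanol = toy_fingerprint("CCO")
acetic_acid = toy_fingerprint("CC(=O)O")
benzene = toy_fingerprint("c1ccccc1")

print("ethanol vs acetic acid:", tanimoto(ethanol, acetic_acid))
print("ethanol vs benzene:    ", tanimoto(ethanol, benzene))
```

The similarity is 1.0 for identical bit sets and decreases as fewer bits are shared.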

Skills
The student is able to:
  • use data science tools to analyze chemical and pharmaceutical datasets
  • prepare and preprocess datasets for analysis, including data cleaning and feature generation
  • visualize chemical and experimental data and interpret trends and patterns
  • implement and evaluate machine learning models for molecular property prediction
  • perform simple generative modeling tasks for molecules or chemical representations
  • document analyses and computational workflows in a reproducible notebook-based format

Competences
The student is able to:
  • apply data science methods to analyze and interpret chemical and pharmaceutical datasets
  • critically assess predictive models and their limitations when applied to chemical problems
  • collaborate in groups to design and implement a computational analysis workflow
  • communicate results through a structured report combining data analysis, code, and scientific interpretation

Content

The following main topics are included in the course:

  • introduction to chemical and pharmaceutical data science and typical data types
  • data handling and preprocessing using tabular chemical datasets
  • data visualization and exploratory data analysis
  • molecular representations and descriptors using cheminformatics tools
  • statistical learning methods for regression and classification
  • machine learning workflows and model evaluation
  • introductory deep learning approaches for molecular prediction
  • generative models for molecules or molecular representations
  • reproducible data analysis workflows
  • project-based analysis of chemical or pharmaceutical datasets

Literature

See itslearning for literature references.

Examination regulations

Exam element a)

Timing

Autumn

Tests

Project report

EKA

N530068102

Assessment

Second examiner: None

Grading

Pass/Fail

Identification

Full name and SDU username

Language

Normally, the same as teaching language

Duration

The exam consists of a project carried out during the course and finalized as a written submission. No fixed duration is associated with the submission.

Examination aids

Allowed. A more detailed description of the exam rules will be posted on itslearning.

ECTS value

5

Additional information

Project exam based on a written submission in the form of a notebook containing code, documentation, analysis, and interpretation.

Format
Students work in groups of 3–4. Each group submits a single notebook containing:
  • dataset curation and preprocessing
  • exploratory data analysis
  • molecular representation or feature generation
  • implementation of at least one machine learning model
  • model evaluation and interpretation
  • background explanation and scientific discussion

The notebook must contain executable code and documentation and must run without modification, producing the reported results. This requirement ensures that the computational workflow is reproducible.
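
One common source of non-reproducible results is an unseeded random train/test split. The sketch below shows one possible way to make such a split deterministic using only the Python standard library. The helper `train_test_split`, its default values, and the placeholder data are assumptions for illustration, not a prescribed part of the submission.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows with a private, seeded RNG and split it.

    Hypothetical helper: a local random.Random instance is used so the
    result does not depend on (or disturb) the global random state.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))  # placeholder for the rows of a dataset

train_a, test_a = train_test_split(data)
train_b, test_b = train_test_split(data)

# Re-running the split with the same seed yields identical partitions.
print(train_a == train_b and test_a == test_b)
```

Recording the seed in the notebook alongside the reported metrics lets a reader regenerate the same partitions.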

Assessment criteria
Assessment is based on the overall quality of the submitted work. To obtain a passing grade the submission must demonstrate:
  • functioning and reproducible code
  • correct data handling and preprocessing
  • appropriate use and evaluation of machine learning models
  • scientifically sound interpretation of the results
  • adequate background explanation and documentation of the analysis.

Re-examination
Re-examination consists of a revised version of the project submission, addressing identified deficiencies in the original report.

Indicative number of lessons

40 hours per semester

Teaching Method

Planned lessons:
Total number of planned lessons: 40
Of which:
Common lessons in classroom/auditorium: 10
Common lessons in laboratory: 30

Describe briefly what happens during the planned lessons:

Introduction phase
Central concepts, methods, and tools used in chemical and pharmaceutical data science are introduced through lectures combined with short guided exercises. Approximately 10 lecture sessions are held, each consisting of a one-hour lecture followed by a one-hour supervised exercise session. These sessions introduce theoretical concepts, computational tools, and data analysis strategies.

Training phase
Students develop practical skills through computer-based exercises using Python and relevant data science libraries. Approximately 15 exercise sessions of two hours each are devoted to hands-on work in Jupyter notebooks, where students learn to perform data preprocessing, visualization, molecular representation, and machine learning analyses.
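
To give a flavour of such an exercise, the sketch below cleans a small table, fits a straight line by ordinary least squares, and reports the root-mean-square error. It uses only the Python standard library so that it runs anywhere; the records are synthetic toy values, and the helper names `clean`, `fit_line`, and `rmse` are assumptions for illustration. Which data science libraries the exercises actually use is not specified here.

```python
import math
import statistics

# Synthetic toy records (not real measurements): one descriptor x, one response y.
raw = [
    {"x": "1.0", "y": "2.1"},
    {"x": "2.0", "y": "3.9"},
    {"x": "3.0", "y": ""},     # missing response
    {"x": "4.0", "y": "8.2"},
    {"x": "n/a", "y": "5.0"},  # unparsable descriptor
    {"x": "5.0", "y": "9.8"},
]


def clean(records):
    """Keep only rows where both fields parse as finite floats."""
    rows = []
    for rec in records:
        try:
            x, y = float(rec["x"]), float(rec["y"])
        except ValueError:
            continue
        if math.isfinite(x) and math.isfinite(y):
            rows.append((x, y))
    return rows


def fit_line(rows):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean([x for x, _ in rows])
    mean_y = statistics.fmean([y for _, y in rows])
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root-mean-square error of the fitted line on the given rows."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(statistics.fmean(squared))


rows = clean(raw)
slope, intercept = fit_line(rows)
# Note: this is the training error; a real evaluation needs held-out data.
print(len(rows), slope, intercept, rmse(rows, slope, intercept))
```

The error printed here is computed on the same rows used for fitting, so it understates the error to expect on new data.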

Study phase
Students work in groups on a project involving analysis of a chemical or pharmaceutical dataset. The project integrates data curation, model development, and interpretation of results in a reproducible computational workflow. The outcome is documented in a written report accompanied by executable code, allowing students to consolidate and apply the concepts and techniques learned during the course.

Other planned teaching activities:

Self-study spans the entire course. Students review lecture material, read relevant background literature, and work through supplementary exercises related to the computational topics introduced in class.

Preparation for the computer-based exercise sessions includes reviewing the relevant programming concepts and data science methods, and preparing Jupyter notebooks or code templates for the exercises. Students may also explore provided datasets and documentation for the computational tools used in the course.

Between sessions, students continue developing their programming and data analysis skills by extending the exercise notebooks, testing alternative models or parameters, and refining their workflows. Work outside the planned lessons also includes preparation of the final group project, including data analysis, documentation of code, and writing of the final report.

Teacher responsible

Name: Casper Steinmann Svendsen
E-mail: steinmann@sdu.dk
Department: Institut for Fysik, Kemi og Farmaci

Administrative Unit

Fysik, kemi og Farmaci

Team at Registration

NAT

Offered in

Odense
