Machine Learning Methods for Longitudinal Data with Python

Dates

6th-9th May 2025

Where

To foster international participation, this course will be held online.

Topic: Machine learning for sequence data with time and causation

Course overview

This course will introduce methods and approaches for analysing data, chiefly longitudinal (sequence) data repeated in time or space, when time and cause-effect relationships matter. Time and causation pose specific challenges at every stage of processing and analysis, from visualization and exploratory data analysis to modelling, validation and the interpretation of results. The course will outline the main challenges of dealing with time and causation when analysing sequence data. These will be covered first briefly from the classical statistical perspective and then, more extensively, from the machine learning perspective. Time and causation also require special attention when resolving biases in the data and results (e.g. confounding, collider and mediator bias) and when disentangling cause-effect relations. Specific areas covered include the modelling of sequence data, forecasting (prediction of time-series data), survival analysis, graph models, Bayesian networks, machine learning algorithms, epidemiology and gene-expression experiments.

Format

The course is structured in modules over four days. The first two days will mostly cover basic concepts and the classical statistical perspective; the last two days will be devoted to the machine learning approach. Each day will include lectures with class discussion of key concepts, and hands-on practical sessions with collaborative exercises in which students work with the whole class and the instructors to apply the skills they have acquired. Results will be interpreted and discussed during and after each exercise. At the end of the course, the class will take a quiz together to recap and highlight the most important concepts covered. There will also be time to discuss specific research problems and questions raised by participants.

Target audience and assumed background

The course is aimed at advanced students, researchers and professionals who want to learn how to deal with time and causation in sequence data, and how to analyse such data in real-life applications in biology. It will be useful both to absolute beginners and to more advanced users who wish to delve into the implementation of longitudinal models and scripting. We will start by introducing the general concepts and approaches for dealing with sequence data in the presence of time and cause-effect relationships. We will then explore applications to specific scientific domains (e.g. forecasting, epidemiology, gene expression) and extensions to machine learning methods.
Attendees are expected to have a background in biology and familiarity with research problems involving prediction, inference and pattern discovery. Previous exposure to inferential and predictive experiments would be beneficial. The course will mix lectures and hands-on practical exercises using Python, Markdown/Jupyter Notebooks and the Linux command line. A basic understanding of Python programming and of the Linux environment will be advantageous, but is not required.

Learning outcomes

By the end of the course, participants will have an understanding of:

- how to recognise and treat spatial and temporal dependencies in the data
- how to disentangle cause-effect relationships in the data
- the most common methods to analyse data with time and/or cause components
- methods and principles of machine learning for sequence data
- specific applications to life-science domains like epidemiology and gene expression experiments
- how to design, analyse and interpret scientific experiments with time and cause components

Program

Day 1: Classes from 2-8 PM (Berlin time)

- Sequence data: examples and challenges
- Time is pervasive; cause-effect relations are tricky
- The classical statistical perspective
- Confounding, collider and mediator biases
- Statistical models to analyse data with repeated records over time (multiple time points) and space (multiple locations)

Day 2: Classes from 2-8 PM (Berlin time)

- Graph models and Bayesian networks
- Cross-validation with temporal, spatial and cause data structures
- The machine-learning perspective (predicting time series, performance metrics)

Day 3: Classes from 2-8 PM (Berlin time)

- A primer on longitudinal data in epidemiology (time series of disease incidence/prevalence, survival analysis)
- Imputation of missing data with time/space/cause dependencies (random-forest imputation (RFi), k-nearest-neighbour imputation (KNNi), etc.)
- More ML: deep learning and Transformer models for the analysis of sequence data

Day 4: Classes from 2-8 PM (Berlin time)

- Analysis of residuals and model diagnostics
- Case study: multi-omics analysis of HeLa cell cycling, integrating mRNA, translation and proteomics data; a study in interpretability from raw data to final insights
- Final recap quiz
- Discussing your own research problems and wrap-up discussion

Instructors

Dr. Filippo Biscarini

Dr. Nelson Nazzicari

Cost overview

Package 1

480 €

Should you have any further questions, please send an email to info@physalia-courses.org

Cancellation Policy:

More than 30 days before the start date: 30% cancellation fee

Less than 30 days before the start date: no refund

Physalia-courses cannot be held responsible for any travel fees, accommodation or other expenses incurred by you as a result of a cancellation.