17-21 February 2025
To foster international participation, this course will be held online.
The use of modern quantitative technologies to characterise complex phenomena is the standard approach in almost every research domain. Biology is no exception, and multi-omics techniques (metabolomics, transcriptomics, genomics and proteomics) are pervasive in every facet of the life sciences. The resulting multivariate datasets are highly complex, and advanced data analysis approaches are needed to make the best use of the available information. For relatively large-scale studies, machine learning (ML) is a valuable tool to complement classical multivariate statistical methods.
The objective of this course is to highlight the advantages and limitations of these data analysis approaches in the context of biological research, providing a broad hands-on introduction to the use of multivariate methods and machine learning algorithms for the analysis of omics datasets.
The syllabus is designed for people who need an intuitive introduction to the basics of theoretical and applied machine learning. A foundational understanding of statistics and of the R programming language is preferred but not required.
Each session consists of a one- to two-hour lecture followed by one to two hours of practical exercises and demonstrations. There will also be plenty of time for students to discuss their own problems and data.
Day 1 - 2-8 pm Berlin time
General Introduction
Data mining, -omics and machine learning
Hands-off introduction to ML / Omics meet ML
Introduction to advanced R data libraries
Introduction to tidymodels
Day 2 - 2-8 pm Berlin time
Multivariate data: things to always remember
Model and variable selection: the machine learning paradigm
Supervised learning: regression and classification
Machine learning for regression problems
Day 3 - 2-8 pm Berlin time
Overfitting and resampling techniques
Classification problems
Regression and classification with tidymodels
Lasso-penalised linear and logistic regression
Lasso and model tuning
KNN imputation [optional]
Day 4 - 2-8 pm Berlin time
Random Forest for regression and classification
Slow learning: the boosting approach
Unsupervised learning: PCA, UMAP, self-organizing maps
PCA demo
Day 5 - 2-8 pm Berlin time
SVM demo
Unsupervised learning demos
UMAP demo
SOM demo
Final interactive exercise
Kahoot quiz: let’s test our machine learning skills!
Q&A
Cancellation Policy:
> 30 days before the start date = 30% cancellation fee
< 30 days before the start date = no refund
Physalia-courses cannot be held responsible for any travel fees, accommodation or other expenses incurred by you as a result of the cancellation.