Machine and Deep Learning methods in population genomics and phylogeography

Dates

31 March-3 April 2025

Due to the COVID-19 outbreak, this course will be held online

 

Course overview

In recent years, machine and deep learning techniques have been increasingly used in evolutionary studies because they are flexible and benefit from large amounts of data, which makes them well suited to analyzing large and complex genomic datasets. This course will focus on using deep learning, specifically Convolutional Neural Networks (CNNs), to extract information from genetic data for inference in population genomics and phylogeography. The theoretical background for simulating genetic data and developing machine and deep learning architectures will be covered and followed by practical examples, in modules structured over four days. On Day 1, participants will learn how to simulate genetic data under competing demographic scenarios and use Approximate Bayesian Computation (ABC) to choose among them. Day 2 will include an introduction to machine learning and its applications to evolutionary genomics. On Day 3, deep learning will be introduced and used to compare the demographic scenarios conceived on the previous days. Day 4 will be dedicated to simulating genomic regions with selective sweeps and using CNNs to detect such regions in real genomes. The course combines lectures, with discussion of key concepts, and practical hands-on sessions, contextualised with research case studies.

Target audience and assumed background

The course is aimed at graduate students, researchers and professionals interested in genetics, evolution and deep learning who want to develop applications to test explicit demographic hypotheses and search for selective sweeps. The course will cover the general concepts of genetic data simulation and deep learning, as well as more advanced discussion of their internal machinery. The examples discussed during the course will span datasets from both model organisms, for which whole genomes are available, and non-model organisms, for which less information is available.

 

Program

 

Monday – Classes from 2 to 8 pm Berlin time

- Introduction to coalescent theory and how to model genetic diversity

- How to choose summary statistics and use them in a simple Approximate Bayesian Computation (ABC) framework

- Practical: building a script to simulate genetic data under competing demographic scenarios and perform a simple ABC analysis. 
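To give a flavour of the Monday practical, here is a minimal sketch of ABC model choice by rejection, written with the Python standard library only. It is an illustration under stated assumptions, not the course material: the two scenarios (constant size versus a past expansion), the fixed `theta`, the change time, the size ratio, the observed value and all function names are hypothetical choices made for this example. The only summary statistic is the number of segregating sites, simulated from the total length of a coalescent tree.

```python
import random

def total_branch_length(n, rng, t_change=None, ancestral_ratio=1.0):
    """Total coalescent tree length for n lineages, in units of 2N generations.

    If t_change is given, the population size before that time (looking
    backwards) is ancestral_ratio times the present size (assumed scenario).
    """
    t, k, length = 0.0, n, 0.0
    while k > 1:
        rate = k * (k - 1) / 2.0
        if t_change is not None and t >= t_change:
            rate /= ancestral_ratio  # smaller ancestral size: faster coalescence
        wait = rng.expovariate(rate)
        if t_change is not None and t < t_change < t + wait:
            # No coalescence before the size change: move to it and redraw
            length += k * (t_change - t)
            t = t_change
            continue
        length += k * wait
        t += wait
        k -= 1
    return length

def poisson(lam, rng):
    """Poisson draw: count unit-rate arrivals falling in an interval of length lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_segregating_sites(model, n, theta, rng):
    """Number of segregating sites under one of the two hypothetical scenarios."""
    if model == "constant":
        length = total_branch_length(n, rng)
    else:  # "expansion": tenfold smaller population before time 0.1 (assumed values)
        length = total_branch_length(n, rng, t_change=0.1, ancestral_ratio=0.1)
    return poisson(theta / 2.0 * length, rng)

def abc_model_choice(observed_s, n=20, theta=10.0, n_sims=5000, n_keep=250, seed=1):
    """Rejection ABC: keep the n_keep simulations closest to the observed statistic."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        model = rng.choice(["constant", "expansion"])  # uniform prior on models
        s = simulate_segregating_sites(model, n, theta, rng)
        sims.append((abs(s - observed_s), model))
    sims.sort(key=lambda pair: pair[0])
    accepted = sims[:n_keep]
    p_constant = sum(1 for _, m in accepted if m == "constant") / len(accepted)
    return {"constant": p_constant, "expansion": 1.0 - p_constant}

posterior = abc_model_choice(observed_s=35)  # 35 is a made-up observed value
print(posterior)
```

The fraction of accepted simulations coming from each scenario approximates its posterior probability. A realistic analysis would use several summary statistics and draw the demographic parameters from priors instead of fixing them.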

 


Tuesday – Classes from 2 to 8 pm Berlin time

- A gentle introduction to Machine Learning: supervised vs unsupervised learning, regression and classification 

- Simple Machine Learning approaches with summary statistics from genomic data

- Practical: Demographic inference with a simple Machine Learning architecture and summary statistics 



Wednesday – Classes from 2 to 8 pm Berlin time 

- Understanding the basic CNN architecture for image recognition

- Using CNN to learn directly from genetic data 

- Practical: Comparing demographic scenarios with deep learning 
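As a rough illustration of what "learning directly from genetic data" can mean, the sketch below treats a small haplotype matrix as an image and applies a single convolutional filter, a ReLU activation and global max pooling, in plain Python. The toy matrix, the 2x2 filter weights and the function name are hypothetical; in practice the filter weights are learned during training with a deep learning library, not set by hand.

```python
def conv2d_valid(matrix, kernel):
    """Slide kernel over matrix (no padding, stride 1); apply ReLU to each output."""
    kh, kw = len(kernel), len(kernel[0])
    feature_map = []
    for i in range(len(matrix) - kh + 1):
        row = []
        for j in range(len(matrix[0]) - kw + 1):
            total = sum(matrix[i + a][j + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            row.append(max(0.0, total))  # ReLU activation
        feature_map.append(row)
    return feature_map

# Toy alignment: 4 haplotypes (rows) x 6 SNPs (columns); 1 = derived allele
haplotypes = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
]

# Hand-set filter (an assumption for illustration): it responds most strongly
# when two adjacent haplotypes both carry derived alleles at two adjacent SNPs
kernel = [[1.0, 1.0], [1.0, 1.0]]

feature_map = conv2d_valid(haplotypes, kernel)
pooled = max(max(row) for row in feature_map)  # global max pooling
print(feature_map)
print(pooled)
```

Because the output depends on which haplotypes sit in adjacent rows, the row order of the matrix matters to a filter like this one, which is one reason why rows are often sorted before such data are passed to a CNN.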



Thursday – Classes from 2 to 8 pm Berlin time

- Introduction to approaches for detecting selection

- Recognizing signatures of selection with deep learning 

- Practical: simulating genetic data and using CNN to predict whether a given locus is under selection 

 

 


Cost overview

 

Package 1

 

480 €


Should you have any further questions, please send an email to info@physalia-courses.org

Cancellation Policy:

 

> 30 days before the start date = 30% cancellation fee

< 30 days before the start date = No refund

 

Physalia-courses cannot be held responsible for any travel fees, accommodation costs or other expenses incurred by you as a result of the cancellation.