Monday – 2 to 8 pm Berlin time
Part 1 – Introduction to genomic data (E. Jensen)
In this first session we will have an opportunity to get to know each other and the variety of research interests in the group. We will also explore different aspects of a population genetic
study, including best practices around sampling, quality of DNA samples, and the various methods to generate genetic data for your species of interest. Finally, you will be introduced to the
various file types you will encounter when conducting studies that use NGS to generate genotype data.
Lecture breakdown
• Introduce yourself and your research interests
• Discuss study design: DNA sample sources, quality of DNA, sample sizes
• Discuss genetic data collection methods: whole genome sequencing, capture (exon or mtDNA), RADseq, SNP panel
• Introduce various file types (fastq, fasta, SAM/BAM, vcf, BED, program specific inputs)
Practical
• Explore fastq sequence files typically generated by sequencing software
• Check quality of fastq files using FastQC
• Trim data to remove adapters, poor quality data
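As a rough illustration of what quality checking and trimming operate on, the following Python sketch reads FASTQ records, converts the quality string to Phred scores, and trims low-quality bases from the 3' end. It assumes the common Phred+33 encoding; the function names and the toy read are invented for illustration, and dedicated tools such as FastQC and adapter trimmers do considerably more than this.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality string) from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Convert a quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq, qual, min_q=20):
    """Remove trailing bases whose quality is below min_q (hypothetical threshold)."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Toy record: six high-quality bases followed by two poor ones.
example = ["@read1", "ACGTACGT", "+", "IIIIII#!"]
for name, seq, qual in read_fastq(example):
    scores = phred_scores(qual)
    mean_q = sum(scores) / len(scores)
    trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
    print(name, mean_q, trimmed_seq)
```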
Part 2 – Alphabet soup (E. Jensen)
In this second half of day one you will learn how to align (map) your raw sequence data to a reference genome, and learn how to assess the quality of your alignment. You will be introduced to de novo
approaches for studies where your species does not have a reference genome available. Finally, you will learn the basics of SNP genotype calling and haplotype calling.
Lecture breakdown
• Accessing reference genomes, understanding their quality, “in-group” reference bias
• General introduction to de novo approaches
• General introduction to the idea of SNP calling
Practical
• Examine the files associated with a reference genome, and index it
• Align reads to the reference genome using BWA
• Check the quality of the alignment and filter the resulting BAM file
• Call SNPs using BCFtools mpileup/call, generate a VCF output file of genotypes
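To give a feel for the VCF genotype output, here is a minimal Python sketch that parses one VCF data line and converts each sample's GT field into a count of non-reference alleles. The example line and the function name are invented for illustration; real VCF files also carry header lines and may contain multi-allelic sites, which this sketch does not handle specially.

```python
def parse_vcf_line(line):
    """Split one VCF data line into site information and per-sample genotypes."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # Column 9 (index 8) is FORMAT; it names the per-sample subfields.
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    genotypes = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            genotypes.append(None)  # missing call
        else:
            # Count alleles that differ from the reference ("0").
            genotypes.append(sum(a != "0" for a in alleles))
    return chrom, int(pos), ref, alt, genotypes


# Hypothetical line with four samples: hom-ref, het, missing, hom-alt.
line = "chr1\t1042\t.\tA\tG\t50\tPASS\tDP=30\tGT:DP\t0/0:10\t0/1:12\t./.:0\t1/1:8"
print(parse_vcf_line(line))
```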
Tuesday – 2 to 8 pm Berlin time
Part 1: Data filtering (E. Jensen)
Day two will begin with an important step in genotype generation: data filtering! You will learn about the different options that can be used to filter your data to ensure the best quality data
for downstream population genetic analyses. You will learn about the trade-offs of the filters, and how best to make decisions around data filtering.
Lecture breakdown
• SNP filtering: basic filters, importance of ordering filters, iterative process
• Selecting thresholds for missing data per individual, per population, per locus
Practical
• Filtering dataset using VCFtools
• Assess linkage disequilibrium (LD) with PLINK
• Generating summaries of missingness, depth etc.
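The point about filter ordering can be illustrated with a small Python sketch. It computes missingness per individual and per locus on a toy genotype table, then applies the same two filters in both orders. The data, thresholds, and function names are all invented for illustration and are not recommendations.

```python
# Genotypes coded as alternate-allele counts (0, 1, 2); None marks a missing call.
# Keys are individuals, list positions are loci. Toy data for illustration only.
genotypes = {
    "ind1": [0, 1, 2, None],
    "ind2": [None, 1, None, None],
    "ind3": [0, 0, 2, 1],
}


def missing_per_individual(geno):
    """Fraction of loci with a missing call, per individual."""
    return {ind: sum(g is None for g in calls) / len(calls)
            for ind, calls in geno.items()}


def missing_per_locus(geno):
    """Fraction of individuals with a missing call, per locus."""
    rows = list(geno.values())
    n_loci = len(rows[0])
    return [sum(row[j] is None for row in rows) / len(rows)
            for j in range(n_loci)]


def drop_individuals(geno, max_missing):
    rates = missing_per_individual(geno)
    return {ind: calls for ind, calls in geno.items()
            if rates[ind] <= max_missing}


def drop_loci(geno, max_missing):
    rates = missing_per_locus(geno)
    keep = [j for j, r in enumerate(rates) if r <= max_missing]
    return {ind: [calls[j] for j in keep] for ind, calls in geno.items()}


# Same thresholds, different order (0.5 and 0.3 are hypothetical values).
individuals_first = drop_loci(drop_individuals(genotypes, 0.5), 0.3)
loci_first = drop_individuals(drop_loci(genotypes, 0.3), 0.5)
print(individuals_first)
print(loci_first)
```

In this toy case, removing the poorly genotyped individual first retains three loci for two individuals, while filtering loci first retains all three individuals but only one locus.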
Part 2: Population structure analyses I (C. Cullingham)
Now that we have a genotype dataset, we can start with analyses that are important for conservation. We will begin with population structure, as it plays an important role in defining
conservation units for species. We will start with understanding the basics of Hardy-Weinberg for those who do not have a strong foundation in population genetics, then we will talk about the
different methods of detecting population structure with large genotype datasets.
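For readers new to Hardy-Weinberg, the following Python sketch computes the expected genotype counts for a single biallelic locus from the observed counts and a simple chi-square statistic for the deviation. The counts are invented, and the function name is hypothetical; this is a minimal sketch rather than the test used in any particular program.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Return allele frequency p, expected genotype counts, and chi-square."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    # Hardy-Weinberg expectations: p^2, 2pq, q^2, scaled to the sample size.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


# Hypothetical sample with fewer heterozygotes than expected.
p, expected, chi2 = hwe_chi_square(50, 20, 30)
print(p, expected, chi2)
```

With one degree of freedom, a chi-square value above 3.84 indicates a deviation at the 0.05 level. A deficit of heterozygotes like the one in this toy sample is one of the patterns that can arise when individuals from genetically distinct populations are pooled.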
Lecture breakdown
• Data quality assessment
• Importance of population structure in conservation studies
• Pros/cons of the different methods
• Best practices for analysis and reporting for each of the methods
Practical
• Examine case data using PCA
• Examine case data with DAPC
• Begin a PopCluster run for the case data on your local computers
Wednesday – 2 to 8 pm Berlin time
Part 1: Population structure analyses II (C. Cullingham)
There are many different programs we can use to examine population structure. We will briefly discuss different non a priori methods and best practices for analyzing population structure. We
will focus on the programs STRUCTURE and PopCluster for this session.
Lecture breakdown
• What clustering software is available?
• How do we choose the best model of population structure?
• What are the weaknesses of these methods?
Practical
• Working with your peers, examine outputs from STRUCTURE for a test dataset and present your findings to the entire group
• Examine the outputs from PopCluster and STRUCTURE for the case data to determine the most likely number of populations
Part 2: Adaptation (C. Cullingham)
With access to genotype data across genomes, conservation studies are now incorporating information on local adaptation into conservation planning. For this session, we will explore the
population genetic theory underlying the methods of detecting putative adaptation, and you will be introduced to different methods of detecting putative selective adaptation. We will also talk
about false positives, and how we need to interpret these data with caution.
Lecture breakdown
• Why conservation studies may want to assess signals of adaptation
• Methods of detecting putative adaptation
o Outlier detection
o Environmental correlations
Practical
• Run test data using OutFLANK, pcadapt and RDA
• Interpret outputs, compare neutral vs. adaptive population structure
Thursday – 2 to 8 pm Berlin time
Part 1: Genetic diversity, inbreeding, and estimating effective population size (C. Cullingham)
Maintaining genetic diversity in species of conservation concern is important, and understanding standing variation can give us insight into the genetic health of a species. In this session we
will talk about different ways to estimate genetic diversity, and inbreeding. We will then discuss the importance of effective population size, and the different metrics that are used to estimate
it.
Lecture breakdown
• Why is genetic diversity important?
• How can we estimate genetic diversity and inbreeding?
• What is effective population size?
• Discuss different methods of estimation, and advantages/disadvantages of them
Practical
• Estimate observed and expected heterozygosity for each individual and per locus
• Plot the data using a histogram to visualize the overall patterns and identify individuals with very low levels of heterozygosity
• Examine the data for runs of homozygosity and compare with heterozygosity estimates
• Estimate effective population size on a subset of loci and compare different metrics
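As a minimal sketch of the heterozygosity estimates, the Python below computes observed and expected heterozygosity per locus and the proportion of heterozygous loci per individual, assuming biallelic SNPs coded as alternate-allele counts. The data and function names are invented for illustration, and the expected heterozygosity here omits the small-sample correction that dedicated packages usually apply.

```python
# Rows are individuals, columns are loci; values are alternate-allele counts
# (0, 1, 2), with None for a missing call. Toy data for illustration only.
genotypes = [
    [0, 1, 2, 1],
    [0, 1, None, 1],
    [0, 0, 2, 1],
    [0, 1, 0, 1],
]


def locus_heterozygosity(column):
    """Observed and expected (2pq) heterozygosity for one biallelic locus."""
    calls = [g for g in column if g is not None]
    n = len(calls)
    h_obs = sum(g == 1 for g in calls) / n
    p = sum(calls) / (2 * n)  # alternate-allele frequency
    h_exp = 2 * p * (1 - p)
    return h_obs, h_exp


def individual_heterozygosity(row):
    """Proportion of non-missing loci at which the individual is heterozygous."""
    calls = [g for g in row if g is not None]
    return sum(g == 1 for g in calls) / len(calls)


columns = list(zip(*genotypes))
for j, column in enumerate(columns):
    print("locus", j, locus_heterozygosity(column))
for i, row in enumerate(genotypes):
    print("individual", i, individual_heterozygosity(row))
```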
Part 2: Relatedness & parentage (E. Jensen)
Relatedness is an important measure, especially in captive breeding programs for species of conservation concern. In this session we will learn about the different measures of relatedness that we
can calculate using genotypic data and explore ways we can use relatedness in conservation work.
Lecture breakdown
• Introduce the variety of estimators
• Importance of having an independent estimate of allele frequencies
• Uses for relatedness (relatedness by distance, checking power of SNP panel, captive breeding)
• Parentage analysis
Practical
• Using the R package “related” https://rdrr.io/rforge/related/man/related-package.html
• Analyze test data (estimating relatedness, selecting the best estimator, simulating relatedness categories)
• Plotting relatedness distributions
• Assigning parents to offspring using Colony