Reproducible data analysis with R

Dates

28-31 October 2024

 

To foster international participation, this course will be held online

 

 

Course overview

With considerable effort, you wrote R code to analyze your data and generate a final document or report presenting the results. You then hand the code to a colleague to evaluate, but they don’t know which packages to install, they don’t have the necessary data, and they can’t tell whether the correct script is report_final.Rmd or report_final_final.Rmd. Even with your guidance, the code does not run on their machine. Worse, when you try to run it yourself a few weeks later, you find that a package update has broken your code.
This course will help you avoid those issues. You will learn how to organize a project to speed up collaboration and maximize reproducibility, using existing tools in the R ecosystem (such as R Markdown and renv) together with version control and working environments.

 

 

Target audience and assumed background

This course is intended for researchers, data scientists, and anyone who uses R to generate documents and who wants to collaborate with other people (or themselves in the future) with the minimum amount of pain possible.
Basic prior experience with R is recommended. If you have ever read data and generated a graph or table based on it, you have everything you need to participate.

 

Learning outcomes

By the end of this course, participants will be able to:


- Create an R project that outputs a reproducible document.
- Create and manage a reproducible environment that specifies packages and their versions.
- Track changes with git.
- Collaborate with others, and with their future selves, using GitHub.
- Create and publish containers.

Program

Daily schedule:
9 am-12 pm (Berlin time): live lectures, live coding, live exercises.

Asynchronous homework support via Slack.

 

Monday – Classes from 9 AM - 12 PM Berlin time
  • Introduction to reproducibility
  • RStudio projects
    • Folder structure
    • R Package structure
  • R Markdown
    • Syntax
    • Using templates from rticles
    • Using LaTeX templates 

Tuesday – Classes from 9 AM - 12 PM Berlin time
  • here package
  • Git and GitHub
    • Setup and basic ideas
    • Basic workflow (add, commit)
    • Collaboration (forks, pull requests)
    • Repo documentation (README, Licence, code of conduct).
 
Wednesday – Classes from 9 AM - 12 PM Berlin time
  • Managing dependencies with renv
  • Sharing data
    • Data repositories (DOI and details)
    • Access to data from code
 
Thursday – Classes from 9 AM - 12 PM Berlin time
  • Introduction to Containers
  • Docker
    • Create a container with a Dockerfile
    • Docker + renv
    • Publish a container on Docker Hub

 

Instructors

Paola Corrales: Paola has a PhD in Atmospheric Science from Universidad de Buenos Aires. During her PhD she applied data assimilation techniques to improve the representation of mesoscale convective systems and associated precipitation. She has experience working with Numerical Weather Prediction models using HPC systems and programming languages such as R, bash, and Fortran. She is an active R user and developer and contributes to many communities of practice, such as R-Ladies and rOpenSci. Since 2021, Paola has held a professor position at Universidad Nacional Guillermo Brown, where she teaches Visualization of Information and Data Management. In 2023, Paola became a member of The Carpentries Board of Directors.

More information about Paola: https://paocorrales.github.io

 

 

Elio Campitelli: Elio Campitelli has a Ph.D. in atmospheric sciences from Universidad de Buenos Aires, where they studied the large-scale circulation of the Southern Hemisphere, and now studies tropical influences on Antarctic sea ice at Monash University. They also taught Introduction to Programming and Visualization of Information at Universidad Nacional Guillermo Brown and are a certified instructor with The Carpentries. They are an active member of the R community and maintain several open-source R packages (e.g., ggnewscale, metR).

More information about Elio: https://eliocamp.github.io

 


Cost overview

 

Package 1

 

450 €


Should you have any further questions, please send an email to info@physalia-courses.org

Cancellation Policy:

 

> 30 days before the start date = 30% cancellation fee

< 30 days before the start date = no refund

 

Physalia-courses cannot be held responsible for any travel fees, accommodation or other expenses incurred by you as a result of the cancellation.