ReproducibleQuantitativeDataScience

A course prepared by Dr Melanie Ganz and Dr Cyril Pernet, with amazing guests: Dr Robert Oostenveld (RO), Dr Michael Hanke (MH), Dr Nikola Stikov (NS) and Prof Russ Poldrack. The course runs over 5 days plus personal work: 2 days, course work, 2 days, course work, and 1 day of presentations.

During the course, active participation is expected. In session 1, we'll use Padlet to interact with each other (anonymous posting allowed) and also do group work. In session 2, we use GitHub (which you learn in session 1) to share code and review each other's code. It is recommended to share something you are working on, but if you feel uncomfortable with that, prepare something else to be shared/reviewed. In session 3, you must present in front of everybody. While it may feel uncomfortable, this is expected of any PhD student, and not just for this course. In general, there are no rights and wrongs in trying to improve reproducibility; you are only expected to try, using the conceptual and practical tools presented.

Part 1

Day 1 - Data Collection and data storage

Introduction to reproducibility: Definitions and origins
How do you store data on your computer? Data structures and data naming
Data provenance: keeping track of where data are coming from
Reproducibility is hard: case studies
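
To make the data naming and provenance topics above concrete, here is a minimal sketch. It assumes a hypothetical key-value naming scheme (keys such as sub, ses and task are illustrative, not a convention prescribed by the course) and records provenance as a simple dictionary with a SHA-256 checksum. The function names build_filename, parse_filename and provenance_record are made up for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def build_filename(entities: dict, suffix: str, extension: str) -> str:
    """Join key-value pairs into a machine-readable file name."""
    for key, value in entities.items():
        # Reject separators inside keys/values so the name stays parseable.
        if any(ch in f"{key}{value}" for ch in "_- "):
            raise ValueError(f"'{key}'/'{value}' must not contain '_', '-' or spaces")
    parts = [f"{key}-{value}" for key, value in entities.items()]
    return "_".join(parts + [suffix]) + extension


def parse_filename(name: str) -> dict:
    """Recover the key-value pairs from a name produced by build_filename."""
    stem = name.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    return entities


def provenance_record(path: Path, source: str) -> dict:
    """Note where a file came from and fingerprint its content with SHA-256."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return {"file": path.name, "source": source, "sha256": digest}


if __name__ == "__main__":
    name = build_filename({"sub": "01", "ses": "02", "task": "rest"}, "eeg", ".csv")
    with tempfile.TemporaryDirectory() as folder:
        data_file = Path(folder) / name
        data_file.write_text("time,value\n0,1.5\n")
        # "lab recording PC" is a placeholder description of the data origin.
        print(provenance_record(data_file, source="lab recording PC"))
```

Because the name can be parsed back into its parts, scripts can select files by subject or session without manual bookkeeping, and the checksum lets you verify later that a file has not changed since it was recorded.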

Day 2 - Reproducible designs, protocols and pre-registration

Concepts and tools for protocol documentation, and study pre-registration
Data privacy, ethics and GDPR - lecture and practical case reviews
Using Markdown for documentation (see cheat sheet) - practical
Version control and social coding with Git (people who already know Git can pair with newbies)

Course work

Using your PhD research data, protocol, code, etc., write a report (2 to 3 pages) explaining where you are starting from and which measures are already in place to increase reproducibility, as per the concepts presented during days 1 and 2. What further measures can be taken to increase reproducibility and, if some cannot be implemented, why not?

Submit your coursework via e-mail to Cyril and Melanie.

Part 2

Day 3 - Better coding

Programming
Good coding practices
An introduction to computational analysis methods: permutation, bootstrap, cross-validation, out-of-sample generalization
Test-driven AI coding: https://github.com/poldrack/ai_testing
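
As an illustration of two of the computational methods listed above, the following sketch implements a two-sided permutation test on a difference in means and a percentile bootstrap confidence interval for a mean, using only the Python standard library. It is a teaching sketch under simple assumptions (two independent groups, the mean as the statistic of interest), not the course's reference implementation. The function names and default parameters are chosen for this example.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in means of two groups."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(
            statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        )
        # Small tolerance guards against floating-point rounding in the sums.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


def bootstrap_ci(sample, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and collect the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up numbers, for demonstration only.
    group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7]
    group_b = [4.1, 3.9, 4.4, 4.0, 4.6, 4.2, 3.8, 4.3]
    print("permutation p-value:", permutation_test(group_a, group_b))
    print("95% bootstrap CI for mean of group_a:", bootstrap_ci(group_a))
```

Fixing the random seed, as done here, makes the resampling itself reproducible: rerunning the script gives the same p-value and interval.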

Day 4 - Better analyses

Feedback on coursework, discussion of further issues in making your PhD reproducible, and the next assignment
P-hacking
Understanding p-values
Computational reproducibility
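
To illustrate why p-hacking matters, the sketch below simulates experiments in which there is no true effect, measures several outcomes per experiment, and reports only the smallest p-value. It assumes independent, normally distributed outcomes with known standard deviation, so that a simple z-test applies. The function names and parameter values are illustrative choices for this example.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, with known standard deviation."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_experiments=2000, n_subjects=20,
                        alpha=0.05, seed=0):
    """Fraction of null experiments with at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Every outcome is pure noise: the null hypothesis is true by design.
        p_values = [
            z_test_p_value([rng.gauss(0, 1) for _ in range(n_subjects)])
            for _ in range(n_outcomes)
        ]
        if min(p_values) < alpha:  # keep only the best-looking outcome
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(f"{k} outcome(s): false positive rate = {false_positive_rate(k):.3f}")
```

With a single pre-specified outcome the rate should stay close to the nominal 5%. With 10 independent outcomes, the theoretical rate is 1 - 0.95^10, roughly 40%, which is the inflation that pre-registration is meant to prevent.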

Please prepare before the course:

Install uv
Install git-annex by typing in a terminal: uv tool install git-annex
Then install DataLad: uv tool install datalad --with datalad-next --with datalad-container (or, if DataLad was already installed: uv tool upgrade datalad --with datalad-next --with datalad-container)
Finally, make sure to activate the uv-based DataLad installation:
on Mac/Linux: source $(uv tool dir)/datalad/bin/activate
on Windows (cmd.exe): AppData\Roaming\uv\tools\datalad\Scripts\activate.bat
Further checks and instructions can also be found here.
You should already have VSCode from the last session; otherwise, install it with Copilot AI.
Clone this repository: git clone https://github.com/poldrack/ai_testing.git and run uv sync within the repo directory
Install Docker on your own machine so you can use a container and then build one.

Course work

Improve the code you are using based on the concepts and tools reviewed over the 4 days: from version control and better inline documentation, to functionalization and modern computational statistics.
Make a 10-minute presentation summarizing all of your course work and the measures you have taken to improve reproducibility in your PhD (including work from session 1).

Part 3

Day 5 - Data sharing

The ‘data’ cycle, sharing from raw data to figures - lecture
Copyright and Open Access in publishing - Lecture by Rasmus Rindom Riise from the Royal Library
Reproducible publishing (see example here) - Lecture by Nikola Stikov from University of Montreal (remote)
Presentations and discussions/social event (incl. drinks and pizza)

melanieganz/ReproducibleQuantitativeDataAnalysis-2025