add a new blog post on a data analysis workflow with git-worktrees and datalad #7
Conversation
**yarikoptic** left a comment:
This is wonderful! Thank you for composing it.
I left some initial comments... the plane is taking off, will try to review more later.
Cheers!
> At the recent [Distribits](https://www.distribits.live/events/2025-distribits/) meeting, I shared my struggle with [Yarik](https://github.com/yarikoptic), and confessed that I never fully understood the [YODA principle](https://handbook.datalad.org/en/latest/basics/101-127-yoda.html) - I was on the way, but I couldn't see clearly where I was heading. Together, we've found a sweet spot that gives me both **fast iteration during development** and **clean reproducibility for batch processing** - all without duplicating data or rebuilding containers. The secret? [Git worktrees](https://git-scm.com/docs/git-worktree) combined with [DataLad](https://handbook.datalad.org/en/latest/index.html)'s nested datasets.
> ### When YODA's wisdom grows on trees
I felt that the jump to details is a bit too sudden for those who aren't familiar with YODA. Might be worth adding a sentence or two here on worktrees and the composition of datasets via submodules, as an expression of modularity among the YODA principles.
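For readers new to worktrees, the mechanics might be sketched like this - a self-contained illustration with made-up repo and path names, not the post's actual layout:

```shell
# Illustrative sketch (hypothetical names): one git history,
# two working trees sharing a single object store.
set -e
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Add a second, detached working tree for a clean batch run,
# while this checkout stays free for ongoing development.
git worktree add --detach ../demo-run HEAD

git worktree list   # lists both trees, backed by the same .git
```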
Maybe even point to other examples, like the datasets in OpenNeuroDerivatives, to attest that this is somewhat "common" and not that scary.
> ### When YODA's wisdom grows on trees
Could you also add a sentence or two at the opening on what your study is about and what you are aiming at here (preprocessing vs., e.g., paper figures)?
> ```
> │ └── 04_dataframes/
> └── ...
> ```
> The active work in my project happens in the `derived/L5b` subdataset that consumes raw data as inputs and produces multiple intermediate outputs, e.g. `01_suite2p` ... `04_dataframes`. The subdataset `code` is a pure git repo and I use [Jujutsu](https://docs.jj-vcs.dev/latest/) for active development (because it's such a beautiful tool). After a rather nerve-wracking experience of mixing DataLad with jj - it was like Schrödinger's cat, everything was simultaneously staged and unstaged 😱 - I restrict jj usage to `code`, and continue using DataLad for managing all subdatasets with annexed content, orchestrating across nested datasets as a whole, and for capturing provenance of all derived data and figures with the datalad (containers-) run command.
This is such a wonderful example of mixing different tech (git-annex and DataLad with jj) that is built on top of the same (git) foundation, using that foundation's basic constructs (repos, commits, submodules)... I wonder if we could also add some kind of message here encouraging people to use such core tech instead of creating new and ad hoc stuff.
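As a side note for readers who haven't seen it, the provenance capture mentioned in the quoted paragraph looks roughly like this - a sketch only, with hypothetical input/output paths, script name, and container name:

```shell
# Hypothetical sketch of DataLad provenance capture.
# --input/--output let DataLad fetch inputs and unlock outputs;
# the exact command line is recorded in the resulting commit,
# so the step can later be repeated with `datalad rerun`.
datalad run \
    -m "extract dataframes from suite2p output" \
    --input "01_suite2p" \
    --output "04_dataframes" \
    "python code/extract_dataframes.py"

# With the datalad-container extension, the same call executes
# inside a registered container, pinning the environment as well:
datalad containers-run -n analysis \
    -m "extract dataframes (containerized)" \
    --input "01_suite2p" \
    --output "04_dataframes" \
    "python code/extract_dataframes.py"
```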
> ```
> HEAD detached from refs/heads/runs
> nothing to commit, working tree clean
> ```
> Let's update that branch and sew the 'HEAD' back to 'runs'. There are multiple ways to do that, I use jj:
Suggested change:
> Let's update that branch and sew the 'HEAD' back to 'runs'. There are multiple ways to do that, including plain `git` commands, I use jj which I am more familiar with:
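The plain-`git` route the suggestion alludes to could look like this - a self-contained sketch that first recreates a detached-HEAD state and then sews `HEAD` back onto `runs` (branch name taken from the post, everything else hypothetical):

```shell
# Sketch: reattach a detached HEAD to the 'runs' branch with plain git.
set -e
git init -q reattach-demo && cd reattach-demo
git -c user.name=d -c user.email=d@example.com \
    commit -q --allow-empty -m "base"
git branch runs

# Simulate the situation from the post: HEAD detached from 'runs',
# with a new commit made while detached.
git checkout -q --detach HEAD
git -c user.name=d -c user.email=d@example.com \
    commit -q --allow-empty -m "made while detached"

# Move 'runs' to the current commit and check it out again.
git branch -f runs HEAD
git checkout -q runs
git status -sb   # back on branch 'runs'
```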
> ```
> save (notneeded: 5)
> unlock (ok: 28)
> ```
> What happened? - Apparently, I've lost the `.venv` directory during the hard reset, which causes "ModuleNotFoundError: No module named 'process2p'". Määäh! This actually speaks for the use of a container. The problem with the container is that I have to rebuild it every time I update my code ... annoying! I guess that's the trade-off between efficiency and reproducibility. To illustrate this 'highly complex' dilemma with Deepseek's smart-ass comment in a graph:
Technically that's not true: you can use the container as your "environment" and run an outside script in it. You can even bind-mount that script over some version of it which you might already have inside the container. So you can kinda have the best of both worlds... might want to adjust the "problem statement" here ;)
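What the comment describes might look like this - a hedged sketch with a hypothetical image name, script name, and paths, not the post's actual setup:

```shell
# Sketch: treat the container purely as a frozen environment and
# run the *current* working copy of the code inside it.
docker run --rm \
    -v "$PWD/code:/opt/code:ro" \
    -v "$PWD/data:/data" \
    myproject-env:latest \
    python /opt/code/run_pipeline.py /data

# Same idea with Apptainer/Singularity; --bind can even shadow a
# script that was baked into the image at build time:
apptainer exec \
    --bind "$PWD/code/run_pipeline.py:/opt/code/run_pipeline.py" \
    myproject-env.sif \
    python /opt/code/run_pipeline.py
```

This way the environment stays pinned while the code iterates freely, so the image only needs rebuilding when dependencies change.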
Co-authored-by: Yaroslav Halchenko <[email protected]>