
Commit ce9d7b5

jorgee, christopher-hakkaart, bentsherman, pditommaso, marcodelapierre authored
Implementation of Git multiple revisions (#6620)
Co-authored-by: Chris Hakkaart <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Dr Marco Claudio De La Pierre <[email protected]>
1 parent bee8cff commit ce9d7b5

26 files changed, +3136 -418 lines changed
Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
# Git Multi-Revision Asset Management with Strategy Pattern

- Authors: Jorge Ejarque
- Status: Approved
- Deciders: Jorge Ejarque, Ben Sherman, Paolo Di Tommaso
- Date: 2025-12-05
- Tags: scm, asset-management, multi-revision

## Summary

Nextflow's asset management system has been refactored to support running multiple revisions of the same pipeline concurrently. It uses a bare repository with shared object storage, and relies on the Strategy design pattern to stay backward compatible with legacy direct-clone repositories.

## Problem Statement

The original asset management system (`AssetManager`) cloned each pipeline directly to `~/.nextflow/assets/<org>/<project>/.git`, which imposed several limitations:

1. **No concurrent multi-revision support**: Only one revision of a pipeline could be checked out at a time, preventing concurrent execution of different versions
2. **Update conflicts**: Pulling updates while a pipeline was running could cause conflicts or corruption
3. **Testing limitations**: Users couldn't easily test different versions of a pipeline side by side

The goal was to enable running multiple revisions of the same pipeline concurrently (e.g., production on v1.0, testing on v2.0-dev) while keeping disk usage efficient through object sharing.

## Goals or Decision Drivers

- **Concurrent multi-revision execution**: Must support running different revisions of the same pipeline simultaneously
- **Efficient disk usage**: Share Git objects between revisions to minimize storage overhead
- **Backward compatibility**: Must not break existing pipelines using the legacy direct-clone approach
- **API stability**: Maintain the existing `AssetManager` API for external consumers (K8s plugin, CLI commands, etc.)
- **Minimal migration impact**: Existing repositories should continue to work without user intervention
- **JGit compatibility**: The solution must work within JGit's capabilities, so that it does not depend on a native Git client installation
- **Atomic updates**: Downloading new revisions should not interfere with running pipelines

## Non-goals

- **Migration of existing legacy repositories**: Legacy repos continue to work as-is; there is no forced migration
- **Native Git worktree support**: Git's worktree feature is not used, because JGit does not support it
- **Revision garbage collection**: Old revisions are not cleaned up automatically (users can drop them manually)
- **Multi-hub support**: Each pipeline is still tied to a single repository provider

## Considered Options

### Option 1: Bare Repository with Git Worktrees

Use Git's worktree feature to create multiple working directories from a single bare repository.

**Implementation**:
- One bare repository at `~/.nextflow/assets/<org>/<project>/.git`
- Multiple worktrees at `~/.nextflow/assets/<org>/<project>/<revision>/`

- Good, because it's the native Git solution for multiple checkouts
- Good, because worktrees are space-efficient
- Good, because Git handles all the complexity
- **Bad, because JGit doesn't support worktrees** (deal-breaker)
- Bad, because it requires a native Git installation

**Decision**: Rejected due to JGit incompatibility

### Option 2: Bare Repository + Clones per Commit + Revision Map File

Use a bare repository for storage and create a clone for each commit, tracking them in a separate file.

**Implementation**:
- Bare repository at `~/.nextflow/assets/<org>/<project>/.nextflow/bare_repo/`
- Clones at `~/.nextflow/assets/<org>/<project>/.nextflow/commits/<commit-sha>/`
- Revision map file at `~/.nextflow/assets/<org>/<project>/.nextflow/revisions.json` mapping revision names to commit SHAs

- Good, because it works with JGit
- Good, because the bare repository reduces interactions with the remote repository when checking out commits
- Good, because revision tracking is explicit
- Bad, because Git objects are replicated in every clone, increasing disk usage
- Bad, because the revision map file can become stale
- Bad, because every revision lookup requires file I/O
- Bad, because of potential race conditions on map file updates
- Bad, because it adds the complexity of maintaining external state

**Decision**: Initially implemented but later refined

### Option 3: Bare Repository + Shared Clones with Strategy Pattern

Similar to Option 2, but eliminates the separate revision map file by using the bare repository itself as the source of truth. Additionally, uses the Strategy pattern to keep existing legacy repositories working without requiring migration.

**Implementation**:
- Bare repository at `~/.nextflow/assets/.repos/<org>/<project>/bare/`
- Shared clones at `~/.nextflow/assets/.repos/<org>/<project>/clones/<commit-sha>/`
- Bare repository refs used to resolve revisions to commit SHAs dynamically
- JGit alternates mechanism for object sharing
- `AssetManager` as a facade with an unchanged public API
- `RepositoryStrategy` interface defining repository operations
- `LegacyRepositoryStrategy` for the existing direct-clone behavior
- `MultiRevisionRepositoryStrategy` for the new bare-repo approach
- Strategy selection based on an environment variable or repository state detection

- Good, because there is no external state file to maintain
- Good, because the bare repository is always in sync (it is fetched on updates)
- Good, because it is simpler and more reliable
- Good, because updates are atomic (Git operations are atomic)
- Good, because it works entirely within JGit
- Good, because no migration is needed for existing repositories
- Good, because it maintains API compatibility
- Good, because it allows gradual adoption
- Good, because it isolates legacy code
- Good, because future strategies are easy to add
- Neutral, because it adds an abstraction layer
- Bad, because revisions must be resolved on every access (minimal overhead)
- Bad, because it initially increases codebase size

**Decision**: Selected

## Solution or decision outcome

Implemented **Option 3 (Bare Repository + Shared Clones with Strategy Pattern)** for multi-revision support with backward compatibility. Multi-revision is the default for new repositories, while legacy mode is available via the `NXF_SCM_LEGACY` environment variable.

## Rationale & discussion

### Git Multi-Revision Implementation

The bare repository approach provides efficient multi-revision support:

```
~/.nextflow/assets/.repos/nextflow-io/hello/
├── bare/                        # Bare repository (shared objects)
│   ├── objects/                 # All Git objects stored here
│   ├── refs/
│   │   ├── heads/
│   │   └── tags/
│   └── config
│
└── clones/                      # Revision-specific clones
    ├── abc123.../               # Clone for commit abc123
    │   └── .git/
    │       └── objects/         # (uses alternates → bare/objects)
    │           └── info/
    │               └── alternates   # Points to bare/objects
    │
    └── def456.../               # Clone for commit def456
        └── .git/

~/.nextflow/assets/nextflow-io/hello/
└── .git/                        # Legacy repo location (HYBRID state)
```

**Key mechanisms:**

1. **Bare repository as source of truth**: The bare repo is fetched from the remote on updates, keeping its refs current
2. **Dynamic resolution**: Revisions (branch or tag names) are resolved to commit SHAs using the bare repo's refs
3. **Object sharing**: Clones use Git alternates to reference the bare repo's objects, avoiding duplication
4. **Atomic operations**: Each clone is independent; downloading a new revision doesn't affect existing ones
5. **Lazy creation**: Clones are created on demand when a specific revision is requested

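The directory layout and the alternates link can be illustrated with a short Java sketch. `SharedCloneLayout` is a hypothetical helper invented for this example, not a Nextflow class: it assumes the `bare/` and `clones/<commitId>/` layout shown in the tree, derives the paths, and writes the alternates file. Creating the clone itself (which Nextflow does through JGit) is deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper mirroring the .repos/<org>/<project> layout shown above.
 * It does not clone anything; it only derives paths and writes the alternates
 * file that lets a clone borrow objects from the bare repository.
 */
final class SharedCloneLayout {
    private final Path projectRoot; // e.g. ~/.nextflow/assets/.repos/nextflow-io/hello

    SharedCloneLayout(Path projectRoot) {
        this.projectRoot = projectRoot;
    }

    Path bareRepo() {
        return projectRoot.resolve("bare");
    }

    /** In a bare repository the object database sits directly under <bare>/objects. */
    Path bareObjects() {
        return bareRepo().resolve("objects");
    }

    /** One clone directory per commit: <root>/clones/<commitId>. */
    Path clonePath(String commitId) {
        return projectRoot.resolve("clones").resolve(commitId);
    }

    /**
     * Git looks for extra object directories in <gitDir>/objects/info/alternates,
     * one path per line. Writing the bare repo's absolute objects path there makes
     * the clone reuse those objects instead of storing its own copies.
     */
    Path writeAlternates(String commitId) throws IOException {
        Path infoDir = clonePath(commitId).resolve(".git").resolve("objects").resolve("info");
        Files.createDirectories(infoDir);
        Path alternates = infoDir.resolve("alternates");
        String entry = bareObjects().toAbsolutePath() + "\n";
        Files.write(alternates, entry.getBytes(StandardCharsets.UTF_8));
        return alternates;
    }
}
```

This is the same file that `git clone --shared` sets up. Because a clone only records a path, pruning objects from the bare repository could break every clone that points at it.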
**Advantages over a revision map file:**
- No external state to maintain or keep in sync
- A fetch of the bare repo automatically updates all refs
- Resolution is simple: `bareRepo.resolve(revision)` returns the commit SHA
- No race conditions on file updates
- Simpler code with fewer failure modes

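Why no map file is needed can be seen in a small sketch that resolves revisions purely from the bare repository's refs. Here the refs are modelled as an in-memory map from full ref name to commit SHA, and `BareRefResolver` is a hypothetical class. The `bareRepo.resolve(revision)` call mentioned above presumably maps to JGit's `Repository.resolve`, which additionally understands abbreviated SHAs and other revision expressions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/**
 * Simplified, hypothetical stand-in for bareRepo.resolve(revision).
 * The bare repository's refs are modelled as a map from full ref name
 * (e.g. "refs/tags/v1.1") to commit SHA.
 */
final class BareRefResolver {
    private static final Pattern FULL_SHA = Pattern.compile("[0-9a-f]{40}");

    private final Map<String, String> refs;

    BareRefResolver(Map<String, String> refs) {
        this.refs = refs;
    }

    Optional<String> resolve(String revision) {
        // A full commit SHA is returned as-is (this sketch does not check that it exists)
        if (FULL_SHA.matcher(revision).matches()) {
            return Optional.of(revision);
        }
        // Short names follow the first lookup steps described in gitrevisions(7):
        // <name>, refs/<name>, refs/tags/<name>, refs/heads/<name>
        String[] candidates = {
            revision,
            "refs/" + revision,
            "refs/tags/" + revision,
            "refs/heads/" + revision
        };
        for (String candidate : candidates) {
            String sha = refs.get(candidate);
            if (sha != null) {
                return Optional.of(sha);
            }
        }
        // Unknown name (abbreviated SHAs are not handled in this sketch)
        return Optional.empty();
    }
}
```

Trying tags before branches follows Git's own disambiguation rule, so a tag and a branch with the same name resolve the same way they would on the command line.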
### Strategy Pattern for Backward Compatibility

The Strategy pattern provides clean separation and backward compatibility:

```
┌─────────────────────────┐
│      AssetManager       │  ← Public API (unchanged)
│        (Facade)         │
└───────────┬─────────────┘
            │
            │ delegates to
            ▼
┌─────────────────────────┐
│   RepositoryStrategy    │  ← Interface
└───────────┬─────────────┘
            │
            │ implements
    ┌───────┴────────┐
    │                │
┌───────────┐  ┌─────────────────┐
│  Legacy   │  │  MultiRevision  │  ← Concrete strategies
│ Strategy  │  │    Strategy     │
└───────────┘  └─────────────────┘
```

**Strategy selection logic:**

1. Check the `NXF_SCM_LEGACY` environment variable → use legacy if set
2. Check whether only the legacy asset of the repository exists (`isOnlyLegacy` method) → use legacy (preserve existing)
3. Otherwise → use multi-revision

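The three selection rules can be expressed as a small pure function. This is a sketch under stated assumptions: `StrategySelector` and `StrategyKind` are hypothetical names, the environment is passed in as a map so the function can be tested, any non-empty `NXF_SCM_LEGACY` value is treated as "set", and the directory checks in `isOnlyLegacy` are a guess at what the real method inspects.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical mirror of the two strategy types. */
enum StrategyKind { LEGACY, MULTI_REVISION }

/** Sketch of the selection rules; not Nextflow's actual code. */
final class StrategySelector {

    static StrategyKind select(Map<String, String> env, Path assetsRoot, String project) {
        // 1. NXF_SCM_LEGACY set (to any non-empty value, in this sketch) forces legacy mode
        String legacy = env.get("NXF_SCM_LEGACY");
        if (legacy != null && !legacy.isEmpty()) {
            return StrategyKind.LEGACY;
        }
        // 2. Only a legacy clone exists: keep using it untouched
        if (isOnlyLegacy(assetsRoot, project)) {
            return StrategyKind.LEGACY;
        }
        // 3. New project or hybrid state: multi-revision
        return StrategyKind.MULTI_REVISION;
    }

    /** Assumed check: legacy .git present and no .repos/<org>/<project> directory yet. */
    static boolean isOnlyLegacy(Path assetsRoot, String project) {
        Path legacyGit = assetsRoot.resolve(project).resolve(".git");
        Path multiRevisionRoot = assetsRoot.resolve(".repos").resolve(project);
        return Files.isDirectory(legacyGit) && !Files.exists(multiRevisionRoot);
    }
}
```

In production the first argument would be `System.getenv()`. The hybrid state (both directories present) falls through to multi-revision, consistent with the Hybrid State Handling section.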
**Backward compatibility guarantees:**

- Existing repositories continue to work without changes
- `AssetManager` API remains identical
- CLI commands work with both strategies transparently
- Tests pass with minimal modifications
- No forced migration; users opt in naturally when creating new repos

### Hybrid State Handling

The system handles hybrid states, in which legacy and multi-revision repositories coexist:

- **Detection**: In hybrid states, the multi-revision strategy is selected by default
- **Fallback logic**: The multi-revision strategy can fall back to the legacy repo for operations if needed
- **No conflicts**: The strategies are designed to coexist; their operations target different directories
- **Explicit control**: Users can force a specific strategy via `setStrategyType()` or the `NXF_SCM_LEGACY` environment variable

### Migration Path

Users migrate naturally as they pull new revisions:

1. **Existing users**: Can continue with legacy repos (a legacy-only state is detected automatically)
2. **New users**: Get multi-revision by default
3. **Opt-in migration**: Delete the project directory, or pull with `--migrate`, to switch to multi-revision
4. **Opt-out**: Set `NXF_SCM_LEGACY=true` to force legacy mode

### Implementation Details

**Key classes:**

- `RepositoryStrategy`: Interface defining repository operations
- `AbstractRepositoryStrategy`: Base class with shared helper methods
- `LegacyRepositoryStrategy`: Direct-clone implementation (original behavior)
- `MultiRevisionRepositoryStrategy`: Bare repo + shared clones implementation

**Critical methods:**

- `download()`: Implemented by both strategies (legacy pulls into its single clone; multi-revision creates a shared clone)
- `getLocalPath()`: Returns the working directory appropriate to the strategy
- `getGit()`: Returns the appropriate Git instance (legacy, bare, or per-commit clone)

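How the facade delegates to a strategy, and how each strategy picks its directories, can be sketched as follows. The interface is deliberately reduced to two methods, and all names (`RepoStrategy`, `LegacyStrategy`, `MultiRevisionStrategy`, `AssetManagerSketch`, `getGitDir`) are hypothetical; Nextflow's `RepositoryStrategy` has more operations and different signatures. When no revision has been resolved, the multi-revision local path is assumed to be the project root under `.repos`, matching the `local path` shown by `nextflow info` in the `docs/cli.md` change.

```java
import java.nio.file.Path;

/** Simplified, hypothetical strategy interface: only directory selection is shown. */
interface RepoStrategy {
    /** Working directory containing the pipeline files. */
    Path getLocalPath();

    /** Git directory that backs repository operations. */
    Path getGitDir();
}

final class LegacyStrategy implements RepoStrategy {
    private final Path projectDir; // <assets>/<org>/<project>

    LegacyStrategy(Path assetsRoot, String project) {
        this.projectDir = assetsRoot.resolve(project);
    }

    @Override
    public Path getLocalPath() { return projectDir; }

    @Override
    public Path getGitDir() { return projectDir.resolve(".git"); }
}

final class MultiRevisionStrategy implements RepoStrategy {
    private final Path repoRoot;   // <assets>/.repos/<org>/<project>
    private final String commitId; // resolved commit, or null when no revision was requested

    MultiRevisionStrategy(Path assetsRoot, String project, String commitId) {
        this.repoRoot = assetsRoot.resolve(".repos").resolve(project);
        this.commitId = commitId;
    }

    @Override
    public Path getLocalPath() {
        // Without a resolved commit there is no checkout, so report the project root
        return commitId == null ? repoRoot : repoRoot.resolve("clones").resolve(commitId);
    }

    @Override
    public Path getGitDir() {
        // A bare repository is its own Git directory; a clone keeps it under .git
        return commitId == null ? repoRoot.resolve("bare") : getLocalPath().resolve(".git");
    }
}

/** Facade: callers use the same methods whichever strategy was selected. */
final class AssetManagerSketch {
    private final RepoStrategy strategy;

    AssetManagerSketch(RepoStrategy strategy) { this.strategy = strategy; }

    Path getLocalPath() { return strategy.getLocalPath(); }

    Path getGitDir() { return strategy.getGitDir(); }
}
```

Because callers only hold an `AssetManagerSketch`, switching strategies changes which directories are used without touching any call site, which is the property the unchanged `AssetManager` API relies on.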
### Performance Characteristics

**Disk usage:**
- Legacy: ~100% per repository (full clone with all Git objects) + one worktree
- Multi-revision: ~100% for the bare repository + ~100 KB per revision (`.git` with alternates) + one worktree per revision

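The disk-usage estimates can be turned into a rough formula. Write $S_{\text{obj}}$ for the size of the Git object store, $S_{\text{wt}}$ for one checked-out worktree, $S_{\text{alt}} \approx 100\,\text{KB}$ for a clone's `.git` with alternates, and $N$ for the number of revisions kept locally. The multi-revision layout then needs roughly $D_{\text{multi}}$, compared with $D_{\text{copies}}$ for $N$ independent clones that each carry their own objects (as in Option 2):

$$
\begin{aligned}
D_{\text{multi}}(N) &\approx S_{\text{obj}} + N\,(S_{\text{alt}} + S_{\text{wt}}) \\
D_{\text{copies}}(N) &\approx N\,(S_{\text{obj}} + S_{\text{wt}})
\end{aligned}
$$

With purely illustrative values $S_{\text{obj}} = 50$ MB, $S_{\text{wt}} = 10$ MB and $N = 3$: $D_{\text{multi}} \approx 50 + 3\,(0.1 + 10) = 80.3$ MB, versus $D_{\text{copies}} \approx 3\,(50 + 10) = 180$ MB.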
**Operation speed:**
- First download: Similar (both clone from the remote)
- Additional revisions: Multi-revision is faster (new objects are fetched only once into the bare repo, and clones are cheap)
- Switching revisions: Multi-revision is instant (separate directories), while legacy requires a checkout

### Known Limitations

- No automatic migration of legacy repositories
- Bare repository overhead even for users who only need one revision
- JGit alternates are slightly more complex than worktrees
- Manual cleanup required for old revision clones

## Links
- [GitHub Issue #2870 - Multiple revisions of the same pipeline for concurrent execution](https://github.com/nextflow-io/nextflow/issues/2870)
- [PR #6620 - Implementation of multiple revisions without revisions map](https://github.com/nextflow-io/nextflow/pull/6620)
- Related PRs implementing the multi-revision approach (linked in #6620)

docs/cli.md

Lines changed: 11 additions & 4 deletions
@@ -79,6 +79,13 @@ $ nextflow run nextflow-io/hello -r v1.1
 $ nextflow run nextflow-io/hello -r dev-branch
 $ nextflow run nextflow-io/hello -r a3f5c8e
 ```
+:::{versionadded} 25.12.0-edge
+:::
+Nextflow downloads and stores each explicitly requested Git branch, tag, or commit ID in a separate directory path, enabling you to run multiple revisions of the same pipeline simultaneously. Downloaded revisions are stored in a subdirectory of the local project: `$NXF_ASSETS/.repos/<org>/<repo>/clones/<commitId>`.
+
+:::{tip}
+Use tags or commit IDs instead of branches for reproducible pipeline runs. Branch references change as development progresses over time.
+:::
 
 (cli-params)=
 
@@ -171,12 +178,12 @@ Use this to understand a project's structure, see available versions, or verify
 $ nextflow info hello
 project name: nextflow-io/hello
 repository : https://github.com/nextflow-io/hello
-local path : $HOME/.nextflow/assets/nextflow-io/hello
+local path : $HOME/.nextflow/assets/.repos/nextflow-io/hello
 main script : main.nf
 revisions :
-* master (default)
+> master (default)
 mybranch
-v1.1 [t]
+> v1.1 [t]
 v1.2 [t]
 ```
 
@@ -186,7 +193,7 @@ This shows:
 - Where it's cached locally
 - Which script runs by default
 - Available revisions (branches and tags marked with `[t]`)
-- Which revision is currently checked out (marked with `*`)
+- Which revisions are currently checked out (marked with `>`)
 
 ### Pulling or updating projects
 

docs/developer/diagrams/nextflow.scm.mmd

Lines changed: 42 additions & 3 deletions
@@ -8,19 +8,58 @@ classDiagram
 
 class AssetManager {
 project : String
-localPath : File
 mainScript : String
-repositoryProvider : RepositoryProvider
+provider : RepositoryProvider
+strategy : RepositoryStrategy
 hub : String
 providerConfigs : List~ProviderConfig~
 }
-AssetManager --* RepositoryProvider
+
+class RepositoryStrategyType {
+<<enumeration>>
+LEGACY
+MULTI_REVISION
+}
+
+AssetManager --> RepositoryStrategyType
+AssetManager "1" --o "1" RepositoryStrategy
+AssetManager "1" --o "1" RepositoryProvider
 AssetManager "1" --* "*" ProviderConfig
 
+class RepositoryStrategy {
+<<interface>>
+}
+class AbstractRepositoryStrategy {
+<<abstract>>
+project : String
+provider : RepositoryProvider
+root : File
+}
+class LegacyRepositoryStrategy {
+localPath : File
+}
+class MultiRevisionRepositoryStrategy {
+revision : String
+bareRepo : File
+commitPath : File
+revisionSubdir : File
+}
+
+RepositoryStrategy <|-- AbstractRepositoryStrategy
+AbstractRepositoryStrategy <|-- LegacyRepositoryStrategy
+AbstractRepositoryStrategy <|-- MultiRevisionRepositoryStrategy
+
+class RepositoryProvider {
+<<abstract>>
+}
+
+RepositoryStrategy --> RepositoryProvider
+
 RepositoryProvider <|-- AzureRepositoryProvider
 RepositoryProvider <|-- BitbucketRepositoryProvider
 RepositoryProvider <|-- BitbucketServerRepositoryProvider
 RepositoryProvider <|-- GiteaRepositoryProvider
 RepositoryProvider <|-- GithubRepositoryProvider
 RepositoryProvider <|-- GitlabRepositoryProvider
 RepositoryProvider <|-- LocalRepositoryProvider
+