|
| 1 | +# Git Multi-Revision Asset Management with Strategy Pattern |
| 2 | + |
| 3 | +- Authors: Jorge Ejarque |
| 4 | +- Status: Approved |
| 5 | +- Deciders: Jorge Ejarque, Ben Sherman, Paolo Di Tommaso |
| 6 | +- Date: 2025-12-05 |
| 7 | +- Tags: scm, asset-management, multi-revision |
| 8 | + |
| 9 | +## Summary |
| 10 | + |
| 11 | +Nextflow's asset management system has been refactored to support multiple revisions of the same pipeline concurrently through a bare repository approach with shared object storage, while maintaining backward compatibility with legacy direct-clone repositories using the Strategy design pattern. |
| 12 | + |
| 13 | +## Problem Statement |
| 14 | + |
| 15 | +The original asset management system (`AssetManager`) cloned each pipeline directly to `~/.nextflow/assets/<org>/<project>/.git`, creating several limitations: |
| 16 | + |
| 17 | +1. **No concurrent Git multi-revision support**: Only one revision of a pipeline could be checked out at a time, preventing concurrent execution of different versions |
| 18 | +2. **Update conflicts**: Pulling updates while a pipeline was running could cause conflicts or corruption |
| 19 | +3. **Testing limitations**: Users couldn't easily test different versions of a pipeline side-by-side |
| 20 | + |
| 21 | +The goal was to enable running multiple revisions of the same pipeline concurrently (e.g., production on v1.0, testing on v2.0-dev) while maintaining efficient disk usage through object sharing. |
| 22 | + |
| 23 | +## Goals or Decision Drivers |
| 24 | + |
| 25 | +- **Concurrent multi-revision execution**: Must support running different revisions of the same pipeline simultaneously |
| 26 | +- **Efficient disk usage**: Share Git objects between revisions to minimize storage overhead |
| 27 | +- **Backward compatibility**: Must not break existing pipelines using the legacy direct-clone approach |
| 28 | +- **API stability**: Maintain the existing `AssetManager` API for external consumers (K8s plugin, CLI commands, etc.) |
| 29 | +- **Minimal migration impact**: Existing repositories should continue to work without user intervention |
| 30 | +- **JGit compatibility**: Solution must work within JGit's capabilities to avoid relying on Git client installations |
| 31 | +- **Atomic updates**: Downloading new revisions should not interfere with running pipelines |
| 32 | + |
| 33 | +## Non-goals |
| 34 | + |
| 35 | +- **Migration of existing legacy repositories**: Legacy repos continue to work as-is; no forced migration |
| 36 | +- **Native Git worktree support**: Git's worktree feature is not used because JGit does not support it |
| 37 | +- **Revision garbage collection**: No automatic cleanup of old revisions (users can manually drop) |
| 38 | +- **Multi-hub support**: Still tied to a single repository provider per pipeline |
| 39 | + |
| 40 | +## Considered Options |
| 41 | + |
| 42 | +### Option 1: Bare Repository with Git Worktrees |
| 43 | + |
| 44 | +Use Git's worktree feature to create multiple working directories from a single bare repository. |
| 45 | + |
| 46 | +**Implementation**: |
| 47 | +- One bare repository at `~/.nextflow/assets/<org>/<project>/.git` |
| 48 | +- Multiple worktrees at `~/.nextflow/assets/<org>/<project>/<revision>/` |
| 49 | + |
| 50 | +- Good, because it's the native Git solution for multiple checkouts |
| 51 | +- Good, because worktrees are space-efficient |
| 52 | +- Good, because Git handles all the complexity |
| 53 | +- **Bad, because JGit doesn't support worktrees** (deal-breaker) |
| 54 | +- Bad, because requires native Git installation |
| 55 | + |
| 56 | +**Decision**: Rejected due to JGit incompatibility |
| 57 | + |
| 58 | +### Option 2: Bare Repository + Clones per Commit + Revision Map File |
| 59 | + |
| 60 | +Use a bare repository for storage and create clones for each commit, tracking them in a separate file. |
| 61 | + |
| 62 | +**Implementation**: |
| 63 | +- Bare repository at `~/.nextflow/assets/<org>/<project>/.nextflow/bare_repo/` |
| 64 | +- Clones at `~/.nextflow/assets/<org>/<project>/.nextflow/commits/<commit-sha>/` |
| 65 | +- Revision map file at `~/.nextflow/assets/<org>/<project>/.nextflow/revisions.json` mapping revision names to commit SHAs |
| 66 | + |
| 67 | +- Good, because it works with JGit |
| 68 | +- Good, because the bare repo reduces interactions with the remote repository when checking out commits |
| 69 | +- Good, because revision tracking is explicit |
| 70 | +- Bad, because disk space is wasted: Git objects are replicated in every clone |
| 71 | +- Bad, because revision map file can become stale |
| 72 | +- Bad, because requires file I/O for every revision lookup |
| 73 | +- Bad, because potential race conditions on map file updates |
| 74 | +- Bad, because adds complexity of maintaining external state |
| 75 | + |
| 76 | +**Decision**: Initially implemented but later refined |
| 77 | + |
| 78 | +### Option 3: Bare Repository + Shared Clones with Strategy Pattern |
| 79 | + |
| 80 | +Similar to Option 2 but eliminate the separate revision map file by using the bare repository itself as the source of truth. Additionally, use the Strategy pattern to maintain backward compatibility with existing legacy repositories without requiring migration. |
| 81 | + |
| 82 | +**Implementation**: |
| 83 | +- Bare repository at `~/.nextflow/assets/.repos/<org>/<project>/bare/` |
| 84 | +- Shared clones at `~/.nextflow/assets/.repos/<org>/<project>/clones/<commit-sha>/` |
| 85 | +- Use bare repository refs to resolve revisions to commit SHAs dynamically |
| 86 | +- Git alternates mechanism (supported by JGit) for object sharing |
| 87 | +- `AssetManager` as facade with unchanged public API |
| 88 | +- `RepositoryStrategy` interface defining repository operations |
| 89 | +- `LegacyRepositoryStrategy` for existing direct-clone behavior |
| 90 | +- `MultiRevisionRepositoryStrategy` for new bare-repo approach |
| 91 | +- Strategy selection based on environment variable or repository state detection |
| 92 | + |
| 93 | +- Good, because no external state file to maintain |
| 94 | +- Good, because bare repository is always in sync (fetched on updates) |
| 95 | +- Good, because simpler and more reliable |
| 96 | +- Good, because atomic updates (Git operations are atomic) |
| 97 | +- Good, because works entirely within JGit |
| 98 | +- Good, because zero migration needed for existing repositories |
| 99 | +- Good, because maintains API compatibility |
| 100 | +- Good, because allows gradual adoption |
| 101 | +- Good, because isolates legacy code |
| 102 | +- Good, because makes future strategies easy to add |
| 103 | +- Neutral, because adds abstraction layer |
| 104 | +- Bad, because requires resolution on every access (minimal overhead) |
| 105 | +- Bad, because increases codebase size initially |
| 106 | + |
| 107 | +**Decision**: Selected |
| 108 | + |
| 109 | +## Solution or decision outcome |
| 110 | + |
| 111 | +Implemented **Option 3 (Bare Repository + Shared Clones with Strategy Pattern)** for multi-revision support with backward compatibility. Multi-revision is the default for new repositories, while legacy mode remains available via the `NXF_SCM_LEGACY` environment variable. |
| 112 | + |
| 113 | +## Rationale & discussion |
| 114 | + |
| 115 | +### Git Multi-Revision Implementation |
| 116 | + |
| 117 | +The bare repository approach provides efficient multi-revision support: |
| 118 | + |
| 119 | +``` |
| 120 | +~/.nextflow/assets/.repos/nextflow-io/hello/ |
| 121 | +├── bare/                          # Bare repository (shared objects) |
| 122 | +│   ├── objects/                   # All Git objects stored here |
| 123 | +│   ├── refs/ |
| 124 | +│   │   ├── heads/ |
| 125 | +│   │   └── tags/ |
| 126 | +│   └── config |
| 127 | +│ |
| 128 | +└── clones/                        # Revision-specific clones |
| 129 | +    ├── abc123.../                 # Clone for commit abc123 |
| 130 | +    │   └── .git/ |
| 131 | +    │       └── objects/           # (uses alternates → bare/objects) |
| 132 | +    │           └── info/ |
| 133 | +    │               └── alternates # Points to bare/objects |
| 134 | +    │ |
| 135 | +    └── def456.../                 # Clone for commit def456 |
| 136 | +        └── .git/ |
| 137 | + |
| 138 | +~/.nextflow/assets/nextflow-io/hello/ |
| 139 | +└── .git/                          # Legacy repo location (HYBRID state) |
| 140 | +``` |
| 141 | + |
| 142 | +**Key mechanisms:** |
| 143 | + |
| 144 | +1. **Bare repository as source of truth**: The bare repo is fetched/updated from the remote, keeping refs current |
| 145 | +2. **Dynamic resolution**: Revisions (branch/tag names) are resolved to commit SHAs using the bare repo's refs |
| 146 | +3. **Object sharing**: Clones use Git alternates to reference the bare repo's objects, avoiding duplication |
| 147 | +4. **Atomic operations**: Each clone is independent; downloading a new revision doesn't affect existing ones |
| 148 | +5. **Lazy creation**: Clones are created on-demand when a specific revision is requested |
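
The object-sharing mechanism (item 3) can be illustrated with a minimal sketch that uses only the Java standard library. Git reads alternate object directories from `objects/info/alternates` inside a repository's Git directory, one path per line. The class and method names below are hypothetical and are not taken from the Nextflow code base; the actual implementation may set this up through JGit instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Nextflow): links a clone to the bare repo's objects. */
final class AlternatesSketch {

    /**
     * Writes <clone>/.git/objects/info/alternates so that Git looks up
     * missing objects in <bare>/objects instead of storing its own copies.
     */
    static Path linkToBare(Path cloneDir, Path bareDir) throws IOException {
        Path infoDir = cloneDir.resolve(".git").resolve("objects").resolve("info");
        Files.createDirectories(infoDir);
        Path bareObjects = bareDir.resolve("objects").toAbsolutePath().normalize();
        Path alternates = infoDir.resolve("alternates");
        // The alternates file holds one object-directory path per line.
        Files.writeString(alternates, bareObjects + "\n", StandardCharsets.UTF_8);
        return alternates;
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories stand in for ~/.nextflow/assets/.repos/<org>/<project>/
        Path root = Files.createTempDirectory("nxf-alternates");
        Path bare = Files.createDirectories(root.resolve("bare"));
        Path clone = Files.createDirectories(root.resolve("clones").resolve("abc123"));
        Path alternates = linkToBare(clone, bare);
        System.out.println(alternates + " -> " + Files.readString(alternates).trim());
    }
}
```

Because each clone only records a pointer to `bare/objects`, its own `.git` directory stays small, which is what keeps the per-revision overhead low.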
| 149 | + |
| 150 | +**Advantages over revision map file:** |
| 151 | +- No external state to maintain or keep in sync |
| 152 | +- Bare repo fetch automatically updates all refs |
| 153 | +- Resolution is simple: `bareRepo.resolve(revision)` returns commit SHA |
| 154 | +- No race conditions on file updates |
| 155 | +- Simpler code with fewer failure modes |
| 156 | + |
| 157 | +### Strategy Pattern for Backward Compatibility |
| 158 | + |
| 159 | +The Strategy pattern provides clean separation and backward compatibility: |
| 160 | + |
| 161 | +``` |
| 162 | +┌─────────────────────────┐ |
| 163 | +│      AssetManager       │  ← Public API (unchanged) |
| 164 | +│        (Facade)         │ |
| 165 | +└───────────┬─────────────┘ |
| 166 | +            │ |
| 167 | +            │ delegates to |
| 168 | +            ▼ |
| 169 | +┌─────────────────────────┐ |
| 170 | +│   RepositoryStrategy    │  ← Interface |
| 171 | +└───────────┬─────────────┘ |
| 172 | +            △ |
| 173 | +            │ implements |
| 174 | +    ┌───────┴────────┐ |
| 175 | +    │                │ |
| 176 | +┌───────────┐  ┌─────────────────┐ |
| 177 | +│  Legacy   │  │  MultiRevision  │  ← Concrete strategies |
| 178 | +│ Strategy  │  │    Strategy     │ |
| 179 | +└───────────┘  └─────────────────┘ |
| 180 | +``` |
| 181 | + |
| 182 | +**Strategy selection logic:** |
| 183 | + |
| 184 | +1. Check the `NXF_SCM_LEGACY` environment variable → use legacy if set |
| 185 | +2. Check whether only the legacy asset exists for the repository (`isOnlyLegacy` method) → use legacy (preserves the existing repo) |
| 186 | +3. Otherwise → use multi-revision |
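
The selection order above can be sketched as follows. This is an illustration under stated assumptions, not the actual `AssetManager` code: the class name, the `StrategyType` enum, the body of `isOnlyLegacy`, and the way the environment variable's value is interpreted are all hypothetical. Only the directory layout and the ordering of the checks come from this document.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the selection order; not the actual AssetManager code. */
final class StrategySelectionSketch {

    enum StrategyType { LEGACY, MULTI_REVISION }

    /** Assumed check: a legacy clone exists and no bare repository has been created yet. */
    static boolean isOnlyLegacy(Path assetsRoot, String project) {
        boolean legacy = Files.isDirectory(assetsRoot.resolve(project).resolve(".git"));
        boolean bare = Files.isDirectory(
                assetsRoot.resolve(".repos").resolve(project).resolve("bare"));
        return legacy && !bare;
    }

    static StrategyType select(boolean legacyEnvSet, Path assetsRoot, String project) {
        if (legacyEnvSet) {
            return StrategyType.LEGACY;          // 1. explicit opt-out
        }
        if (isOnlyLegacy(assetsRoot, project)) {
            return StrategyType.LEGACY;          // 2. preserve the existing legacy repo
        }
        return StrategyType.MULTI_REVISION;      // 3. default, including hybrid states
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for ~/.nextflow/assets
        Path assets = Files.createTempDirectory("nxf-assets");
        Files.createDirectories(assets.resolve("nextflow-io/hello/.git"));
        // Assumption: the variable counts as "set" when its value is "true".
        boolean legacyEnv = "true".equalsIgnoreCase(System.getenv("NXF_SCM_LEGACY"));
        System.out.println(select(legacyEnv, assets, "nextflow-io/hello"));
    }
}
```

Checking the environment variable first keeps the opt-out unconditional, while the filesystem check ensures that a project with only a legacy clone is never switched implicitly.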
| 187 | + |
| 188 | + |
| 189 | +**Backward compatibility guarantees:** |
| 190 | + |
| 191 | +- Existing repositories continue to work without changes |
| 192 | +- `AssetManager` API remains identical |
| 193 | +- CLI commands work with both strategies transparently |
| 194 | +- Tests pass with minimal modifications |
| 195 | +- No forced migration; users opt-in naturally when creating new repos |
| 196 | + |
| 197 | +### Hybrid State Handling |
| 198 | + |
| 199 | +The system gracefully handles hybrid states where both legacy and multi-revision repositories coexist: |
| 200 | + |
| 201 | +- **Detection**: In hybrid states, the multi-revision strategy is selected by default |
| 202 | +- **Fallback logic**: The multi-revision strategy can fall back to the legacy repo for operations if needed |
| 203 | +- **No conflicts**: Strategies are designed to coexist; operations target different directories |
| 204 | +- **Explicit control**: Users can force a specific strategy via `setStrategyType()` or the `NXF_SCM_LEGACY` environment variable |
| 205 | + |
| 206 | +### Migration Path |
| 207 | + |
| 208 | +Users naturally migrate as they pull new revisions: |
| 209 | + |
| 210 | +1. **Existing users**: Can continue with legacy repos (the legacy-only state is detected automatically) |
| 211 | +2. **New users**: Get multi-revision by default |
| 212 | +3. **Opt-in migration**: Delete the project directory, or pull with `--migrate`, to switch to multi-revision |
| 213 | +4. **Opt-out**: Set `NXF_SCM_LEGACY=true` to force legacy mode |
| 214 | + |
| 215 | +### Implementation Details |
| 216 | + |
| 217 | +**Key classes:** |
| 218 | + |
| 219 | +- `RepositoryStrategy`: Interface defining repository operations |
| 220 | +- `AbstractRepositoryStrategy`: Base class with shared helper methods |
| 221 | +- `LegacyRepositoryStrategy`: Direct clone implementation (original behavior) |
| 222 | +- `MultiRevisionRepositoryStrategy`: Bare repo + shared clones implementation |
| 223 | + |
| 224 | +**Critical methods:** |
| 225 | + |
| 226 | +- `download()`: Equivalent for both strategies (legacy pulls, multi-revision creates shared clone) |
| 227 | +- `getLocalPath()`: Returns appropriate working directory based on strategy |
| 228 | +- `getGit()`: Returns appropriate Git instance (legacy git, bare git, or commit git) |
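
How the facade delegates `getLocalPath()` to the active strategy can be sketched as below. The types carry a `Sketch` suffix because their shapes are hypothetical simplifications: the real `RepositoryStrategy` interface defines more operations, and its constructors and signatures are not shown in this document. The returned paths follow the directory layout described earlier.

```java
import java.nio.file.Path;

/** Hypothetical, simplified strategy interface with a single operation. */
interface RepositoryStrategySketch {
    Path getLocalPath();
}

/** Legacy layout: one working directory per project. */
final class LegacyStrategySketch implements RepositoryStrategySketch {
    private final Path assetsRoot;
    private final String project;

    LegacyStrategySketch(Path assetsRoot, String project) {
        this.assetsRoot = assetsRoot;
        this.project = project;
    }

    @Override
    public Path getLocalPath() {
        return assetsRoot.resolve(project);
    }
}

/** Multi-revision layout: one clone directory per resolved commit SHA. */
final class MultiRevisionStrategySketch implements RepositoryStrategySketch {
    private final Path assetsRoot;
    private final String project;
    private final String commitSha;

    MultiRevisionStrategySketch(Path assetsRoot, String project, String commitSha) {
        this.assetsRoot = assetsRoot;
        this.project = project;
        this.commitSha = commitSha;
    }

    @Override
    public Path getLocalPath() {
        return assetsRoot.resolve(".repos").resolve(project)
                .resolve("clones").resolve(commitSha);
    }
}

/** Facade: callers see one API regardless of the strategy behind it. */
final class AssetManagerSketch {
    private final RepositoryStrategySketch strategy;

    AssetManagerSketch(RepositoryStrategySketch strategy) {
        this.strategy = strategy;
    }

    Path getLocalPath() {
        return strategy.getLocalPath();
    }

    public static void main(String[] args) {
        Path assets = Path.of(System.getProperty("user.home"), ".nextflow", "assets");
        System.out.println(new AssetManagerSketch(
                new LegacyStrategySketch(assets, "nextflow-io/hello")).getLocalPath());
        System.out.println(new AssetManagerSketch(
                new MultiRevisionStrategySketch(assets, "nextflow-io/hello", "abc123")).getLocalPath());
    }
}
```

Since callers only hold the facade, switching a project from legacy to multi-revision changes the directory that is returned but not the calling code.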
| 229 | + |
| 230 | +### Performance Characteristics |
| 231 | + |
| 232 | +**Disk usage:** |
| 233 | +- Legacy: ~100% per repository (full clone with all git objects) + Worktree |
| 234 | +- Multi-revision: ~100% for bare + ~100K (.git with alternates) per revision + Worktree per revision |
| 235 | + |
| 236 | +**Operation speed:** |
| 237 | +- First download: Similar (both clone from remote) |
| 238 | +- Additional revisions: Multi-revision faster (only fetches new objects once, creates cheap clones) |
| 239 | +- Switching revisions: Multi-revision instant (different directories), legacy requires checkout |
| 240 | + |
| 241 | +### Known Limitations |
| 242 | + |
| 243 | +- No automatic migration of legacy repositories |
| 244 | +- Bare repository overhead even for users who only need one revision |
| 245 | +- Git alternates are slightly more complex to manage than worktrees |
| 246 | +- Manual cleanup required for old revision clones |
| 247 | + |
| 248 | +## Links |
| 249 | +- [GitHub Issue #2870 - Multiple revisions of the same pipeline for concurrent execution](https://github.com/nextflow-io/nextflow/issues/2870) |
| 250 | +- [PR #6620 - Implementation of multiple revisions without revisions map](https://github.com/nextflow-io/nextflow/pull/6620) |
| 251 | +- Related PRs implementing the multi-revision approach (linked in #6620) |
| 252 | + |