A comprehensive machine learning project template with modern tooling for dependency management, containerization, CI/CD, experiment tracking, and HPC job scheduling.
- Environment Management: `uv` for fast Python dependency management with `pyproject.toml`
- Containerization: Apptainer (Singularity) container support for reproducible environments
- CI/CD: GitHub Actions for automated code formatting checks (ruff) and unit testing (pytest)
- Code Quality: Pre-commit hooks with ruff for automatic code formatting and linting
- Experiment Tracking: Weights & Biases (W&B) integration for experiment management
- HPC Support: Slurm job script templates for cluster computing
- AI Assistant Config: Configuration files for Claude and other AI coding assistants
Project structure:

.
├── src/ # Main source code
├── tests/ # Unit tests
├── scripts/ # Utility scripts
├── slurm/ # Slurm job scripts
├── config/ # Configuration files (W&B, model configs)
├── data/ # Data directory (gitignored)
├── checkpoints/ # Model checkpoints (gitignored)
├── logs/ # Training logs (gitignored)
├── pyproject.toml # Project configuration and dependencies
├── uv.lock # Locked dependencies (generate with `uv lock`)
├── Apptainer.def # Container definition file
├── .pre-commit-config.yaml # Pre-commit hooks configuration
├── .github/workflows/ # GitHub Actions workflows
├── Makefile # Convenient command shortcuts
├── CLAUDE.md # Claude AI assistant configuration
└── AGENT.md # General AI assistant configuration
Prerequisites:

- Python 3.11 or higher
- uv package manager
- (Optional) Apptainer/Singularity for containerization
- (Optional) Access to Slurm cluster
- (Optional) Weights & Biases account
To set up the project:

- Clone and navigate to the project:
  git clone https://github.com/TRAIS-Lab/project-template.git
  cd project-template
- Install uv (if not already installed):
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Install dependencies:
  uv sync --all-extras --dev
  Note: `--all-extras` is needed because dev dependencies are defined as optional dependencies in `pyproject.toml`.
- Generate the lock file (if not present):
  uv lock
- Activate the virtual environment:
  source .venv/bin/activate   # On Unix/macOS
  .venv\Scripts\activate      # On Windows
- Install pre-commit hooks (optional but recommended):
  uv run pre-commit install
  Or use the Makefile:
  make setup   # Installs dependencies and pre-commit hooks
The project includes a Makefile with convenient shortcuts for common tasks:
# Setup (install dependencies and pre-commit hooks)
make setup
# Code formatting and linting
make format # Format code
make lint # Lint code
make format-check # Check formatting without modifying
make lint-fix # Fix auto-fixable linting issues
# Testing
make test # Run tests
make test-cov # Run tests with coverage report
# Pre-commit
make pre-commit-install # Install pre-commit hooks
make pre-commit-run # Run pre-commit on all files
# Container
make build-container # Build Apptainer container
make run-container CMD="python -m src.train" # Run command in container
# Cleanup
make clean # Remove cache and build artifacts
# See all available commands
make help

# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=src --cov-report=html
# Run specific test file
uv run pytest tests/test_example.py

# Format code
uv run ruff format .
# Check formatting
uv run ruff format --check .
# Lint code
uv run ruff check .
# Fix auto-fixable issues
uv run ruff check --fix .

To add a new dependency:
- Add the dependency to `pyproject.toml` under `[project.dependencies]` (for production) or `[project.optional-dependencies.dev]` (for development)
- Run `uv lock` to update the lock file
- Run `uv sync --all-extras --dev` to install the new dependency (or `uv sync` for production dependencies only)
Example:
# Add a new dependency
uv add numpy # Adds to [project.dependencies]
uv add --dev pytest-cov # Adds to [project.optional-dependencies.dev]

Build the Apptainer container:
# --fakeroot: Build without root privileges (requires fakeroot to be configured)
apptainer build --fakeroot image.sif Apptainer.def

Or use the Makefile:
make build-container

# Run a command with advanced options
# --nv: Enable NVIDIA GPU support (mounts GPU drivers and libraries)
# --cleanenv: Clean environment variables, only keep minimal set
# --bind "$PWD":/work: Bind mount current directory to /work in container
# --pwd /work: Set working directory to /work inside container
# bash -lc: Run bash as login shell to source profile scripts
# set -euo pipefail: Bash safety options (-e: exit on error, -u: error on unset vars, -o pipefail: fail pipeline on any error)
# Multiple bash commands can be separated by semicolons or newlines
apptainer exec --nv --cleanenv --bind "$PWD":/work --pwd /work image.sif bash -lc '
set -euo pipefail
uv run ruff format .
uv run pytest
'
# Interactive shell (opens an interactive shell inside the container)
# Interactive shell (opens an interactive shell inside the container)
apptainer shell image.sif

Or use the Makefile:
make run-container CMD="python -c 'print(1)'"
To set up experiment tracking with Weights & Biases:

- Log in to W&B:
  wandb login
- Configure W&B:
  - Edit `config/wandb_config.yaml` with your project name and entity
  - Set `WANDB_PROJECT` and `WANDB_ENTITY` environment variables
- Use in your code:
  import wandb

  wandb.init(
      project="your-project",
      entity="your-entity",
      config=hyperparameters,
  )
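A minimal sketch of combining the YAML config with the environment-variable overrides described above (the keys inside `config/wandb_config.yaml` are an assumption made for illustration; adjust them to the actual file):

```python
# Hypothetical helper: read W&B settings from config/wandb_config.yaml,
# letting WANDB_PROJECT / WANDB_ENTITY environment variables take precedence.
import os

import yaml  # assumes PyYAML is installed
import wandb

with open("config/wandb_config.yaml") as f:
    wandb_cfg = yaml.safe_load(f) or {}

wandb.init(
    project=os.environ.get("WANDB_PROJECT", wandb_cfg.get("project")),
    entity=os.environ.get("WANDB_ENTITY", wandb_cfg.get("entity")),
    config={"learning_rate": 1e-3, "batch_size": 32},  # your hyperparameters
)
```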
To run training jobs on a Slurm cluster:

- Edit the Slurm script (`slurm/train.sh`):
  - Adjust resource requirements (time, memory, GPUs)
  - Set your W&B project and entity
  - Configure your training command
- Submit the job:
  sbatch slurm/train.sh
- Monitor the job:
  squeue -u $USER
- View logs:
  tail -f logs/train_<job_id>.out
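To make it easier to match W&B runs to the `logs/train_<job_id>.out` files, one option (not something the template ships, just an illustration) is to read the standard `SLURM_JOB_ID` environment variable inside the training script:

```python
# Hypothetical snippet for src/train.py: tag the W&B run with the Slurm job ID
# so each run can be matched to its logs/train_<job_id>.out file.
import os

import wandb

job_id = os.environ.get("SLURM_JOB_ID", "local")  # "local" when not running under Slurm
wandb.init(project="your-project", name=f"train-{job_id}", tags=[f"slurm-{job_id}"])
```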
The repository includes two GitHub Actions workflows:
- Format Check (`.github/workflows/format-check.yml`):
  - Runs on pull requests and pushes to main/develop
  - Checks code formatting with ruff format
  - Runs ruff linting
- Unit Tests (`.github/workflows/test.yml`):
  - Runs on pull requests and pushes to main/develop
  - Tests against Python 3.11 and 3.12
  - Generates coverage reports
Pre-commit hooks automatically run before each commit to ensure code quality:
- Trailing whitespace removal
- End-of-file fixes
- YAML/JSON/TOML validation
- Ruff formatting and linting
To bypass pre-commit hooks (not recommended):
git commit --no-verify -m "message"

This template includes configuration files for AI coding assistants:
- CLAUDE.md: Detailed project context and guidelines for Claude
- AGENT.md: General configuration for AI assistants
These files help AI assistants understand the project structure, coding standards, and best practices.
Reproducibility is crucial for scientific research. Follow these practices to ensure your experiments can be replicated:
Always set random seeds for reproducibility:
import random
import numpy as np
import torch # if using PyTorch
# Set seeds
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

Best Practice: Store the seed value in your configuration files and log it with W&B.
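The full example at the end of this README calls a `set_all_seeds` helper; a minimal sketch of what such a function could look like (the body here is an assumption, adapt it to the frameworks you actually use):

```python
import random

import numpy as np
import torch  # if using PyTorch


def set_all_seeds(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch RNGs for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```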
- Lock dependencies: Always commit `uv.lock` to version control
  # After modifying pyproject.toml, update and commit lock file
  uv lock
  git add uv.lock pyproject.toml
  git commit -m "Update dependencies and lock file"
- Use containers: Build Apptainer containers for experiments to ensure consistent environments
  # Build container from definition file
  apptainer build --fakeroot image.sif Apptainer.def

  # Run experiments in container
  apptainer exec --nv --cleanenv --bind "$PWD":/work --pwd /work image.sif bash -lc '
  set -euo pipefail
  python -m src.train
  '

  # Or use Makefile
  make build-container
  make run-container CMD="python -m src.train"
- Document Python version: Specify the exact Python version in `Apptainer.def` and `pyproject.toml`
  # Check current Python version
  python --version

  # In pyproject.toml, specify:
  # requires-python = ">=3.11,<3.12"   # Pin to one minor version
  # requires-python = ">=3.11"         # Minimum version only

  # In Apptainer.def, use a specific base image:
  # From: python:3.11.5-slim   # Exact patch version
  # From: python:3.11-slim     # Minor version only
- Freeze system packages: Document any system-level dependencies or CUDA versions
  # Check CUDA version (if using GPUs)
  nvidia-smi

  # Check system packages
  pip list               # Python packages
  apt list --installed   # Debian/Ubuntu (if in container)

  # Document in Apptainer.def or create requirements-system.txt
  # Example in Apptainer.def:
  # %post
  #     apt-get update
  #     apt-get install -y cuda-toolkit-11-8   # Document specific version
- Log everything: Hyperparameters, model architecture, data splits, preprocessing steps
- Use W&B tags: Tag experiments by purpose (e.g., `baseline`, `ablation`, `final`)
- Version datasets: Document dataset versions, splits, and preprocessing
- Save configurations: Commit configuration files to git for each experiment
import wandb
wandb.init(
project="your-project",
config={
"learning_rate": 0.001,
"batch_size": 32,
"seed": 42,
"model": "transformer",
"dataset_version": "v1.0",
},
tags=["baseline", "final"]
)

- Document data sources: Include data acquisition date, source, and preprocessing steps
- Version control splits: Save train/val/test split indices or use deterministic splitting
- Data checksums: Compute and store checksums for datasets to verify integrity (see the sketch after this list)
- Data documentation: Create `data/README.md` describing datasets and their structure
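A minimal sketch of the checksum idea from the list above (the file layout, glob pattern, and helper name are illustrative, not part of the template):

```python
# Hypothetical helper: compute SHA-256 checksums for files under data/
# and write them to data/checksums.txt for later integrity checks.
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with open("data/checksums.txt", "w") as out:
        for file in sorted(Path("data").rglob("*.csv")):  # adjust the glob to your data
            out.write(f"{sha256_of(file)}  {file}\n")
```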
- Centralize configs: Store all hyperparameters in the `config/` directory
- Version configs: Commit configuration files to track changes
- Use YAML/JSON: Human-readable formats for easier review and modification
- Environment variables: Use `.env` files for sensitive information (never commit secrets), as sketched below
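One way these pieces can fit together (a sketch assuming PyYAML is available; the config file name, keys, and helper are illustrative):

```python
# Hypothetical loader: read hyperparameters from config/train.yaml and
# secrets (e.g. API keys) from environment variables rather than from git.
import os

import yaml  # assumes PyYAML is installed


def load_config(path: str = "config/train.yaml") -> dict:
    with open(path) as f:
        config = yaml.safe_load(f)
    # Secrets come from the environment (.env loaded by your shell or tooling),
    # never from committed files.
    config["wandb_api_key"] = os.environ.get("WANDB_API_KEY")
    return config
```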
- Save checkpoints regularly: Enable automatic checkpointing during training
- Metadata with checkpoints: Save configuration, seed, and git commit hash with each checkpoint
- Naming conventions: Use clear naming (e.g., `model_epoch_10_seed_42_commit_abc123.pt`); see the sketch after this list
- Document best models: Track which checkpoints correspond to reported results
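A minimal checkpointing sketch combining the naming and metadata points above (assumes PyTorch; the helper and its fields are illustrative):

```python
# Hypothetical checkpointing helper: bundle the model weights with the config,
# seed, and git commit so the run can be reproduced later.
import torch


def save_checkpoint(model, config: dict, commit_hash: str, epoch: int) -> None:
    checkpoint = {
        "model_state_dict": model.state_dict(),
        "config": config,
        "seed": config["seed"],
        "git_commit": commit_hash,
        "epoch": epoch,
    }
    name = f"model_epoch_{epoch}_seed_{config['seed']}_commit_{commit_hash}.pt"
    torch.save(checkpoint, f"checkpoints/{name}")
```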
- Git tags for experiments: Tag git commits for each experiment or paper submission
  git tag -a v1.0-baseline -m "Baseline experiment results"
  git tag -a paper-v1.0 -m "Code version for paper submission"
- Commit before experiments: Always commit code before running experiments
- Document branches: Use descriptive branch names for experimental variants
- Link results to commits: Reference git commit hashes in papers and reports
- Report statistics: Document mean and standard deviation across multiple runs
- Save predictions: Store model predictions for later analysis
- Logging consistency: Use consistent logging formats across experiments
- Results README: Create `results/README.md` linking experiments to configurations
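For the "report statistics" item above, a minimal sketch (the metric values are placeholders):

```python
# Hypothetical aggregation: report mean and standard deviation of a metric
# across several seeds/runs instead of a single number.
import numpy as np

run_accuracies = [0.912, 0.905, 0.918]  # e.g. test accuracy from three seeds
mean = float(np.mean(run_accuracies))
std = float(np.std(run_accuracies, ddof=1))  # sample standard deviation
print(f"accuracy: {mean:.3f} ± {std:.3f} over {len(run_accuracies)} runs")
```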
Before publishing or sharing results, verify:
- Random seeds are set and documented
- All dependencies are pinned in `uv.lock`
- W&B runs are logged and tagged appropriately
- Data preprocessing steps are documented
- Code is committed with descriptive messages
- Git tags mark important experiment versions
- Checkpoints include metadata (seed, config, commit hash)
- Container can reproduce the environment
- README documents how to run experiments
# src/train.py
import wandb
import git
from datetime import datetime
# Get git commit hash for reproducibility
repo = git.Repo(search_parent_directories=True)
commit_hash = repo.head.object.hexsha[:7]
# Initialize W&B with comprehensive logging
wandb.init(
project="your-project",
config={
**hyperparameters, # Your config dict
"git_commit": commit_hash,
"timestamp": datetime.now().isoformat(),
},
tags=["reproducible", "baseline"],
)
# Set seeds (from config)
set_all_seeds(config["seed"])
# Train model...

To contribute:
- Create a feature branch from `main`
- Make your changes
- Ensure code passes ruff checks: `uv run ruff format . && uv run ruff check .`
- Write/update tests and ensure they pass: `uv run pytest`
- Push and create a pull request
MIT License.