HuggingFace-like interface for petrographic thin section analysis with Detectron2 and SAHI
Automated instance segmentation and morphological analysis of petrographic thin sections using state-of-the-art computer vision models. Provides a clean, modern workflow for both researchers running inference with pretrained models and developers training custom models.
For running inference with pretrained models:
library(petrographer)
# Load model from public hub
model <- from_pretrained("inclusions")
# Run prediction on an image
results <- predict(model, "my_image.jpg")
# Analyze results
summarize_by_image(results)
get_population_stats(results)
For training custom models:
library(petrographer)
# Validate dataset structure
validate_dataset("data/processed/my_dataset")
# Train model (automatically saves to .petrographer/)
train_model(
data_dir = "data/processed/my_dataset",
output_name = "my_model",
num_classes = 5
)
# Load your trained model
model <- load_model("my_model")
results <- predict(model, "test_image.jpg")
- Installation
- Model Hub
- Training Models
- Running Predictions
- Core Functions
- Dataset Management
- HPC Training
- Documentation
- Configuration
- Troubleshooting
- Citation
# Install from GitHub
remotes::install_github("flmnh-ai/petrographer")
- R 4.1+
- Python 3.8+ with detectron2, sahi, torch, torchvision, opencv-python, scikit-image
- GPU recommended for training (CPU works fine for inference)
Python dependencies are managed automatically via reticulate. The package will guide you through setup on first use.
Models are managed via the pins package with automatic versioning and caching:
Hosted at:
- Models: https://flmnh-ai.s3.us-east-1.amazonaws.com/.petrographer/models/
- Datasets: https://flmnh-ai.s3.us-east-1.amazonaws.com/.petrographer/datasets/
# Download and load pretrained model
model <- from_pretrained("shell_v3", device = "cpu", confidence = 0.5)
# Browse available models
list_models()
# Get model details
model_info("shell_v3")
A local board is automatically created at .petrographer/ in your project when training models:
# List your locally trained models
list_trained_models()
# Load a local model (convenience wrapper)
model <- load_model("my_model")
# Or explicitly specify local board
model <- from_pretrained("my_model", board = "local")
Advanced users can specify their own boards:
my_board <- pins::board_folder("~/shared-models", versioned = TRUE)
model <- from_pretrained("model_id", board = my_board)
train_model(
data_dir = "data/processed/shell_dataset",
output_name = "shell_detector_v4",
num_classes = 5,
max_iter = 2000, # default for fine-tuning
freeze_at = 2, # freeze stem + res2 (default)
backbone = "resnet50", # resnet50, resnet101, resnext101
device = "cuda" # or "cpu", "mps"
)
Default parameters are optimized for fine-tuning:
- max_iter = 2000 - Training iterations
- ims_per_batch = NA - Auto-resolves to 2 images per GPU
- freeze_at = 2 - Freeze backbone stem + res2 layers
- learning_rate = 0.00025 - Base LR (auto-scaled by batch size and freeze_at)
- backbone = "resnet50" - Options: resnet50, resnet101, resnext101, or any Detectron2 model zoo key
The package automatically:
- Validates dataset structure
- Computes optimal batch sizes and learning rates
- Handles version conflicts
- Saves model to .petrographer/models/ with full metadata
- Creates training manifests with validation metrics
Organize data in COCO format:
data/processed/my_dataset/
├── train/
│   ├── _annotations.coco.json
│   └── [training images]
└── val/
    ├── _annotations.coco.json
    └── [validation images]
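As a quick sanity check of the layout above, the following base R sketch reports whether each split directory and its annotation file exist. The helper `check_coco_layout()` is hypothetical and not part of petrographer; it only inspects the file system and does not parse the COCO JSON, so it is not a substitute for the package's own validate_dataset().

```r
# Hypothetical helper (not a petrographer function): check that train/ and
# val/ exist and that each contains _annotations.coco.json.
check_coco_layout <- function(data_dir) {
  splits <- c("train", "val")
  data.frame(
    split = splits,
    dir_exists = dir.exists(file.path(data_dir, splits)),
    annotations_exist = file.exists(
      file.path(data_dir, splits, "_annotations.coco.json")
    )
  )
}

# Build a throwaway skeleton in a temp directory to demonstrate the check.
# The val/ annotation file is deliberately left out.
demo_dir <- file.path(tempdir(), "my_dataset")
dir.create(file.path(demo_dir, "train"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "val"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "train", "_annotations.coco.json"))

layout <- check_coco_layout(demo_dir)
print(layout)
```

In this demo the val row reports a missing annotation file, which is the kind of problem worth catching before starting a training run.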
Validate before training:
validate_dataset("data/processed/my_dataset")
For images with highly variable sizes, use SAHI slicing:
slice_dataset(
input_dir = "data/raw/my_dataset",
output_dir = "data/processed/my_dataset_sliced",
slice_size = 512,
overlap = 0.2
)
# Simple prediction (saves visualization by default)
results <- predict(model, "image.jpg")
# With custom SAHI parameters
results <- predict_image(
image_path = "image.jpg",
model = model,
use_slicing = TRUE,
slice_size = 512,
overlap = 0.2,
save_visualizations = TRUE
)
results <- predict_images(
input_dir = "images/",
model = model,
output_dir = "results/"
)
# Evaluate training metrics
evaluate_training("Detectron2_Models/my_model")
# Evaluate on COCO dataset
metrics <- evaluate_model_sahi(
model = model,
data_dir = "data/processed/test_dataset"
)
Each detected object includes comprehensive morphological properties:
- Basic metrics: Area, perimeter, centroid coordinates
- Shape descriptors: Eccentricity, orientation, circularity, aspect ratio
- Advanced features: Solidity, extent, major/minor axis lengths
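To make two of the shape descriptors concrete, here is a minimal base R sketch using the common textbook definitions: circularity as 4πA/P² and aspect ratio as major axis length over minor axis length. The data frame, its column names, and its values are hypothetical, and the package's own definitions (computed via scikit-image) may differ in detail.

```r
# Hypothetical measurements for three objects, in pixel units:
# a circle of radius 10, a 20 x 20 square, and a 20 x 10 rectangle.
objects <- data.frame(
  area = c(pi * 10^2, 400, 200),
  perimeter = c(2 * pi * 10, 80, 60),
  major_axis_length = c(20, 20, 20),
  minor_axis_length = c(20, 20, 10)
)

# Circularity is 1 for a perfect circle and falls toward 0 for
# elongated or irregular outlines.
objects$circularity <- 4 * pi * objects$area / objects$perimeter^2

# Aspect ratio is 1 for equant shapes and grows with elongation.
objects$aspect_ratio <- objects$major_axis_length / objects$minor_axis_length

print(round(objects[, c("circularity", "aspect_ratio")], 3))
```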
# Per-image summary statistics
image_stats <- summarize_by_image(results)
# Population-level statistics
pop_stats <- get_population_stats(results)
- from_pretrained() - Load model from hub, local board, or custom board
- load_model() - Convenience wrapper for locally trained models
- list_models() / list_trained_models() - List available models
- model_info() - Show model metadata and validation metrics
- pin_model() - Publish model to board (maintainers only)

- validate_dataset() - Check COCO format and show diagnostics
- slice_dataset() - SAHI dataset slicing for mixed image sizes
- pin_dataset() / list_datasets() - Dataset versioning and distribution

- train_model() - Unified training interface (local or HPC)
- evaluate_training() - Parse and visualize training metrics
- prepare_training_config() - Validate training parameters

- predict() - S3 method for PetrographyModel objects
- predict_image() - Single image inference with SAHI + morphology
- predict_images() - Batch processing with parallel support
- evaluate_model_sahi() - COCO evaluation metrics

- summarize_by_image() - Per-image statistics
- get_population_stats() - Population-level metrics
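The distinction between per-image and population-level statistics can be illustrated with base R alone. This is a sketch on a hypothetical detections table: the column names `image_name` and `area` are assumptions, and the actual outputs of summarize_by_image() and get_population_stats() may contain different columns.

```r
# Hypothetical detections: one row per detected object.
detections <- data.frame(
  image_name = c("a.jpg", "a.jpg", "a.jpg", "b.jpg", "b.jpg"),
  area = c(120, 80, 100, 300, 500)
)

# Per-image summary: group object areas by image, then summarize each group.
by_image <- split(detections$area, detections$image_name)
per_image <- data.frame(
  image_name = names(by_image),
  n_objects = vapply(by_image, length, integer(1)),
  mean_area = vapply(by_image, mean, numeric(1)),
  total_area = vapply(by_image, sum, numeric(1)),
  row.names = NULL
)

# Population-level summary: pool all objects regardless of source image.
population <- c(
  n_objects = nrow(detections),
  mean_area = mean(detections$area),
  sd_area = sd(detections$area)
)

print(per_image)
print(population)
```

Note that the pooled mean is weighted by object count, so it differs from the average of the per-image means whenever images contain different numbers of objects.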
For training on HPC clusters with SLURM (e.g., UF HiPerGator):
Configure HPC defaults in .Renviron:
usethis::edit_r_environ("project")
Add these lines:
PETROGRAPHER_HPC_HOST="hpg"
PETROGRAPHER_HPC_BASE_DIR="/blue/yourlab/youruser"
Restart R for changes to take effect.
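After restarting, you can confirm that R sees the variables. The sketch below uses only base R's Sys.getenv(); the helper `missing_hpc_vars()` is hypothetical and not part of petrographer.

```r
# Hypothetical helper: return the names of HPC variables that are unset
# or empty in the current R session.
missing_hpc_vars <- function(vars = c("PETROGRAPHER_HPC_HOST",
                                      "PETROGRAPHER_HPC_BASE_DIR")) {
  values <- Sys.getenv(vars, unset = "", names = TRUE)
  names(values)[values == ""]
}

not_set <- missing_hpc_vars()
if (length(not_set) == 0) {
  message("HPC defaults are set.")
} else {
  message("Not set: ", paste(not_set, collapse = ", "))
}
```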
# Triggers HPC mode automatically when hpc_user is provided
model_dir <- train_model(
data_dir = "data/processed/my_dataset",
output_name = "my_model",
num_classes = 5,
hpc_user = "youruser"
)
The package automatically:
- Uploads dataset and training script via rsync
- Submits SLURM job with optimal GPU resources
- Monitors job status with progress updates
- Downloads trained model when complete
- Cleans up remote files (data preserved by default)
# Monitor job status
hpg_status(job)
# Wait for completion with progress
hpg_wait(job)
# Cancel running job
hpg_cancel(job)
# Get job details
hpg_job_info(job)
- Website: https://flmnh-ai.github.io/petrographer/
- Vignettes:
  - Model Library - Browse and compare trained models
  - Training Models - Complete training guide
  - Whole Slide Basics - Working with large images
- Example Notebooks: See inst/notebooks/ for complete workflows:
  - model_from_pretrained.qmd - Loading and using pretrained models
  - petrography_analysis.qmd - End-to-end analysis workflow
  - training_*.qmd - Training examples for different use cases
Optimize for your data:
model <- from_pretrained(
"shell_v3",
confidence = 0.5, # Detection threshold (0.3-0.7 typical)
device = "cuda" # "cpu", "cuda", or "mps"
)
results <- predict_image(
image_path = "image.jpg",
model = model,
slice_size = 512, # Slice dimensions (512 recommended)
overlap = 0.2 # Overlap between slices (0.2 typical)
)
Optional configuration:
- PETROGRAPHER_HUB_URL - Custom model hub URL
- PETROGRAPHER_BOARD_PATH - Custom local board location
- PETROGRAPHER_HPC_HOST - Default HPC hostname
- PETROGRAPHER_HPC_BASE_DIR - Default HPC working directory
- CUDA out of memory: Reduce ims_per_batch (try 1-2) or use smaller images
- Slow training: Check GPU utilization, consider different backbone
- Poor convergence: Increase max_iter or adjust learning_rate
- Missing small objects: Lower confidence threshold, use smaller slice sizes
- False positives: Increase confidence threshold, check training data quality
- Poor segmentation: Verify annotation quality, increase training iterations
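The trade-off behind the first two tips can be seen by counting how many detections survive different confidence thresholds. This is a base R sketch on hypothetical scores; the `score` column name is an assumption about the results table, not a documented field.

```r
# Hypothetical detections with model confidence scores.
detections <- data.frame(
  id = 1:6,
  score = c(0.95, 0.72, 0.55, 0.41, 0.33, 0.12)
)

# Count detections kept at each threshold in the typical 0.3-0.7 range.
thresholds <- c(0.3, 0.5, 0.7)
kept <- vapply(
  thresholds,
  function(t) sum(detections$score >= t),
  integer(1)
)
names(kept) <- as.character(thresholds)
print(kept)
```

A lower threshold keeps more low-scoring detections, which helps recover small objects but admits more false positives. A higher threshold does the opposite.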
- Import errors: Check Python environment with reticulate::py_config()
- Environment issues: Restart R session, reinstall Python packages
- Path problems: Use absolute paths with fs::path_abs()
- Connection timeout: Check SSH config, verify Duo authentication
- Job failures: Check SLURM logs with hpg_job_info(job)
- Transfer errors: Verify paths and permissions on remote system
petrographer/
├── R/                              # Package functions
│   ├── pins.R                      # Model/dataset distribution via pins
│   ├── model.R                     # Model loading utilities
│   ├── training.R                  # Training orchestration (local + HPC)
│   ├── prediction.R                # Inference + evaluation
│   ├── dataset.R                   # Dataset utilities
│   ├── morphology.R                # Property extraction via scikit-image
│   └── summary.R                   # Analysis and aggregation
├── inst/
│   ├── python/
│   │   ├── train.py                # Detectron2 training script
│   │   └── slice_dataset.py        # SAHI dataset slicing utility
│   └── notebooks/                  # Example workflows
├── vignettes/                      # Package documentation
│   ├── model-library.qmd           # Browse trained models
│   ├── training-models.qmd         # Training guide
│   └── whole-slide-basics.qmd      # Large image workflows
├── tests/                          # Unit tests
└── .petrographer/                  # Local training board (auto-created)
    ├── models/                     # Trained models with versions
    └── datasets/                   # Pinned datasets
- Keep ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512 (default)
- Use SAHI slicing with slice_size = 512 and overlap = 0.2
- Consider TEST.DETECTIONS_PER_IMAGE = 1000 for very dense images
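To see what slice_size = 512 and overlap = 0.2 mean in pixels, the sketch below computes slice start positions along one image axis. The function `slice_starts()` is hypothetical and only approximates the idea of overlapping windows with a final window aligned to the image edge; SAHI's own slicing logic may differ in detail.

```r
# Hypothetical helper: start offsets (0-based pixels) of overlapping slices
# along one axis of an image.
slice_starts <- function(image_size, slice_size = 512, overlap = 0.2) {
  # An image no larger than one slice needs a single window.
  if (image_size <= slice_size) {
    return(0L)
  }
  # Each window advances by the slice size minus the overlapping pixels.
  stride <- slice_size - as.integer(slice_size * overlap)
  last_start <- image_size - slice_size
  starts <- seq.int(0, last_start, by = stride)
  # Add a final window flush with the image edge if the grid stops short.
  if (starts[length(starts)] != last_start) {
    starts <- c(starts, last_start)
  }
  as.integer(starts)
}

x_starts <- slice_starts(2048)
print(x_starts)
# Number of slices for a square 2048 x 2048 image under these assumptions.
print(length(x_starts)^2)
```

With these settings adjacent windows share 102 pixels, so an object near a slice boundary is usually seen whole in at least one slice.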
- Use ims_per_batch = 2 per GPU for good speed/accuracy balance
- ResNet-50 backbone is fastest, ResNeXt-101 for maximum accuracy
- Multi-GPU training automatically scales batch size and learning rate
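As a rough illustration of the multi-GPU scaling mentioned above, the sketch below applies the widely used linear scaling rule, where the learning rate grows in proportion to the total batch size. This is an assumption: the package's actual rule is not documented here and also accounts for freeze_at, which this sketch ignores. The function `scale_lr()` is hypothetical.

```r
# Hypothetical helper: linear scaling of batch size and learning rate.
# Assumes the documented defaults (2 images per GPU, base LR 0.00025)
# describe the single-GPU reference configuration.
scale_lr <- function(num_gpus, ims_per_gpu = 2,
                     base_lr = 0.00025, base_batch = 2) {
  total_batch <- num_gpus * ims_per_gpu
  list(
    ims_per_batch = total_batch,
    learning_rate = base_lr * total_batch / base_batch
  )
}

str(scale_lr(1))
str(scale_lr(4))
```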
This is research software under active development. Breaking changes may occur between versions. See CLAUDE.md for development guidelines and philosophy.
If you use this package in your research, please cite:
@software{petrographer,
title = {petrographer: Petrographic Thin Section Analysis with Deep Learning},
author = {Nicolas Gauthier and Ashley Rutkoski},
year = {2025},
url = {https://github.com/flmnh-ai/petrographer},
note = {R package version 0.0.0.9000}
}
- Detectron2 - Facebook AI Research's detection framework
- SAHI - Slicing aided hyper inference for small object detection
- reticulate - R-Python integration
- pins - Versioned data publishing and sharing
- hipergator - SLURM HPC integration for R
- Modern R utilities: cli, fs, glue
