A streamlined solution for running Large Language Models (LLMs) in batch mode on HPC systems powered by Slurm. AI-Flux uses the OpenAI-compatible API format with a JSONL-first architecture for all interactions.
```
  JSONL Input               Batch Processing               Results
(OpenAI Format)             (Ollama + Model)            (JSON Output)
       │                            │                         │
       │                            │                         │
       ▼                            ▼                         ▼
 ┌──────────┐               ┌──────────────┐             ┌──────────┐
 │  Batch   │               │              │             │  Output  │
 │ Requests │──────────────▶│   Model on   │────────────▶│ Results  │
 │ (JSONL)  │               │    GPU(s)    │             │  (JSON)  │
 └──────────┘               │              │             └──────────┘
                            └──────────────┘
```
AI-Flux processes JSONL files in a standardized OpenAI-compatible batch API format, enabling efficient processing of thousands of prompts on HPC systems with minimal overhead.
- Configuration Guide - How to configure AI-Flux
- Models Guide - Supported models and requirements
- Repository Structure - Codebase organization
- Create and Activate Conda Environment:

  ```bash
  conda create -n aiflux python=3.11 -y
  conda activate aiflux
  ```

- Install Package:

  ```bash
  pip install -e .
  ```

- Environment Setup:

  ```bash
  cp .env.example .env
  # Edit .env with your SLURM account and model details
  ```
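As an optional sanity check, the modules used in the workflow example below should import cleanly inside the new environment:

```python
# Optional smoke test after installation: these are the same modules
# used in the batch-submission example below.
from aiflux.core.config import Config
from aiflux.slurm import SlurmRunner

print("aiflux imports OK")
```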
The primary workflow for AI-Flux is submitting JSONL files for batch processing on SLURM:
```python
from aiflux.slurm import SlurmRunner
from aiflux.core.config import Config

# Setup SLURM configuration
config = Config()
slurm_config = config.get_slurm_config()
slurm_config.account = "myaccount"

# Initialize runner
runner = SlurmRunner(config=slurm_config)

# Submit JSONL file directly for processing
job_id = runner.run(
    input_path="prompts.jsonl",
    output_path="results.json",
    model="llama3.2:3b",
    batch_size=4
)

print(f"Job submitted with ID: {job_id}")
```

The JSONL input format follows the OpenAI Batch API specification:
{"custom_id":"request1","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"Explain quantum computing"}],"temperature":0.7,"max_tokens":500}}
{"custom_id":"request2","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is machine learning?"}],"temperature":0.7,"max_tokens":500}}For advanced options like custom batch sizes, processing settings, or SLURM configuration, see the Configuration Guide.
For advanced model configuration, see the Models Guide.
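A prompt file in this format can be generated with nothing beyond the Python standard library. A minimal sketch, assuming the request fields shown above (the output file name and the prompt list are placeholders):

```python
import json

# Prompts to submit; each becomes one request in the OpenAI batch format above.
prompts = [
    ("request1", "Explain quantum computing"),
    ("request2", "What is machine learning?"),
]

with open("prompts.jsonl", "w") as f:
    for custom_id, question in prompts:
        request = {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "llama3.2:3b",
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant"},
                    {"role": "user", "content": question},
                ],
                "temperature": 0.7,
                "max_tokens": 500,
            },
        }
        f.write(json.dumps(request) + "\n")
```

The resulting `prompts.jsonl` can then be passed as `input_path` to `SlurmRunner.run` or to `aiflux run --input`.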
AI-Flux includes a command-line interface for submitting batch processing jobs:
```bash
# Process JSONL file directly (core functionality)
aiflux run --model llama3.2:3b --input data/prompts.jsonl --output results/output.json
```

For detailed command options:

```bash
aiflux --help
```

Results are saved in the user's workspace:
```json
[
  {
    "input": {
      "custom_id": "request1",
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": {
        "model": "llama3.2:3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "Original prompt text"}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
      },
      "metadata": {
        "source_file": "example.txt"
      }
    },
    "output": {
      "id": "chat-cmpl-123",
      "object": "chat.completion",
      "created": 1699123456,
      "model": "llama3.2:3b",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Generated response text"
          },
          "finish_reason": "stop"
        }
      ]
    },
    "metadata": {
      "model": "llama3.2:3b",
      "timestamp": "2023-11-04T12:34:56.789Z",
      "processing_time": 1.23
    }
  }
]
```

AI-Flux provides utility converters to help prepare JSONL files from various input formats:
```bash
# Convert CSV to JSONL
aiflux convert csv --input data/papers.csv --output data/papers.jsonl --template "Summarize: {text}"

# Convert directory to JSONL
aiflux convert dir --input data/documents/ --output data/docs.jsonl --recursive
```

For code examples of converters, see the examples directory.
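Because the results file shown earlier is plain JSON, post-processing needs only the standard library. A minimal sketch that prints each assistant reply, assuming the output structure documented above (the results path is just an example):

```python
import json

# Load the batch results produced by a run (see the output format above).
with open("results/output.json") as f:
    results = json.load(f)

# Each entry pairs the original request with the model's completion.
for entry in results:
    custom_id = entry["input"]["custom_id"]
    reply = entry["output"]["choices"][0]["message"]["content"]
    print(f"{custom_id}: {reply}")
```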
AI-Flux ships with a benchmarking workflow that can source prompts, submit the SLURM job, and collect results/metrics for you.
```bash
aiflux benchmark --model llama3.2:3b --name nightly --num-prompts 60 \
  --account ACCOUNT_NAME --partition PARTITION_NAME --nodes 1
```

- Prompt sources: omit `--input` to automatically download and cache LiveBench categories (`benchmark_data/`). Provide `--input path/to/prompts.jsonl` to reuse an existing JSONL file instead. Use `--num-prompts`, `--temperature`, and `--max-tokens` to control synthetic dataset generation.
- Outputs: results default to `results/benchmarks/<name>_results.json` and a metrics summary (`<name>_metrics.txt`) containing elapsed SLURM runtime and number of prompts processed (see the sketch after this list).
- Batch tuning: adjust `--batch-size` for throughput. Pass model arguments such as `--temperature` and `--max-tokens` to forward them to the runner.
- SLURM overrides: forward scheduler settings with `--account`, `--partition`, `--nodes`, `--gpus-per-node`, `--time`, `--mem`, and `--cpus-per-task`.
- Job controls: add `--rebuild` to force an Apptainer image rebuild or `--debug` to keep the generated job script for inspection.
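For anything beyond the headline numbers in the metrics summary, the results file can be inspected directly. A minimal sketch, assuming benchmark results carry the same per-request `metadata.processing_time` field as the output format shown earlier (the path matches the `--name nightly` example):

```python
import json

# Aggregate per-request timings from a benchmark run.
with open("results/benchmarks/nightly_results.json") as f:
    results = json.load(f)

times = [entry["metadata"]["processing_time"] for entry in results]
print(f"prompts processed: {len(times)}")
print(f"mean processing time: {sum(times) / len(times):.2f}s")
```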
For the complete option reference:
```bash
aiflux benchmark --help
```

We welcome contributions! Please see CONTRIBUTING.md for guidelines.