|
1 | | -# FlashInfer-Bench |
| 1 | +<div align="center" id="top"> |
2 | 2 |
|
3 | | -**FlashInfer-Bench** is a lightweight, extensible benchmarking suite for evaluating low-level kernel implementations of model inference workloads. It is centered around the `Trace` artifact — a detailed record of a workload execution. It enables systematic comparison of kernel implementations with correctness and performance metrics. |
| 3 | +<img src="web/packages/ui/src/brand/fib_logo.png" alt="logo" width="400">
4 | 4 |
|
5 | | -## Installation |
| 5 | +[Documentation](https://bench.flashinfer.ai/docs/)
| 6 | +[License](https://github.com/flashinfer-ai/flashinfer-bench/blob/main/LICENCE)
| 7 | +[PyPI](https://pypi.org/project/flashinfer-bench/)
6 | 8 |
|
7 | | -Install FlashInfer-Bench with pip: |
| 9 | +**Building the Virtuous Cycle for AI-driven LLM Systems** |
8 | 10 |
|
9 | | -```bash |
10 | | -pip install flashinfer-bench |
11 | | -``` |
| 11 | +[Get Started](#get-started) | [Documentation](https://bench.flashinfer.ai/docs/) | [Blogpost](https://flashinfer.ai/2025/10/16/flashinfer-bench.html) |
| 12 | +</div> |
12 | 13 |
|
13 | | -Import FlashInfer-Bench: |
| 14 | +**FlashInfer-Bench** is a benchmark suite and production workflow designed to build a virtuous cycle of self-improving AI systems. |
14 | 15 |
|
15 | | -```python |
16 | | -import flashinfer_bench as fib |
17 | | -``` |
| 16 | +It is part of a broader initiative to build the *virtuous cycle of AI improving AI systems*, enabling AI agents and engineers to collaboratively optimize the very kernels that power large language models.
18 | 17 |
|
19 | | -## Dataset Layout |
| 18 | +## Installation |
20 | 19 |
|
21 | | -Each dataset is organized as follows: |
| 20 | +Install FlashInfer-Bench with pip: |
22 | 21 |
|
| 22 | +```bash |
| 23 | +pip install flashinfer-bench |
23 | 24 | ``` |
24 | | -dataset/ |
25 | | -├── definitions/ # One JSON file per workload definition |
26 | | -├── solutions/ # One JSON file per solution implementation |
27 | | -└── traces/ # Benchmark results |
28 | | -``` |
29 | | - |
30 | | -* Each **Definition** describes a computation task and reference logic. |
31 | | -* Each **Solution** specifies a kernel or agent implementation for a definition. |
32 | | -* Each **Trace** records a benchmark result: input config, performance, correctness, environment, etc. |
33 | 25 |
|
34 | | -You can load the full dataset using: |
| 26 | +Import FlashInfer-Bench: |
35 | 27 |
|
36 | 28 | ```python |
37 | | -from flashinfer_bench import TraceSet |
38 | | -trace_set = TraceSet.from_path("./dataset") |
39 | | -``` |
40 | | - |
41 | | -## Command Line Interface (CLI) |
42 | | - |
43 | | -FlashInfer-Bench provides a CLI for running benchmarks and analyzing results. |
44 | | - |
45 | | -### Usage |
46 | | - |
47 | | -#### Options |
48 | | -- `--local <PATH>`: Specifies one or more local paths to load traces from. Can be used multiple times. |
49 | | -- `--hub`: Load the latest traces from the FlashInfer Hub (not yet implemented). |
50 | | -- `--warmup-runs <N>`: Number of warmup runs for benchmarking (default: 10). |
51 | | -- `--iterations <N>`: Number of benchmark iterations (default: 50). |
52 | | -- `--device <DEVICE>`: Device to run benchmarks on (default: cuda:0). |
53 | | -- `--log-level <LEVEL>`: Logging level (default: INFO). |
54 | | -- `--save-results` / `--no-save-results`: Whether to save results after running (default: save). |
55 | | - |
56 | | -#### Example |
57 | | - |
58 | | -```bash |
59 | | -# Run benchmarks on a dataset |
60 | | -flashinfer-bench run --local ./dataset |
61 | | - |
62 | | -# Print a summary of traces |
63 | | -flashinfer-bench report summary --local ./dataset |
64 | | - |
65 | | -# Find the best solution for each definition |
66 | | -flashinfer-bench report best --local ./dataset |
| 29 | +import flashinfer_bench |
67 | 30 | ``` |
68 | 31 |
|
69 | | -## Benchmarking Kernels |
| 32 | +## Get Started |
70 | 33 |
|
71 | | -You can run local benchmarks using the `Benchmark` runner, which scans your dataset for all available definitions and solutions, executes them, and appends resulting traces to the `TraceSet`. |
| 34 | +This [guide](https://bench.flashinfer.ai/docs/start/quick_start) shows how to use the FlashInfer-Bench Python module with the FlashInfer-Trace dataset.
72 | 35 |
|
73 | | -It also supports single-solution execution via `.run_solution(...)`. |
| 36 | +## FlashInfer Trace Dataset |
74 | 37 |
|
75 | | -```python |
76 | | -from flashinfer_bench import Benchmark, BenchmarkConfig, TraceSet |
77 | | - |
78 | | -traces = TraceSet.from_path("./dataset") |
79 | | -config = BenchmarkConfig(warmup_runs=5, iterations=20) |
80 | | -benchmark = Benchmark(traces, config) |
| 38 | +We provide an official dataset, **FlashInfer-Trace**, containing kernels and workloads drawn from real-world AI system deployments. FlashInfer-Bench uses this dataset to measure and compare kernel performance. It follows the [FlashInfer Trace Schema](https://bench.flashinfer.ai/docs/flashinfer_trace/flashinfer_trace).
81 | 39 |
|
82 | | -benchmark.run_all() |
| 40 | +The official dataset is hosted on Hugging Face: https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace
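Once a local copy of the dataset is available, benchmarking follows the `TraceSet`/`Benchmark` pattern used elsewhere in this repository. A minimal sketch, assuming the `TraceSet.from_path`, `BenchmarkConfig`, and `Benchmark` APIs from earlier versions of this README are still current (verify against the docs), and a hypothetical local checkout path `./flashinfer-trace`:

```python
# Sketch: load a local dataset copy, benchmark all available solutions,
# and print a summary of the resulting traces.
# API names are taken from this repository's documented usage; the dataset
# path is a placeholder for wherever you checked out FlashInfer-Trace.
from flashinfer_bench import Benchmark, BenchmarkConfig, TraceSet

traces = TraceSet.from_path("./flashinfer-trace")
config = BenchmarkConfig(warmup_runs=5, iterations=20)

benchmark = Benchmark(traces, config)
benchmark.run_all()  # appends new traces to the TraceSet

print(traces.summary())  # correctness and performance per solution
```

Each resulting trace records the input configuration, correctness, performance, and environment, so runs on different hardware remain comparable.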
83 | 41 |
|
84 | | -# Accessing results |
85 | | -print(traces.summary()) |
86 | | -``` |
| 42 | +## Collaborators |
87 | 43 |
|
88 | | -## Schema |
| 44 | +Our collaborators include: |
89 | 45 |
|
90 | | -Each of the core entities is modeled as a dataclass: |
| 46 | +<div align="center"> |
91 | 47 |
|
92 | | -* **Definition**: Workload specification with axes, inputs, outputs, and a reference implementation. |
93 | | -* **Solution**: A concrete implementation with source files and a launch entry point. |
94 | | -* **Trace**: A benchmark result of a solution on a specific workload input. |
| 48 | +[<img src="https://raw.githubusercontent.com/mlc-ai/XGrammar-web-assets/refs/heads/main/repo/nvidia.svg" height=50/>](https://github.com/NVIDIA/TensorRT-LLM) |
| 49 | +  |
| 50 | +[<img src="https://raw.githubusercontent.com/mlc-ai/XGrammar-web-assets/refs/heads/main/repo/gpu_mode.png" height=50/>](https://github.com/gpu-mode) |
| 51 | +  |
| 52 | +[<img src="https://raw.githubusercontent.com/mlc-ai/XGrammar-web-assets/refs/heads/main/repo/sglang.png" height=50/>](https://github.com/sgl-project/sglang) |
| 53 | +  |
| 54 | +[<img src="https://raw.githubusercontent.com/mlc-ai/XGrammar-web-assets/refs/heads/main/repo/vllm.png" height=50/>](https://github.com/vllm-project/vllm) |
95 | 55 |
|
96 | | -See [`schema/`](./schema/) for full documentation. |
| 56 | +</div> |