105 changes: 56 additions & 49 deletions README.md
@@ -6,16 +6,16 @@ MLPerf® Storage is a benchmark suite to characterize the performance of storage
- [Installation](#installation)
- [Configuration](#configuration)
- [Workloads](#workloads)
- [U-Net3D](#u-net3d)
- [ResNet-50](#resnet-50)
- [CosmoFlow](#cosmoflow)
- [Parameters](#parameters)
- [CLOSED](#closed)
- [OPEN](#open)
- [Submission Rules](#submission-rules)
## Overview
For an overview of how this benchmark suite is used by submitters to compare the performance of storage systems supporting an AI cluster, see the MLPerf® Storage Benchmark submission rules here: [doc](https://github.com/mlcommons/storage/blob/main/Submission_guidelines.md).

## Prerequisite

@@ -26,38 +26,32 @@ The following prerequisites must be satisfied:
1. Pick one host to act as the launcher client host. Passwordless SSH must be set up from the launcher client host to all other participating client hosts; `ssh-copy-id` is a useful tool.
2. The code and data locations (discussed in later sections) must be exactly the same on every client host, including the launcher host. This is because the same benchmark command is automatically triggered on every participating client host during the distributed training process.

## Installation
**The following installation steps must be run on every client host that will participate in running the benchmarks.**

### Dependencies
DLIO requires an MPI package.
For example, when running on Ubuntu 24.04, install the OpenMPI tools and libraries.

```bash
sudo apt install pipx libopenmpi-dev openmpi-common
```

Install PDM and make it available on your shell's PATH.


```bash
pipx install pdm
pipx ensurepath
```

### Prepare test environment

Clone the latest release from the [MLCommons Storage](https://github.com/mlcommons/storage) repository and install the Python dependencies.

```bash
git clone -b v2.0 https://github.com/mlcommons/storage.git
cd storage
pdm install --frozen-lockfile
```

The working directory structure is as follows:
|---(folder contains configs for all checkpoint and training workloads)
|---vectordbbench (These configurations are PREVIEW only and not available for submission)
|---(folder contains configs for all vectordb workloads)
|---.venv (default location of the virtual environment managed by pdm)
```


You can invoke the `mlpstorage` script either through pdm:
```bash
$ pdm run mlpstorage -h
```

or directly once the Python virtual environment is activated:
```bash
source .venv/bin/activate
(mlperf-storage-3.12) [...]$ mlpstorage -h
```

Benchmark simulations are performed through [dlio_benchmark](https://github.com/argonne-lcf/dlio_benchmark), a benchmark suite for emulating I/O patterns of deep learning workloads. [dlio_benchmark](https://github.com/argonne-lcf/dlio_benchmark) is currently pinned as a prerequisite to a specific git branch; a future release will update the installer to pull DLIO from PyPI. The DLIO configuration of each workload is specified through a YAML file; the configs for all MLPerf Storage workloads are in the `configs` folder.

## Operation
The benchmark uses nested commands to select the workload category, the workload, and the workload parameters.
@@ -378,10 +385,10 @@ View Only:

Example:

To run the benchmark on the `unet3d` workload, with data located in the `unet3d_data` directory, using 2 H100 accelerators spread across 2 client hosts (with IPs 10.117.61.121 and 10.117.61.165), and with results written to the `unet3d_results` directory:

```bash
mlpstorage training run --hosts 10.117.61.121,10.117.61.165 --num-client-hosts 2 --client-host-memory-in-gb 64 --num-accelerators 2 --accelerator-type h100 --model unet3d --data-dir unet3d_data --results-dir unet3d_results --param dataset.num_files_train=400
```
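The `--param key=value` flags override individual fields of the nested DLIO YAML configuration using dotted keys such as `dataset.num_files_train`. The following is a hedged sketch of that idea only; the helper name `apply_param` is hypothetical, and the real parsing lives inside mlpstorage/DLIO.

```python
def apply_param(cfg: dict, dotted_key: str, value) -> dict:
    """Apply a dotted-key override (e.g. dataset.num_files_train=400)
    onto a nested config dict. Illustrative only."""
    keys = dotted_key.split(".")
    node = cfg
    for k in keys[:-1]:
        node = node.setdefault(k, {})  # descend, creating levels as needed
    node[keys[-1]] = value
    return cfg

# Example: override the training file count in a nested config
cfg = {"dataset": {"num_files_train": 168}}
apply_param(cfg, "dataset.num_files_train", 400)
# cfg["dataset"]["num_files_train"] is now 400
```

Multiple `--param` flags compose naturally under this model, since each override touches only one leaf of the nested config.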

4. The benchmark submission report is generated by aggregating the individual run results. The reporting command generates a report for a given results directory.
@@ -449,11 +456,11 @@ View Only:
--what-if View the configuration that would execute and the associated command.
```
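Conceptually, report generation walks the results directory and combines per-run results into one summary. The sketch below is an assumption-laden illustration of that pattern: the file name `summary.json` and its fields are invented for this example and are not the actual mlpstorage output layout.

```python
import json
from pathlib import Path

def aggregate_results(results_dir: str) -> dict:
    """Collect per-run JSON summaries under a results directory and
    average a throughput metric. File name and fields are hypothetical."""
    runs = [json.loads(p.read_text())
            for p in Path(results_dir).rglob("summary.json")]
    if not runs:
        return {"num_runs": 0}
    mean_tp = sum(r["train_throughput_samples_per_second"]
                  for r in runs) / len(runs)
    return {"num_runs": len(runs),
            "mean_train_throughput_samples_per_second": mean_tp}
```

Because the aggregation recurses over the whole tree, one results directory can hold many runs (e.g. repeated runs of the same workload) and still produce a single report.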

Note: The `reportgen` script must be run on the launcher client host.

## Training Models
Currently, the storage benchmark suite supports benchmarking of three deep learning workloads:
- Image segmentation using the U-Net3D model
- Image classification using the ResNet-50 model
- Cosmology parameter prediction using the CosmoFlow model

@@ -470,16 +477,16 @@ Generate data for the benchmark run based on the minimum files
```bash
mlpstorage training datagen --hosts 127.0.0.1 --num-processes 8 --model unet3d --data-dir unet3d_data --results-dir unet3d_results --param dataset.num_files_train=42000
```

Run the benchmark.

```bash
mlpstorage training run --hosts 127.0.0.1 --num-client-hosts 1 --client-host-memory-in-gb 64 --num-accelerators 4 --accelerator-type h100 --model unet3d --data-dir unet3d_data --results-dir unet3d_results --param dataset.num_files_train=42000
```

All results will be stored in the directory configured using the `--results-dir` (or `-r`) argument. To generate the final report, run the following on the launcher client host.

```bash
mlpstorage reports reportgen --results-dir unet3d_results
```

@@ -496,16 +503,16 @@ Generate data for the benchmark run
```bash
mlpstorage training datagen --hosts 127.0.0.1 --num-processes 8 --model resnet50 --data-dir resnet50_data --results-dir resnet50_results --param dataset.num_files_train=2557
```

Run the benchmark.

```bash
mlpstorage training run --hosts 127.0.0.1 --num-client-hosts 1 --client-host-memory-in-gb 64 --num-accelerators 16 --accelerator-type h100 --model resnet50 --data-dir resnet50_data --results-dir resnet50_results --param dataset.num_files_train=2557
```

All results will be stored in the directory configured using the `--results-dir` (or `-r`) argument. To generate the final report, run the following on the launcher client host.

```bash
mlpstorage reports reportgen --results-dir resnet50_results
```

@@ -514,49 +521,49 @@ mlpstorage reports reportgen --results-dir resnet50_results
Calculate minimum dataset size required for the benchmark run based on your client configuration

```bash
mlpstorage training datasize --model cosmoflow --client-host-memory-in-gb 64 --num-client-hosts 1 --max-accelerators 16 --accelerator-type h100
```
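The `datasize` step exists because the training dataset must comfortably exceed the aggregate memory of the client hosts, so that reads hit the storage system rather than the page cache. A hedged sketch of that arithmetic is below; the memory multiplier and per-file size are illustrative assumptions, so always use the `datasize` command for the authoritative number.

```python
import math

def min_files_train(client_host_memory_gb: int, num_client_hosts: int,
                    file_size_mb: float,
                    memory_multiplier: float = 5.0) -> int:
    """Estimate the minimum number of training files so the dataset is
    memory_multiplier times the aggregate client host memory.
    Illustrative only; not the actual mlpstorage formula."""
    total_memory_mb = client_host_memory_gb * 1024 * num_client_hosts
    required_dataset_mb = total_memory_mb * memory_multiplier
    return math.ceil(required_dataset_mb / file_size_mb)

# Example: one 64 GB client host, hypothetical ~146 MB files
print(min_files_train(64, 1, 146))
```

Doubling the client host count (or memory) roughly doubles the required file count, which is why `datasize` takes the client configuration as input.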

Generate data for the benchmark run

```bash
mlpstorage training datagen --hosts 127.0.0.1 --num-processes 8 --model cosmoflow --data-dir cosmoflow_data --results-dir=cosmoflow_results --param dataset.num_files_train=121477
```

Run the benchmark.

```bash
mlpstorage training run --hosts 127.0.0.1 --num-client-hosts 1 --client-host-memory-in-gb 64 --num-accelerators 16 --accelerator-type h100 --model cosmoflow --data-dir cosmoflow_data --results-dir cosmoflow_results --param dataset.num_files_train=121477
```

All results will be stored in the directory configured using the `--results-dir` (or `-r`) argument. To generate the final report, run the following on the launcher client host.

```bash
mlpstorage reports reportgen --results-dir cosmoflow_results
```

## Parameters

### CLOSED
The table below lists the configurable parameters for the benchmark in the CLOSED category.

| Parameter | Description |Default|
| ------------------------------ | ------------------------------------------------------------ |-------|
| **Dataset params** | | |
| dataset.num_files_train | Number of files for the training set | --|
| dataset.num_subfolders_train | Number of subfolders in which the training set is stored |0|
| dataset.data_folder | The path where the dataset is stored | --|
| **Reader params** | | |
| reader.read_threads | Number of threads to load the data | --|
| reader.computation_threads | Number of threads to preprocess the data (for TensorFlow) |1|
| reader.prefetch_size | Number of batches to prefetch |2|
| reader.transfer_size | Number of bytes in the read buffer (only for TensorFlow) | |
| reader.odirect | Whether to use direct I/O for the reader (currently applicable to U-Net3D) | False |
| **Checkpoint params** | | |
| checkpoint.checkpoint_folder | The folder to save the checkpoints | --|
| **Storage params** | | |
| storage.storage_root | The storage root directory | ./|
| storage.storage_type | The storage type |local_fs|


### OPEN
@@ -566,10 +573,10 @@ In addition to what can be changed in the CLOSED category, the following parameters
| ------------------------------ | ------------------------------------------------------------ |-------|
| framework | The machine learning framework |PyTorch for 3D U-Net |
| **Dataset params** | | |
| dataset.format | Format of the dataset | .npz for 3D U-Net |
| dataset.num_samples_per_file | Number of samples per file (only for TensorFlow using tfrecord datasets) | 1 for 3D U-Net |
| **Reader params** | | |
| reader.data_loader | Data loader type (TensorFlow, PyTorch, or custom) | PyTorch for 3D U-Net |


## Submission Rules
## Submission Rules
4 changes: 2 additions & 2 deletions mlpstorage/__init__.py
@@ -1,3 +1,3 @@
# VERSION
VERSION = "2.1.0"
__version__ = VERSION