NeuSight is a framework designed to predict the performance of deep learning training and inference on various GPUs. For more details, please refer to our paper, Forecasting GPU Performance for Deep Learning Training and Inference.
To install NeuSight as a Python package, run:
git clone https://github.com/sitar-lab/NeuSight.git
cd NeuSight
pip install -e .- NeuSight was tested on Python 3.9 and PyTorch version 2.1.0.
We provide two Python scripts for predicting the latency of deep learning execution and training the predictor:
scripts/pred.pyfor making predictionsscripts/train.pyfor training the predictor
The scripts use GPU and deep learning description files in JSON format, and execution hyperparameters such as batch size. See the scripts in scripts/example for examples of how to use these scripts.
NeuSight requires two input files to run: a device configuration file and a deep learning model configuration file.
The device configuration file specifies the architectural parameters of the prediction target GPU. Example configuration files can be found in data/device_configs. The configuration file includes
Device: User-specified name of the deviceDev_Mem: Global memory in GBMem_Bw: Memory bandwidth of global memory in GB/sNum_Sm: Number of SMsCore_Per_SM: Number of CUDA cores per SMFreq: Compute frequency in GHzSingleFLOPs: Peak FP32 performance in GFLOPS/sL2Cache: Size of L2 Cache in MB
The deep learning model configuration file specifies the architectural parameters of the target deep learning model. Example configuration files can be found in data/DLmodel_configs. Configuration file are specified in Hugging Face model description format.
/ : NEUSIGHT_ROOT
|-- neusight : Source file directory for NeuSight
| |-- Dataset : For collecting and processing dataset for training
| |-- Model : For machine learning based predictor
| |-- Opgraph : For manipulating operator graphs
| |-- Prediction : For NeuSight predictor
| |-- Tracing : For tracing ML model graphs
|-- scripts : Main scripts for running NeuSight
| |-- asplos : Training dataset and scripts used for ASPLOS 2025 paper
| | |-- data : Input files used for NeuSight
| | | |-- dataset : Datasets used for training and tile table (NVIDIA)
| | | |-- dataset_amd : Datasets used for training and tile table (AMD)
| | | |-- device_configs : GPU description files
| | | |-- DLmodel_configs : DL model description files
| | | |-- predictor : Model configuration and trained parameters for ML predictor
| | |-- label : Measured latencies for ML models evaluated
| | |-- results : Results of NeuSight prediction
| | |-- summary : Summary of NeuSight prediction
| |-- example : Example scripts for running NeuSightIf you use NeuSight in your research, please cite our paper:
@inproceedings{neusight,
author = {Lee, Seonho and Phanishayee, Amar and Mahajan, Divya},
title = {Forecasting GPU Performance for Deep Learning Training and Inference},
year = {2025},
isbn = {9798400706981},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3669940.3707265},
doi = {10.1145/3669940.3707265},
booktitle = {Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1},
pages = {493–508},
numpages = {16},
keywords = {deep learning, gpu performance forecasting, ml for systems, training and inference},
location = {Rotterdam, Netherlands},
series = {ASPLOS '25}
}