TensorRTx aims to implement popular deep learning networks with the TensorRT network definition API.
Why build a network from scratch with the comparatively complex network definition API instead of using a parser (ONNX parser, UFF parser, Caffe parser, etc.)? I see the following advantages.
- Flexible: it is easy to modify the network, add or delete a layer or an input/output tensor, replace a layer, merge layers, integrate preprocessing and postprocessing into the network, etc.
- Debuggable: the entire network is constructed incrementally, so intermediate layer results are easy to obtain.
- Educational: you learn about the network structure during development, rather than treating everything as a black box.
The basic workflow of TensorRTx is:
- Get a trained model from PyTorch, MXNet, TensorFlow, etc. Some PyTorch models can be found in my repo pytorchx; the rest come from popular open-source repos.
- Export the weights to a plain-text .wts file.
- Load the weights in TensorRT, define the network, and build a TensorRT engine.
- Load the TensorRT engine and run inference.
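
The export step of this workflow can be sketched as follows. This is a minimal sketch, not the repo's conversion script: it assumes a .wts layout in which the first line holds the number of tensors and each following line holds a tensor name, its element count, and the values as big-endian float32 hex words. The helpers `write_wts` and `read_wts` and the plain-dict input are hypothetical stand-ins for a framework's state dict; the .wts format tutorial is the authoritative description.

```python
import io
import struct


def write_wts(weights, stream):
    """Write a {name: [float, ...]} mapping in the assumed .wts text layout."""
    stream.write(f"{len(weights)}\n")  # first line: number of tensors
    for name, values in weights.items():
        # Each value becomes 8 hex characters (big-endian float32).
        hex_words = [struct.pack(">f", float(v)).hex() for v in values]
        stream.write(" ".join([name, str(len(values))] + hex_words) + "\n")


def read_wts(stream):
    """Parse the layout written by write_wts back into a dict."""
    count = int(stream.readline())
    weights = {}
    for _ in range(count):
        name, size, *hex_words = stream.readline().split()
        if int(size) != len(hex_words):
            raise ValueError(f"size mismatch for tensor {name!r}")
        weights[name] = [
            struct.unpack(">f", bytes.fromhex(word))[0] for word in hex_words
        ]
    return weights


# Round trip with made-up weights (values chosen to be exact in float32).
demo = {"conv1.weight": [1.0, -0.5, 0.25], "conv1.bias": [0.0]}
buf = io.StringIO()
write_wts(demo, buf)
buf.seek(0)
restored = read_wts(buf)
```

With a real model, the dict would be built from the framework's parameters after flattening each tensor to a list of floats. Text output keeps the file easy to inspect and diff, at the cost of size.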
- 10 May 2025. pranavm-nvidia: YOLO11 written in Tripy.
- 2 May 2025. fazligorkembal: YOLO12
- 12 Apr 2025. pranavm-nvidia: First Lenet example written in Tripy.
- 11 Apr 2025. mpj1234: YOLO11-obb
- 22 Oct 2024. lindsayshuo: YOLOv8-obb
- 18 Oct 2024. zgjja: Refactor docker image.
- 11 Oct 2024. mpj1234: YOLO11
- 9 Oct 2024. Phoenix8215: GhostNet V1 and V2.
- 21 Aug 2024. Lemonononon: real-esrgan-general-x4v3
- 29 Jul 2024. mpj1234: Check the YOLOv5, YOLOv8 & YOLOv10 in TensorRT 10.x API, branch → trt10
- 29 Jul 2024. mpj1234: YOLOv10
- 21 Jun 2024. WuxinrongY: YOLOv9-T, YOLOv9-S, YOLOv9-M
- 28 Apr 2024. lindsayshuo: YOLOv8-pose
- 22 Apr 2024. B1SH0PP: EfficientAd: Accurate Visual Anomaly Detection at Millisecond-Level Latencies.
- 18 Apr 2024. lindsayshuo: YOLOv8-p2
- How to contribute
- Install the dependencies.
- A guide for quickly getting started, taking lenet5 as a demo.
- The .wts file content format
- Frequently Asked Questions (FAQ)
- Migrating from TensorRT 4 to 7
- How to implement multi-GPU processing, taking YOLOv4 as example
- Check if your GPU supports FP16/INT8
- How to Compile and Run on Windows
- Deploy YOLOv4 with Triton Inference Server
- From PyTorch to TensorRT step by step, taking hrnet as an example (Chinese)
- TensorRT 7.x
- TensorRT 8.x (some of the models support 8.x)
Each folder has a readme inside that explains how to run the models in it.
The following models are implemented.
| Name | Description | 
|---|---|
| mlp | the very basic model for starters, properly documented | 
| lenet | the simplest, as a "hello world" of this project | 
| alexnet | easy to implement, all layers are supported in tensorrt | 
| googlenet | GoogLeNet (Inception v1) | 
| inception | Inception v3, v4 | 
| mnasnet | MNASNet with depth multiplier of 0.5 from the paper | 
| mobilenet | MobileNet v2, v3-small, v3-large | 
| resnet | resnet-18, resnet-50 and resnext50-32x4d are implemented | 
| senet | se-resnet50 | 
| shufflenet | ShuffleNet v2 with 0.5x output channels | 
| squeezenet | SqueezeNet 1.1 model | 
| vgg | VGG 11-layer model | 
| yolov3-tiny | weights and pytorch implementation from ultralytics/yolov3 | 
| yolov3 | darknet-53, weights and pytorch implementation from ultralytics/yolov3 | 
| yolov3-spp | darknet-53, weights and pytorch implementation from ultralytics/yolov3 | 
| yolov4 | CSPDarknet53, weights from AlexeyAB/darknet, pytorch implementation from ultralytics/yolov3 | 
| yolov5 | yolov5 v1.0-v7.0 of ultralytics/yolov5, detection, classification and instance segmentation | 
| yolov7 | yolov7 v0.1, pytorch implementation from WongKinYiu/yolov7 | 
| yolov8 | yolov8, pytorch implementation from ultralytics | 
| yolov9 | yolov9, pytorch implementation from WongKinYiu/yolov9 | 
| yolov10 | yolov10, pytorch implementation from THU-MIG/yolov10 | 
| yolo11 | yolo11, pytorch implementation from ultralytics | 
| yolo12 | yolo12, pytorch implementation from ultralytics | 
| yolop | yolop, pytorch implementation from hustvl/YOLOP | 
| retinaface | resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface | 
| arcface | LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from deepinsight/insightface | 
| retinafaceAntiCov | mobilenet0.25, weights from deepinsight/insightface, retinaface anti-COVID-19, detect face and mask attribute | 
| dbnet | Scene Text Detection, weights from BaofengZan/DBNet.pytorch | 
| crnn | pytorch implementation from meijieru/crnn.pytorch | 
| ufld | pytorch implementation from Ultra-Fast-Lane-Detection, ECCV2020 | 
| hrnet | hrnet-image-classification and hrnet-semantic-segmentation, pytorch implementation from HRNet-Image-Classification and HRNet-Semantic-Segmentation | 
| psenet | PSENet Text Detection, tensorflow implementation from liuheng92/tensorflow_PSENet | 
| ibnnet | IBN-Net, pytorch implementation from XingangPan/IBN-Net, ECCV2018 | 
| unet | U-Net, pytorch implementation from milesial/Pytorch-UNet | 
| repvgg | RepVGG, pytorch implementation from DingXiaoH/RepVGG | 
| lprnet | LPRNet, pytorch implementation from xuexingyu24/License_Plate_Detection_Pytorch | 
| refinedet | RefineDet, pytorch implementation from luuuyi/RefineDet.PyTorch | 
| densenet | DenseNet-121, from torchvision.models | 
| rcnn | FasterRCNN and MaskRCNN, model from detectron2 | 
| tsm | TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019 | 
| scaled-yolov4 | yolov4-csp, pytorch from WongKinYiu/ScaledYOLOv4 | 
| centernet | CenterNet DLA-34, pytorch from xingyizhou/CenterNet | 
| efficientnet | EfficientNet b0-b8 and l2, pytorch from lukemelas/EfficientNet-PyTorch | 
| detr | DE⫶TR, pytorch from facebookresearch/detr | 
| swin-transformer | Swin Transformer - Semantic Segmentation, only support Swin-T. The Pytorch implementation is microsoft/Swin-Transformer | 
| real-esrgan | Real-ESRGAN. The Pytorch implementation is real-esrgan | 
| superpoint | SuperPoint. The Pytorch model is from magicleap/SuperPointPretrainedNetwork | 
| csrnet | CSRNet. The Pytorch implementation is leeyeehoo/CSRNet-pytorch | 
| EfficientAd | EfficientAd: Accurate Visual Anomaly Detection at Millisecond-Level Latencies. From anomalib | 
The .wts files can be downloaded from the model zoo for quick evaluation. However, it is recommended to convert the .wts file from your own PyTorch/MXNet/TensorFlow model, so that you can retrain your own model.
GoogleDrive | BaiduPan pwd: uvv2
Some tricky operations were encountered in these models. They have been solved, but better solutions might exist.
| Name | Description | 
|---|---|
| BatchNorm | Implement by a scale layer, used in resnet, googlenet, mobilenet, etc. | 
| MaxPool2d(ceil_mode=True) | use a padding layer before maxpool to solve ceil_mode=True, see googlenet. | 
| average pool with padding | use setAverageCountExcludesPadding() when necessary, see inception. | 
| relu6 | use Relu6(x) = Relu(x) - Relu(x-6), see mobilenet. | 
| torch.chunk() | implement the 'chunk(2, dim=C)' by tensorrt plugin, see shufflenet. | 
| channel shuffle | use two shuffle layers to implement channel_shuffle, see shufflenet. | 
| adaptive pool | use fixed input dimension, and use regular average pooling, see shufflenet. | 
| leaky relu | I wrote a leaky relu plugin, but PRelu in NvInferPlugin.h can be used, see yolov3 in branch trt4. | 
| yolo layer v1 | yolo layer is implemented as a plugin, see yolov3 in branch trt4. | 
| yolo layer v2 | three yolo layers implemented in one plugin, see yolov3-spp. | 
| upsample | replaced by a deconvolution layer, see yolov3. | 
| hsigmoid | hard sigmoid is implemented as a plugin, hsigmoid and hswish are used in mobilenetv3 | 
| retinaface output decode | implement a plugin to decode bbox, confidence and landmarks, see retinaface. | 
| mish | mish activation is implemented as a plugin, mish is used in yolov4 | 
| prelu | mxnet's prelu activation with trainable gamma is implemented as a plugin, used in arcface | 
| HardSwish | hard_swish = x * hard_sigmoid, used in yolov5 v3.0 | 
| LSTM | Implemented pytorch nn.LSTM() with tensorrt api | 
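
Several rows in the table above are plain arithmetic identities that can be checked outside TensorRT. The following is a minimal pure-Python sketch, not the repo's C++ implementation. All function names are hypothetical, and the hard sigmoid definition `relu6(x + 3) / 6` is an assumption based on the common MobileNetV3/PyTorch convention. The BatchNorm fold uses the standard inference-time formula, which matches a per-channel scale layer computing `scale * x + shift`.

```python
import math


def relu(x):
    return max(x, 0.0)


def relu6(x):
    # Identity quoted in the table: Relu6(x) = Relu(x) - Relu(x - 6).
    return relu(x) - relu(x - 6.0)


def hard_sigmoid(x):
    # Assumed definition (MobileNetV3 / PyTorch convention).
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    # hard_swish = x * hard_sigmoid, as stated in the table.
    return x * hard_sigmoid(x)


def batchnorm_to_scale(gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time BatchNorm into per-channel y = scale * x + shift."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, shift


def channel_shuffle(channels, groups):
    """Reshape to (groups, n / groups), transpose, then flatten again."""
    per_group = len(channels) // groups
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


# Small checks with made-up values.
relu6_samples = [relu6(v) for v in (-1.0, 3.0, 8.0)]
scale, shift = batchnorm_to_scale([2.0], [1.0], [0.5], [4.0], eps=0.0)
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
```

Here `channel_shuffle` operates on a list of channel indices only, to show the resulting ordering; in a network the same reshape and transpose are applied to the channel dimension of a tensor.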
| Models | Device | BatchSize | Mode | Input Shape(HxW) | FPS | 
|---|---|---|---|---|---|
| YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333 | 
| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2 | 
| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | INT8 | 608x608 | 71.4 | 
| YOLOv3-spp(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5 | 
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7 | 
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9 | 
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3 | 
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 | 
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173 | 
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190 | 
| YOLOv5-m v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 | 
| YOLOv5-l v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43 | 
| YOLOv5-x v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29 | 
| YOLOv5-s v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 | 
| YOLOv5-m v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 | 
| YOLOv5-l v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 40 | 
| YOLOv5-x v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 27 | 
| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90 | 
| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204 | 
| RetinaFace(mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 417 | 
| ArcFace(LResNet50E-IR) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333 | 
| CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000 | 
Help wanted: if you have speed results, please open an issue or PR.
Contributions, questions, and discussions are welcome. Contact me using the information below.
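
For anyone collecting speed results, a throughput measurement can be sketched as below. This README does not state how the FPS figures in the table were measured, so this is only one reasonable approach under stated assumptions: FPS is taken to mean images per second, `run_batch` is a hypothetical callable that processes one batch and returns only after the results are ready (for GPU inference that means synchronizing before returning), and the warm-up and iteration counts are arbitrary.

```python
import time


def measure_fps(run_batch, batch_size, warmup=10, iterations=100):
    """Return images per second for a callable that processes one batch."""
    for _ in range(warmup):
        run_batch()  # warm-up runs are excluded from the timing
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * iterations / elapsed


def fps_to_latency_ms(fps, batch_size=1):
    """Average time per batch in milliseconds for a given images-per-second rate."""
    return 1000.0 * batch_size / fps


# Stand-in workload; replace with a real inference call.
demo_fps = measure_fps(lambda: sum(range(1000)), batch_size=1, warmup=2, iterations=50)
```

Under the images-per-second assumption, `fps_to_latency_ms(40.9, batch_size=4)` gives roughly 97.8 ms per batch for the YOLOv4 batch-size-4 row.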
E-mail: [email protected]
WeChat ID: wangxinyu0375 (add me on WeChat to join the tensorrtx discussion group; mention "tensorrtx" in the request note)