TensorRTx aims to implement popular deep learning networks with the TensorRT network definition API.
Why not use a parser (ONNX parser, UFF parser, Caffe parser, etc.) instead of building a network from scratch with the more complex APIs? I have summarized the advantages below.
- Flexible: it is easy to modify the network, add or delete a layer or an input/output tensor, replace a layer, merge layers, integrate preprocessing and postprocessing into the network, etc.
- Debuggable: the entire network is constructed incrementally, so intermediate layer results are easy to get.
- A chance to learn: you learn about the network structure during development, rather than treating everything as a black box.
The basic workflow of TensorRTx is:
- Get the trained models from PyTorch, MXNet, TensorFlow, etc. Some PyTorch models can be found in my repo pytorchx; the rest come from popular open-source repos.
- Export the weights to a plain-text file, the .wts file.
- Load weights in TensorRT, define the network, build a TensorRT engine.
- Load the TensorRT engine and run inference.
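The export and load steps above can be sketched in plain Python. This is a minimal sketch, not the project's own script. It assumes a .wts layout in which the first line is the number of tensors and each following line holds a tensor name, its element count, and the values as hex-encoded big-endian float32. The names `write_wts` and `load_wts` and the dict-of-lists input are hypothetical; a real export would iterate over the framework's own weight dictionary instead.

```python
import os
import struct
import tempfile


def write_wts(weights, path):
    """Write a {name: [float, ...]} mapping as a plain-text .wts file.

    Assumed layout: a tensor count on the first line, then one line per
    tensor of the form "<name> <count> <hex> <hex> ...".
    """
    with open(path, "w") as f:
        f.write(f"{len(weights)}\n")
        for name, values in weights.items():
            # Each value is stored as the hex of its big-endian float32 bytes.
            hex_values = [struct.pack(">f", float(v)).hex() for v in values]
            f.write(" ".join([name, str(len(values))] + hex_values) + "\n")


def load_wts(path):
    """Read a .wts file back into a {name: [float, ...]} mapping."""
    weights = {}
    with open(path) as f:
        count = int(f.readline())
        for _ in range(count):
            parts = f.readline().split()
            name, size = parts[0], int(parts[1])
            values = [struct.unpack(">f", bytes.fromhex(h))[0]
                      for h in parts[2:2 + size]]
            if len(values) != size:
                raise ValueError(f"tensor {name!r}: expected {size} values")
            weights[name] = values
    return weights


if __name__ == "__main__":
    # Hypothetical toy weights, chosen to be exactly representable in float32.
    demo = {"conv1.weight": [1.0, -0.5, 0.25], "conv1.bias": [0.0]}
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wts")
    write_wts(demo, demo_path)
    print(open(demo_path).read())
    print(load_wts(demo_path))
```

Storing each value as the hex of its raw float32 bytes keeps the file human-readable while avoiding any rounding from decimal formatting, so the weights loaded on the TensorRT side are bit-identical to the exported ones.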
- 17 Oct 2023. Rex-LK: YOLOv8-Seg
- 30 Jun 2023. xiaocao-tian and lindsayshuo: YOLOv8
- 1 Mar 2023. Nengwp: RCNN and UNet upgrade to support TensorRT 8.
- 18 Dec 2022. YOLOv5 upgrade to support v7.0, including instance segmentation.
- 12 Dec 2022. East-Face: UNet upgrade to support v3.0 of Pytorch-UNet.
- 26 Oct 2022. ausk: YoloP (You Only Look Once for Panoptic Driving Perception).
- 19 Sep 2022. QIANXUNZDL123 and lindsayshuo: YOLOv7.
- 7 Sep 2022. xiang-wuu: YOLOv5 v6.2 classification models.
- 19 Aug 2022. Dominic and sbmalik: Yolov3-tiny and Arcface support TRT8.
- 6 Jul 2022. xiang-wuu: SuperPoint - Self-Supervised Interest Point Detection and Description, vSLAM related.
- 26 May 2022. triple-Mu: YOLOv5 python script with CUDA Python API.
- 23 May 2022. yhpark: Real-ESRGAN, Practical Algorithms for General Image/Video Restoration.
- 19 May 2022. vjsrinivas: YOLOv3 TRT8 support and Python script.
- 15 Mar 2022. sky_hole: Swin Transformer - Semantic Segmentation.
- 19 Oct 2021. liuqi123123 added CUDA preprocessing for yolov5; preprocessing + inference is 3x faster when batchsize=8.
- Install the dependencies.
- A guide for quickly getting started, taking lenet5 as a demo.
- The .wts file content format
- Frequently Asked Questions (FAQ)
- Migrating from TensorRT 4 to 7
- How to implement multi-GPU processing, taking YOLOv4 as an example
- Check if your GPU supports FP16/INT8
- How to Compile and Run on Windows
- Deploy YOLOv4 with Triton Inference Server
- From PyTorch to TensorRT step by step, taking hrnet as an example (Chinese)
- TensorRT 7.x
- TensorRT 8.x (some of the models support 8.x)
Each folder contains a readme that explains how to run the models in it.
The following models are implemented.
Name | Description |
---|---|
mlp | the very basic model for starters, properly documented |
lenet | the simplest, as a "hello world" of this project |
alexnet | easy to implement, all layers are supported in tensorrt |
googlenet | GoogLeNet (Inception v1) |
inception | Inception v3, v4 |
mnasnet | MNASNet with depth multiplier of 0.5 from the paper |
mobilenet | MobileNet v2, v3-small, v3-large |
resnet | resnet-18, resnet-50 and resnext50-32x4d are implemented |
senet | se-resnet50 |
shufflenet | ShuffleNet v2 with 0.5x output channels |
squeezenet | SqueezeNet 1.1 model |
vgg | VGG 11-layer model |
yolov3-tiny | weights and pytorch implementation from ultralytics/yolov3 |
yolov3 | darknet-53, weights and pytorch implementation from ultralytics/yolov3 |
yolov3-spp | darknet-53, weights and pytorch implementation from ultralytics/yolov3 |
yolov4 | CSPDarknet53, weights from AlexeyAB/darknet, pytorch implementation from ultralytics/yolov3 |
yolov5 | yolov5 v1.0-v7.0 of ultralytics/yolov5, detection, classification and instance segmentation |
yolov7 | yolov7 v0.1, pytorch implementation from WongKinYiu/yolov7 |
yolov8 | yolov8, pytorch implementation from ultralytics/ultralytics |
yolop | yolop, pytorch implementation from hustvl/YOLOP |
retinaface | resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface |
arcface | LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from deepinsight/insightface |
retinafaceAntiCov | mobilenet0.25, weights from deepinsight/insightface, retinaface anti-COVID-19, detect face and mask attribute |
dbnet | Scene Text Detection, weights from BaofengZan/DBNet.pytorch |
crnn | pytorch implementation from meijieru/crnn.pytorch |
ufld | pytorch implementation from Ultra-Fast-Lane-Detection, ECCV2020 |
hrnet | hrnet-image-classification and hrnet-semantic-segmentation, pytorch implementation from HRNet-Image-Classification and HRNet-Semantic-Segmentation |
psenet | PSENet Text Detection, tensorflow implementation from liuheng92/tensorflow_PSENet |
ibnnet | IBN-Net, pytorch implementation from XingangPan/IBN-Net, ECCV2018 |
unet | U-Net, pytorch implementation from milesial/Pytorch-UNet |
repvgg | RepVGG, pytorch implementation from DingXiaoH/RepVGG |
lprnet | LPRNet, pytorch implementation from xuexingyu24/License_Plate_Detection_Pytorch |
refinedet | RefineDet, pytorch implementation from luuuyi/RefineDet.PyTorch |
densenet | DenseNet-121, from torchvision.models |
rcnn | FasterRCNN and MaskRCNN, model from detectron2 |
tsm | TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019 |
scaled-yolov4 | yolov4-csp, pytorch from WongKinYiu/ScaledYOLOv4 |
centernet | CenterNet DLA-34, pytorch from xingyizhou/CenterNet |
efficientnet | EfficientNet b0-b8 and l2, pytorch from lukemelas/EfficientNet-PyTorch |
detr | DE⫶TR, pytorch from facebookresearch/detr |
swin-transformer | Swin Transformer - Semantic Segmentation, only supports Swin-T. The Pytorch implementation is microsoft/Swin-Transformer |
real-esrgan | Real-ESRGAN. The Pytorch implementation is real-esrgan |
superpoint | SuperPoint. The Pytorch model is from magicleap/SuperPointPretrainedNetwork |
The .wts files can be downloaded from the model zoo for quick evaluation. However, it is recommended to convert the .wts file from your own PyTorch/MXNet/TensorFlow model, so that you can retrain the model yourself.
GoogleDrive | BaiduPan pwd: uvv2