
Commit 6f38570

ttyio authored and rajeevsrao committed
TensorRT-OSS 8.2 GA release
Signed-off-by: Rajeev Rao <[email protected]>
1 parent 9ec6eb6 commit 6f38570

File tree: 184 files changed, +5739 −1152 lines


CHANGELOG.md

Lines changed: 58 additions & 0 deletions
@@ -1,5 +1,63 @@
 # TensorRT OSS Release Changelog
 
+## [8.2.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-1) - 2021-11-24
+
+TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
+- Updates since [TensorRT 8.2.0 EA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.2.0-EA).
+- Please refer to the [TensorRT 8.2.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-1) for more information.
+
+- ONNX parser [v8.2.1](https://github.com/onnx/onnx-tensorrt/releases/tag/release%2F8.2-GA)
+  - Removed duplicate constant layer checks that caused some performance regressions
+  - Fixed expand dynamic shape calculations
+  - Added parser-side checks for `Scatter` layer support
+
+- Sample updates
+  - Added [TensorFlow Object Detection API converter samples](samples/python/tensorflow_object_detection_api), including Single Shot Detector, Faster R-CNN and Mask R-CNN models
+  - Multiple enhancements in HuggingFace transformer demos
+    - Added multi-batch support
+    - Fixed the resulting performance regression at batch size 1
+    - Fixed T5 large/T5-3B accuracy issues
+    - Added [notebooks](demo/HuggingFace/notebooks) for T5 and GPT-2
+    - Added CPU benchmarking option
+  - Deprecated `kSTRICT_TYPES` (strict type constraints). Equivalent behaviour is now achieved by setting `PREFER_PRECISION_CONSTRAINTS`, `DIRECT_IO`, and `REJECT_EMPTY_ALGORITHMS`
+  - Removed `sampleMovieLens`
+  - Renamed `sampleReformatFreeIO` to `sampleIOFormats`
+  - Added `idleTime` option for samples to control qps
+  - Specified default value for `precisionConstraints`
+  - Fixed reporting of the TensorRT build version in trtexec
+  - Fixed `combineDescriptions` typo in trtexec/tracer.py
+  - Fixed usages of `kDIRECT_IO`
+
+- Plugin updates
+  - Extended `EfficientNMS` plugin support to TF-TRT and to clang builds
+  - Sanitized header definitions for the BERT fused MHA plugin
+  - Separated C++ and .cu files in `splitPlugin` to avoid PTX generation (required for CUDA enhanced compatibility support)
+  - Enabled C++14 build for plugins
+
+- ONNX tooling updates
+  - [onnx-graphsurgeon](tools/onnx-graphsurgeon/CHANGELOG.md) upgraded to v0.3.14
+  - [Polygraphy](tools/Polygraphy/CHANGELOG.md) upgraded to v0.33.2
+  - [pytorch-quantization](tools/pytorch-quantization) toolkit upgraded to v2.1.2
+
+- Build and container fixes
+  - Added `SM86` target to default `GPU_ARCHS` for platforms with cuda-11.1+
+  - Removed deprecated `SM_35` and added `SM_60` to default `GPU_ARCHS`
+  - Skipped CUB builds for cuda 11.0+ [#1455](https://github.com/NVIDIA/TensorRT/pull/1455)
+  - Fixed cuda-10.2 container build failures on Ubuntu 20.04
+  - Added native ARM server build container
+  - Installed devtoolset-8 for an updated g++ version on CentOS7
+  - Added a note on supporting C++14 builds for CentOS7
+  - Fixed docker build for large UIDs [#1373](https://github.com/NVIDIA/TensorRT/issues/1373)
+  - Updated README instructions for JetPack builds
+
+- Demo enhancements
+  - Updated Tacotron2 instructions and added CPU benchmarking
+  - Fixed issues in the demoBERT Python notebook
+
+- Documentation updates
+  - Updated Python documentation for `add_reduce`, `add_top_k`, and `ISoftMaxLayer`
+  - Renamed the default GitHub branch to `main` and updated hyperlinks
+
 ## [8.2.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-0-EA) - 2021-10-05
 ### Added
 - [Demo applications](demo/HuggingFace) showcasing TensorRT inference of [HuggingFace Transformers](https://huggingface.co/transformers).
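The changelog above deprecates `kSTRICT_TYPES` in favour of three finer-grained builder flags. As a rough illustration (not code from this commit), a minimal sketch of the migration, assuming the TensorRT 8.2 Python API:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Before 8.2, one global switch enforced strict typing:
#   config.set_flag(trt.BuilderFlag.STRICT_TYPES)   # now deprecated
# The equivalent behaviour is now opted into piece by piece:
config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)  # honour per-layer precision requests
config.set_flag(trt.BuilderFlag.DIRECT_IO)                      # no reformatting at network inputs/outputs
config.set_flag(trt.BuilderFlag.REJECT_EMPTY_ALGORITHMS)        # fail instead of silently relaxing constraints
```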

CMakeLists.txt

Lines changed: 4 additions & 3 deletions
@@ -141,8 +141,8 @@ if (DEFINED GPU_ARCHS)
   separate_arguments(GPU_ARCHS)
 else()
   list(APPEND GPU_ARCHS
-      35
       53
+      60
       61
       70
       75
@@ -157,8 +157,9 @@ else()
   if (CUDA_VERSION VERSION_GREATER_EQUAL 11.0)
     # Ampere GPU (SM80) support is only available in CUDA versions > 11.0
     list(APPEND GPU_ARCHS 80)
-  else()
-    message(WARNING "Detected CUDA version is < 11.0. SM80 not supported.")
+  endif()
+  if (CUDA_VERSION VERSION_GREATER_EQUAL 11.1)
+    list(APPEND GPU_ARCHS 86)
   endif()
 
   message(STATUS "GPU_ARCHS is not defined. Generating CUDA code for default SMs: ${GPU_ARCHS}")

README.md

Lines changed: 31 additions & 8 deletions
@@ -15,7 +15,7 @@ This repository contains the Open Source Software (OSS) components of NVIDIA Ten
 To build the TensorRT-OSS components, you will first need the following software packages.
 
 **TensorRT GA build**
-* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.2.0.6
+* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.2.1.8
 
 **System Packages**
 * [CUDA](https://developer.nvidia.com/cuda-toolkit)
@@ -70,16 +70,16 @@ To build the TensorRT-OSS components, you will first need the following software
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-8.2.0.6.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-8.2.0.6
+tar -xvzf TensorRT-8.2.1.8.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-8.2.1.8
 ```
 
 **Example: Windows on x86-64 with cuda-11.4**
 
 ```powershell
 cd ~\Downloads
-Expand-Archive .\TensorRT-8.2.0.6.Windows10.x86_64.cuda-11.4.cudnn8.2.zip
-$Env:TRT_LIBPATH = '$(Get-Location)\TensorRT-8.2.0.6'
+Expand-Archive .\TensorRT-8.2.1.8.Windows10.x86_64.cuda-11.4.cudnn8.2.zip
+$Env:TRT_LIBPATH = '$(Get-Location)\TensorRT-8.2.1.8'
 $Env:PATH += 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\MSBuild\15.0\Bin\'
 ```
 
@@ -110,6 +110,10 @@ For Linux platforms, we recommend that you generate a docker container for build
 ```bash
 ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda10.2 --cuda 10.2
 ```
+**Example: Ubuntu 20.04 on aarch64 with cuda-11.4.2**
+```bash
+./docker/build.sh --file docker/ubuntu-20.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu20.04-cuda11.4
+```
 
 2. #### Launch the TensorRT-OSS build container.
 **Example: Ubuntu 18.04 build container**
@@ -132,6 +136,23 @@ For Linux platforms, we recommend that you generate a docker container for build
 cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
 make -j$(nproc)
 ```
+
+> NOTE: On CentOS7, the default g++ version does not support C++14. For native builds (not using the CentOS7 build container), first install devtoolset-8 to obtain the updated g++ toolchain as follows:
+```bash
+yum -y install centos-release-scl
+yum-config-manager --enable rhel-server-rhscl-7-rpms
+yum -y install devtoolset-8
+export PATH="/opt/rh/devtoolset-8/root/bin:${PATH}"
+```
+
+**Example: Linux (aarch64) build with default cuda-11.4.2**
+```bash
+cd $TRT_OSSPATH
+mkdir -p build && cd build
+cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain
+make -j$(nproc)
+```
+
 **Example: Native build on Jetson (aarch64) with cuda-10.2**
 ```bash
 cd $TRT_OSSPATH
@@ -141,13 +162,15 @@ For Linux platforms, we recommend that you generate a docker container for build
 ```
 > NOTE: C compiler must be explicitly specified via `CC=` for native `aarch64` builds of protobuf.
 
-**Example: Ubuntu 18.04 Cross-Compile for Jetson (arm64) with cuda-10.2 (JetPack)**
+**Example: Ubuntu 18.04 Cross-Compile for Jetson (aarch64) with cuda-10.2 (JetPack)**
 ```bash
 cd $TRT_OSSPATH
 mkdir -p build && cd build
-cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=10.2
+cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=10.2 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-10.2/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-10.2/targets/aarch64-linux/lib/stubs/libcublasLt.so
 make -j$(nproc)
 ```
+> NOTE: The latest JetPack SDK v4.6 only supports TensorRT 8.0.1.
+
 **Example: Windows (x86-64) build in Powershell**
 ```powershell
 cd $Env:TRT_OSSPATH
@@ -191,4 +214,4 @@ For Linux platforms, we recommend that you generate a docker container for build
 
 ## Known Issues
 
-* None
+* Please refer to [TensorRT 8.2 Release Notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#tensorrt-8)

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-8.2.0.6
+8.2.1.8
cmake/toolchains/cmake_aarch64-native.toolchain

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+#
+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+set(CMAKE_SYSTEM_NAME Linux)
+set(CMAKE_SYSTEM_PROCESSOR aarch64)
+
+set(TRT_PLATFORM_ID "aarch64")
+
+set(CUDA_PLATFORM_ID "sbsa-linux")
+
+set(CMAKE_C_COMPILER /usr/bin/aarch64-linux-gnu-gcc)
+set(CMAKE_CXX_COMPILER /usr/bin/aarch64-linux-gnu-g++)
+
+set(CMAKE_C_FLAGS "" CACHE STRING "" FORCE)
+set(CMAKE_CXX_FLAGS "" CACHE STRING "" FORCE)
+
+set(CMAKE_C_COMPILER_TARGET aarch64-linux-gnu)
+set(CMAKE_CXX_COMPILER_TARGET aarch64-linux-gnu)
+
+set(CMAKE_C_COMPILER_FORCED TRUE)
+set(CMAKE_CXX_COMPILER_FORCED TRUE)
+
+set(CUDA_TOOLKIT_ROOT_DIR /usr/local/cuda/targets/${CUDA_PLATFORM_ID} CACHE STRING "CUDA ROOT dir")
+set(CUDA_INCLUDE_DIRS ${CUDA_TOOLKIT_ROOT_DIR}/include)
demo/BERT/notebooks/BERT-TRT-FP16.ipynb

Lines changed: 2 additions & 2 deletions
@@ -171,7 +171,7 @@
    "outputs": [],
    "source": [
     "# Build BERT TensorRT FP16 model from NGC checkpoint\n",
-    "!python3 ../builder.py -m models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_384_v19.03.1/model.ckpt -w 40000 -o engines_$TRT_VERSION/bert_large_384.engine -b $BATCH_SIZE -s 384 --fp16 -c models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_384_v19.03.1"
+    "!python3 ../builder.py -m models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_384_v19.03.1/model.ckpt -w 40000 -o engines_$TRT_VERSION/bert_large_384.engine -b 1 -b $BATCH_SIZE -s 384 --fp16 -c models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_384_v19.03.1"
    ]
   },
   {
@@ -333,7 +333,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!python3 ../builder.py -m models/fine-tuned/bert_tf_ckpt_base_qa_squad2_amp_128_v19.03.1/model.ckpt -w 40000 -o engines_$TRT_VERSION/bert_base_128.engine -b $BATCH_SIZE -s 128 --fp16 -c models/fine-tuned/bert_tf_ckpt_base_qa_squad2_amp_128_v19.03.1"
+    "!python3 ../builder.py -m models/fine-tuned/bert_tf_ckpt_base_qa_squad2_amp_128_v19.03.1/model.ckpt -w 40000 -o engines_$TRT_VERSION/bert_base_128.engine -b 1 -b $BATCH_SIZE -s 128 --fp16 -c models/fine-tuned/bert_tf_ckpt_base_qa_squad2_amp_128_v19.03.1"
    ]
   },
   {
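The only change in these notebook cells is passing `-b 1` in addition to `-b $BATCH_SIZE`, so the builder creates optimization profiles for both batch sizes. As a hypothetical illustration of the repeated-flag pattern (builder.py's actual argument parsing is not shown in this diff):

```python
import argparse

# Hypothetical sketch: a repeated "-b" flag accumulating batch sizes,
# mirroring how "-b 1 -b $BATCH_SIZE" requests two optimization profiles.
parser = argparse.ArgumentParser()
parser.add_argument("-b", "--batch-size", type=int, action="append", dest="batch_sizes",
                    help="batch size(s) to build profiles for; may be given multiple times")
args = parser.parse_args(["-b", "1", "-b", "8"])
print(args.batch_sizes)  # [1, 8]
```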

demo/BERT/perf.py

Lines changed: 12 additions & 3 deletions
@@ -79,11 +79,20 @@ def main():
     bench_times = {}
 
     stream = cuda.Stream()
-    for idx, batch_size in enumerate(sorted(args.batch_size)):
-        context.set_optimization_profile_async(idx, stream.handle)
+    for batch_size in sorted(args.batch_size):
+        # Select engine profile
+        selected_profile = -1
+        for idx in range(engine.num_optimization_profiles):
+            profile_shape = engine.get_profile_shape(profile_index = idx, binding = idx * num_binding_per_profile)
+            if profile_shape[0][0] <= batch_size and profile_shape[2][0] >= batch_size and profile_shape[0][1] <= args.sequence_length and profile_shape[2][1] >= args.sequence_length:
+                selected_profile = idx
+                break
+        if selected_profile == -1:
+            raise RuntimeError("None of the dynamic shape profiles meets the requirement batch = {} and sequence = {}.".format(batch_size, args.sequence_length))
+        context.set_optimization_profile_async(selected_profile, stream.handle)
 
         # Each profile has unique bindings
-        binding_idx_offset = idx * num_binding_per_profile
+        binding_idx_offset = selected_profile * num_binding_per_profile
         bindings = [0] * binding_idx_offset + [buf.binding() for buf in buffers]
 
         shapes = {
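For context, the new code walks the engine's optimization profiles and picks the first one whose min/max shapes bracket the requested batch size and sequence length, then offsets binding indices for that profile. A self-contained sketch of the same idea (assuming the TensorRT 8.x Python API and an engine whose bindings are evenly split across profiles; `engine`, `context`, and `stream_handle` are placeholders):

```python
import tensorrt as trt  # assumes TensorRT 8.x Python bindings are installed

def select_profile(engine, context, stream_handle, batch_size, seq_len):
    """Pick the first optimization profile covering (batch_size, seq_len)."""
    bindings_per_profile = engine.num_bindings // engine.num_optimization_profiles
    for idx in range(engine.num_optimization_profiles):
        # get_profile_shape returns (min, opt, max) dims for the given binding.
        min_shape, _, max_shape = engine.get_profile_shape(idx, idx * bindings_per_profile)
        if (min_shape[0] <= batch_size <= max_shape[0]
                and min_shape[1] <= seq_len <= max_shape[1]):
            context.set_optimization_profile_async(idx, stream_handle)
            # Bindings for profile idx start at this offset into the bindings list.
            return idx, idx * bindings_per_profile
    raise RuntimeError("No profile covers batch={} sequence={}".format(batch_size, seq_len))
```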

demo/HuggingFace/GPT2/export.py

Lines changed: 6 additions & 6 deletions
@@ -78,10 +78,10 @@ def __init__(self, model, network_metadata):
 
 # TRT Engine File Encoding #
 class GPT2TRTEngine(TRTEngineFile):
-    def __init__(self, model, network_metadata):
-        super().__init__(model, GPT2Converter, network_metadata)
+    def __init__(self, model, network_metadata, batch_size = 1):
+        super().__init__(model, GPT2Converter, network_metadata, batch_size = batch_size)
 
-    def use_strict_types(self):
+    def use_obey_precision_constraints(self):
         return self.network_metadata.precision.fp16
 
     def get_dynamic_shape_profiles(self):
@@ -91,9 +91,9 @@ def get_dynamic_shape_profiles(self):
         profile = Profile()
         profile.add(
             "input_ids",
-            min=(1, 1),
-            opt=(1, max_sequence_length // 2),
-            max=(1, max_sequence_length),
+            min=(self.batch_size, 1),
+            opt=(self.batch_size, max_sequence_length // 2),
+            max=(self.batch_size, max_sequence_length),
         )
         return [profile]
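`get_dynamic_shape_profiles` above returns a Polygraphy `Profile` whose min/opt/max shapes are now parameterized by `self.batch_size` instead of a hard-coded batch of 1. A rough, self-contained sketch of how such a profile is consumed when building an engine (the ONNX path, batch size, and sequence length are hypothetical; this is not code from the demo):

```python
from polygraphy.backend.trt import (
    CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, Profile,
)

batch_size = 4            # hypothetical values
max_sequence_length = 64

# Same shape pattern as get_dynamic_shape_profiles(): fixed batch, variable sequence.
profile = Profile()
profile.add(
    "input_ids",
    min=(batch_size, 1),
    opt=(batch_size, max_sequence_length // 2),
    max=(batch_size, max_sequence_length),
)

# Build an engine whose single optimization profile accepts those shapes.
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("gpt2.onnx"),                     # hypothetical model path
    config=CreateConfig(fp16=True, profiles=[profile]),
)
engine = build_engine()
```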

demo/HuggingFace/GPT2/frameworks.py

Lines changed: 20 additions & 8 deletions
@@ -155,11 +155,17 @@ def execute_inference(
         network_fpaths: NetworkModels,
         inference_input: str,
         timing_profile: TimingProfile,
+        use_cpu: bool,
+        batch_size: int = 1
     ) -> NetworkResult:
 
         # Execute some tests
         tokenizer = GPT2Tokenizer.from_pretrained(metadata.variant)
-        input_ids = tokenizer(inference_input, return_tensors="pt").input_ids
+
+        # GPT2 has no proper token set. Use custom token. Only "generate()" will auto
+        # replace with EOS token when using generating mode
+        tokenizer.add_special_tokens({"pad_token": "[PAD]"})
+        input_ids = tokenizer([inference_input] * batch_size, padding=True, return_tensors="pt").input_ids
 
         # By default, HuggingFace model structure is one giant file.
         gpt2_torch_fpath = network_fpaths.torch[0].fpath
@@ -172,7 +178,7 @@ def execute_inference(
 
         # get single decoder iteration inference timing profile
         _, decoder_e2e_median_time = gpt2_inference(
-            gpt2_torch, input_ids, timing_profile
+            gpt2_torch, input_ids, timing_profile, use_cuda=(not use_cpu)
         )
 
         # get complete decoder inference result and its timing profile
@@ -181,13 +187,17 @@ def execute_inference(
             input_ids,
             timing_profile,
             max_length=GPT2ModelTRTConfig.MAX_SEQUENCE_LENGTH[metadata.variant],
+            use_cuda=(not use_cpu),
+            batch_size=batch_size
         )
 
-        semantic_outputs = []
-        for i, sample_output in enumerate(sample_output):
-            semantic_outputs.append(
-                tokenizer.decode(sample_output, skip_special_tokens=True)
-            )
+        # Remove the padding and end tokens.
+        semantic_outputs = tokenizer.decode(
+            sample_output[-1, :], skip_special_tokens=True
+        )
+
+        if isinstance(semantic_outputs, list):
+            semantic_outputs = " ".join(semantic_outputs).strip()
 
         return NetworkResult(
             input=inference_input,
@@ -214,6 +224,8 @@ def run_framework(
         keep_onnx_model: bool,
         keep_pytorch_model: bool,
         timing_profile: TimingProfile,
+        use_cpu: bool = False,
+        batch_size: int = 1
     ) -> List[NetworkResult]:
         """
         Main entry point of our function which compiles and generates our model data.
@@ -227,7 +239,7 @@ def run_framework(
             for ninput in network_input:
                 results.append(
                     self.execute_inference(
-                        metadata, network_fpaths, ninput, timing_profile
+                        metadata, network_fpaths, ninput, timing_profile, use_cpu
                    )
                )
        finally:
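The padding change above works around GPT-2's lack of a padding token: a custom `[PAD]` token is registered so a whole batch of identical prompts can be tokenized into one rectangular tensor. A minimal standalone sketch using the HuggingFace `transformers` API (the model variant and prompt below are placeholders):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")          # placeholder variant
tokenizer.add_special_tokens({"pad_token": "[PAD]"})       # GPT-2 ships with no pad token

batch_size = 4
prompts = ["TensorRT is a high performance"] * batch_size  # replicate one prompt per batch slot
input_ids = tokenizer(prompts, padding=True, return_tensors="pt").input_ids
print(input_ids.shape)  # torch.Size([4, padded_sequence_length])
```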

demo/HuggingFace/GPT2/measurements.py

Lines changed: 7 additions & 2 deletions
@@ -24,6 +24,7 @@
 # TRT-HuggingFace
 from NNDF.general_utils import measure_python_inference_code
 from NNDF.torch_utils import use_cuda
+from NNDF.tensorrt_utils import TRTNativeRunner
 
 
 @use_cuda
@@ -37,9 +38,13 @@ def gpt2_inference(gpt2, input_ids, timing_profile, use_cuda=True):
 
 # Code specifically for Pythonic inference measurement used across all GPT2 related scripts
 @use_cuda
-def full_inference_greedy(gpt2, input_ids, timing_profile, max_length, use_cuda=True):
+def full_inference_greedy(gpt2, input_ids, timing_profile, max_length, use_cuda=True, batch_size=1):
+
+    if isinstance(gpt2, TRTNativeRunner):
+        gpt2.set_return_device("cuda" if use_cuda else "cpu")
+
     def _e2e():
-        return gpt2.generate(input_ids, max_length=max_length) # greedy search
+        return gpt2.generate(input_ids, max_length=max_length, batch_size=batch_size) # greedy search
 
     full_e2e_median_time = measure_python_inference_code(
         _e2e,
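`full_inference_greedy` wraps the greedy `generate()` call in a closure and hands it to the repo's `measure_python_inference_code` helper together with a `TimingProfile`. A generic sketch of that kind of median-latency measurement (this is not the repo helper; the warm-up and iteration counts are invented):

```python
import time
from statistics import median

def measure_median_latency(fn, warmup=3, iterations=10):
    """Run fn a few times untimed, then return the median wall-clock latency in seconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)

# Hypothetical usage mirroring the _e2e closure above:
# latency_s = measure_median_latency(lambda: gpt2.generate(input_ids, max_length=64))
```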
