System Overview

TRTorch is primarily a C++ library, with a Python API planned. We use Bazel as our build system and currently target Linux x86_64 and Linux aarch64 (native builds only). The compiler we use is GCC 7.5.0. The library is untested with earlier compiler versions, so you may hit compilation errors if you use an older compiler.

The repository is structured into:

  • core: Main compiler source code

  • cpp: C++ API

  • tests: Tests of the C++ API, the core, and the converters

  • docs: Documentation

  • docsrc: Documentation Source

  • third_party: BUILD files for dependency libraries

The C++ API is unstable and subject to change until the library matures, though most work is done under the hood in the core.

The core has two major parts. The first is the top-level compiler interface, which coordinates ingesting a module, lowering, converting, and generating a new module that is returned to the user. The second is the set of three main compiler phases: the lowering phase, the conversion phase, and the execution phase.

Compiler Phases

Lowering

Lowering Phase

The lowering phase is made up of a set of passes (some from PyTorch and some specific to TRTorch) that run over the graph IR to map the large PyTorch opset to a reduced opset that is easier to convert to TensorRT.
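To make the idea of a pass pipeline concrete, here is a minimal sketch. It is not TRTorch or PyTorch code: Node, Graph, Pass, Lower, both passes, and the toy:: op names are all hypothetical stand-ins, and the graph is simplified to an ordered list of op names. It only illustrates how a sequence of passes can rewrite a graph toward a smaller opset.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, NOT the TorchScript IR: a graph is just an ordered list of ops.
struct Node {
  std::string op;
};
using Graph = std::vector<Node>;
using Pass = std::function<void(Graph&)>;

// Hypothetical pass: drop an op that has no effect at inference time.
void RemoveDropout(Graph& g) {
  Graph out;
  for (const auto& n : g) {
    if (n.op != "toy::dropout") out.push_back(n);
  }
  g = std::move(out);
}

// Hypothetical pass: split a fused op into primitives from the reduced opset.
void DecomposeAddRelu(Graph& g) {
  Graph out;
  for (const auto& n : g) {
    if (n.op == "toy::add_relu") {
      out.push_back(Node{"toy::add"});
      out.push_back(Node{"toy::relu"});
    } else {
      out.push_back(n);
    }
  }
  g = std::move(out);
}

// Run every pass, in order, over the same graph.
void Lower(Graph& g, const std::vector<Pass>& passes) {
  for (const auto& pass : passes) {
    assert(pass && "each pass must be callable");
    pass(g);
  }
}
```

Calling Lower with both passes on a graph of toy::conv, toy::dropout, toy::add_relu leaves toy::conv, toy::add, toy::relu. Keeping each pass as an independent function over the whole graph makes it easy to mix passes from different sources in one ordered list.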

Conversion

Conversion Phase

In the conversion phase we traverse the lowered graph and construct an equivalent TensorRT graph. The conversion phase is made up of three main components: a context to manage compile-time data, an evaluator library that executes operations that can be resolved at compile time, and a converter library that maps an op from JIT to TensorRT.
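The interplay of the three components can be sketched as follows. This is an illustration under stated assumptions, not the TRTorch implementation: ConversionCtx, Evaluator, Converter, Registries, ConvertGraph, MakeToyRegistries, and the toy:: ops are hypothetical names, values are plain ints, and "layers" are recorded as strings rather than real TensorRT layers.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins; none of these names are TRTorch or TensorRT APIs.
struct Node {
  std::string op;                   // operation name
  std::vector<std::string> inputs;  // names of the input values
  std::string output;               // name of the produced value
};

// Compile-time state: values already known as constants and layers emitted so far.
struct ConversionCtx {
  std::map<std::string, int> constants;
  std::vector<std::string> layers;
};

using Evaluator = std::function<int(const std::vector<int>&)>;
using Converter = std::function<void(ConversionCtx&, const Node&)>;

struct Registries {
  std::map<std::string, Evaluator> evaluators;
  std::map<std::string, Converter> converters;
};

// Walk the lowered graph: evaluate what can be resolved now, convert the rest.
// Returns false when an op has neither a usable evaluator nor a converter.
bool ConvertGraph(ConversionCtx& ctx, const Registries& reg,
                  const std::vector<Node>& graph) {
  for (const auto& n : graph) {
    // Collect the inputs that are already compile-time constants.
    std::vector<int> args;
    for (const auto& in : n.inputs) {
      auto it = ctx.constants.find(in);
      if (it != ctx.constants.end()) args.push_back(it->second);
    }
    auto ev = reg.evaluators.find(n.op);
    if (ev != reg.evaluators.end() && args.size() == n.inputs.size()) {
      ctx.constants[n.output] = ev->second(args);  // resolved at compile time
      continue;
    }
    auto cv = reg.converters.find(n.op);
    if (cv == reg.converters.end()) return false;  // unsupported op
    cv->second(ctx, n);  // emits the equivalent layer into the context
  }
  return true;
}

// Hypothetical registrations: one evaluator and one converter.
Registries MakeToyRegistries() {
  Registries reg;
  reg.evaluators["toy::mul"] = [](const std::vector<int>& a) {
    assert(a.size() == 2);
    return a[0] * a[1];
  };
  reg.converters["toy::relu"] = [](ConversionCtx& ctx, const Node& n) {
    ctx.layers.push_back("relu(" + n.inputs[0] + ")");
  };
  return reg;
}
```

In this sketch an op is evaluated only when every one of its inputs is already a known constant; otherwise it falls through to the converter registry. That ordering is a design choice of the example, chosen to show why the context has to track compile-time values alongside the graph being built.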

Execution

Execution Phase

The execution phase constructs a TorchScript program to run the converted TensorRT engine. It takes a serialized engine and instantiates it within an engine manager. The compiler then builds a JIT graph that references this engine and wraps it in a module to return to the user. When the user executes the module, the JIT program looks up the engine, passes the inputs to it, and returns the results.
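The lookup-and-forward flow described above can be sketched in a few lines. Everything here is a hypothetical stand-in rather than the TRTorch or TensorRT API: Tensor is a plain vector of floats, Engine is any callable (deserialization of a real serialized engine is skipped), and EngineManager and WrappedModule are illustrative names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Tensor = std::vector<float>;
// Toy stand-in for an instantiated engine: any callable from inputs to outputs.
using Engine = std::function<Tensor(const Tensor&)>;

class EngineManager {
 public:
  // Store an engine and return the id that the generated program will reference.
  int Register(Engine e) {
    int id = next_id_++;
    engines_[id] = std::move(e);
    return id;
  }

  // Find a previously registered engine by id.
  const Engine& Lookup(int id) const {
    auto it = engines_.find(id);
    assert(it != engines_.end() && "unknown engine id");
    return it->second;
  }

 private:
  int next_id_ = 0;
  std::map<int, Engine> engines_;
};

// Toy stand-in for the module returned to the user: it holds only a reference
// to the manager and the id of the engine it should run.
class WrappedModule {
 public:
  WrappedModule(const EngineManager& mgr, int engine_id)
      : mgr_(mgr), engine_id_(engine_id) {}

  // Look up the engine, pass the inputs to it, and return its results.
  Tensor Forward(const Tensor& inputs) const {
    return mgr_.Lookup(engine_id_)(inputs);
  }

 private:
  const EngineManager& mgr_;
  int engine_id_;
};
```

In this sketch the module stores an id rather than the engine itself, so the manager remains the single owner of engine state. The manager must outlive every module that refers to it.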