Execution Phase

The execution phase is responsible for managing TensorRT engines, constructing a new module that calls the TensorRT engines, and acting as a runtime for JIT modules calling TensorRT engines. The main interface accepts a serialized TensorRT engine, which it stands up within the Engine Manager. The Engine Manager maintains an execution context for each engine along with metadata about its inputs and outputs. Each engine is assigned an ID which can be used to reference it when running a module with the JIT interpreter.
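
As a rough illustration of this bookkeeping, the sketch below shows what such a registry might look like. The names here (EngineManager, EngineRecord, RegisterEngine) and the use of the record's address as the engine ID are illustrative assumptions, not TRTorch's actual classes, and the sketch assumes TensorRT 8+, where engine objects can be destroyed with delete.

#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

#include "NvInfer.h"

// Hypothetical per-engine record: the deserialized engine, its execution
// context, and metadata about its inputs and outputs.
struct EngineRecord {
  std::unique_ptr<nvinfer1::ICudaEngine> engine;
  std::unique_ptr<nvinfer1::IExecutionContext> context;
  std::vector<std::vector<int64_t>> input_shapes;  // metadata about inputs
  std::vector<int64_t> output_shape;               // metadata about outputs
};

class EngineManager {
 public:
  // Deserialize a serialized engine, stand it up, and hand back an ID.
  int64_t RegisterEngine(nvinfer1::IRuntime& runtime,
                         const std::string& serialized) {
    auto rec = std::make_unique<EngineRecord>();
    rec->engine.reset(
        runtime.deserializeCudaEngine(serialized.data(), serialized.size()));
    rec->context.reset(rec->engine->createExecutionContext());
    // ... record input/output shape metadata from the engine bindings ...
    int64_t id = reinterpret_cast<int64_t>(rec.get());  // address as ID
    engines_[id] = std::move(rec);
    return id;
  }

  EngineRecord& Get(int64_t id) { return *engines_.at(id); }

 private:
  std::unordered_map<int64_t, std::unique_ptr<EngineRecord>> engines_;
};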

Background

PyTorch JIT’s runtime is based around a stack machine: all operators pop their arguments off the stack, pass them to some implementation of the operator, then push the results back onto the stack. The actual elements of the stack are torch::jit::IValues, the same type we evaluate in the conversion phase (the realization of the abstract torch::jit::Value type).
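
To make the stack-machine model concrete, here is a minimal custom operator registered through the JIT operator library. The op name my_ops::add_one is made up for illustration, and the exact Operator/Operation signatures vary somewhat across PyTorch versions:

#include <torch/csrc/jit/runtime/custom_operator.h>

// Every operator implementation has the same shape: pop IValue arguments
// off the stack, compute, then push IValue results back on.
static torch::jit::RegisterOperators add_one_reg({
    torch::jit::Operator(
        "my_ops::add_one(Tensor x) -> Tensor",
        [](torch::jit::Stack& stack) {
          auto x = torch::jit::pop(stack).toTensor();  // pop the argument
          torch::jit::push(stack, x + 1);              // push the result
        },
        c10::AliasAnalysisKind::FROM_SCHEMA)});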

TensorRT Engine Executor Op

When TRTorch is loaded, it registers an operator in the PyTorch JIT operator library called trt::execute_engine(int id, ...) -> ... which takes an engine ID and the engine's inputs. It uses the ID to look up the corresponding execution context, then pops the inputs off the runtime stack. These inputs are passed into a generic engine execution function which runs the tensors through the TensorRT engine and returns new tensors as results. The result tensors are pushed onto the stack so that the next op, whatever it is, can use them.
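
Below is a hedged sketch of what that execution function might look like, simplified to a single input and output. GetEngineManager and the output_shape metadata are assumptions building on the hypothetical EngineManager above, and TensorRT's synchronous executeV2 is used for brevity:

#include <torch/csrc/jit/runtime/custom_operator.h>

EngineManager& GetEngineManager();  // hypothetical accessor for the registry above

// Hypothetical body for trt::execute_engine, simplified to one input tensor.
void ExecuteEngineOp(torch::jit::Stack& stack) {
  // Arguments were pushed in schema order (id, then input), so pop in reverse.
  auto input = torch::jit::pop(stack).toTensor().contiguous().cuda();
  int64_t id = torch::jit::pop(stack).toInt();

  // Look up the execution context and output metadata registered earlier.
  auto& rec = GetEngineManager().Get(id);
  auto output = at::empty(rec.output_shape, input.options());

  // TensorRT executes over an array of device pointers, one per engine binding.
  void* bindings[] = {input.data_ptr(), output.data_ptr()};
  rec.context->executeV2(bindings);

  // Push the result so the next instruction can consume it.
  torch::jit::push(stack, output);
}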

Constructing the Resulting Graph

Once the engine is registered, the compiler constructs a graph that executes the engine when the module is called. Here is an example:

graph(%self.1 : __torch__.___torch_mangle_10.LeNet_trt,
    %2 : Tensor):
    %1 : int = prim::Constant[value=94106001690080]()
    %3 : Tensor = trt::execute_engine(%1, %2)
    return (%3)
(AddEngineToGraph)

You can see the engine ID embedded as a constant in the graph, and the trt::execute_engine op taking that constant and an input tensor and producing an output tensor, which is returned. When forward is called on the module, this graph is executed, thereby running the TensorRT engine.
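
As a sketch of how a function like AddEngineToGraph could assemble this graph with the torch::jit::Graph API (TRTorch's actual implementation and its handling of the self argument may differ):

#include <memory>
#include <torch/csrc/jit/ir/ir.h>

// Hypothetical reconstruction of AddEngineToGraph; the signature and names
// are assumptions for illustration.
void AddEngineToGraph(std::shared_ptr<torch::jit::Graph> g, int64_t engine_id) {
  // Graph inputs: %self.1 (unused by the engine call) and the input tensor %2.
  g->addInput("self");
  auto in = g->addInput("input");
  in->setType(c10::TensorType::get());

  // %1 : int = prim::Constant[value=<engine_id>]()
  auto id_val = g->insertConstant(engine_id);

  // %3 : Tensor = trt::execute_engine(%1, %2)
  auto node = g->create(
      c10::Symbol::fromQualString("trt::execute_engine"), {id_val, in});
  node->output()->setType(c10::TensorType::get());
  g->insertNode(node);

  // return (%3)
  g->registerOutput(node->output());
}

Printing the resulting graph (std::cout << *g) yields IR in the same shape as the example above.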