Lowering Phase

The lowering phase is made up of passes, each of which maps a graph from a high-level representation to a lower-level one. Each pass does something specific, for instance inlining method calls. The idea is to significantly reduce what the conversion phase needs to handle when actually mapping to TensorRT. We aim for closer to 1-to-1 op conversion rather than searching for applicable subgraphs, which limits the number of converters and reduces the scope of each converter.

You can see the effects of each pass by setting the log level to Level::kGraph.
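
For example, from Python this can be done through the logging API before compiling. This is a minimal sketch; the exact module and enum names below (torch_tensorrt.logging, Level.Graph) are assumptions, so check the logging documentation for the version you have installed.

    import torch_tensorrt

    # Report graph-level logs so each lowering pass prints the graph it produces.
    # Assumed API names; verify against your installed version's logging docs.
    torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Graph)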

Passes Used

Eliminate Dead Code

Dead code elimination removes nodes whose outputs are unused; it checks whether a node has side effects and will not delete it if it does.
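
As a quick illustration, PyTorch exposes a dead code elimination pass through an internal binding (torch._C._jit_pass_dce, whose availability is assumed here) that can be run on a TorchScript graph directly:

    import torch

    @torch.jit.script
    def f(x):
        print("checking input")  # has side effects, so DCE must keep this node
        unused = x * 2           # pure and unused, so DCE is free to delete it
        return x + 1

    # Internal PyTorch binding, used here only for illustration.
    torch._C._jit_pass_dce(f.graph)
    print(f.graph)  # prim::Print survives; the unused aten::mul does not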

Eliminate Exception Or Pass Pattern

A common pattern in scripted modules is dimension guards, which throw exceptions if the input dimension is not what was expected.

%1013 : bool = aten::ne(%1012, %24) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py:248:11
    = prim::If(%1013) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py:248:8
    block0():
        = prim::RaiseException(%23) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py:249:12
    -> ()
    block1():
    -> ()

Since we are resolving all of this at compile time and there are no exceptions in the TensorRT graph, we just remove it.
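
A guard like the one above typically comes from an ordinary shape check in Python. Here is a minimal scripted example (illustrative, not from the original docs) that produces the prim::If / prim::RaiseException pattern:

    import torch

    @torch.jit.script
    def check_dim(x):
        # Scripting this guard emits a prim::If whose first block raises
        # (prim::RaiseException) and whose second block is a no-op pass.
        if x.dim() != 2:
            raise RuntimeError("expected a 2D input")
        return x + 1

    print(check_dim.graph)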

Eliminate Redundant Guards

Eliminate redundant guards for ops whose outputs are fully determined by their inputs, i.e. if the inputs to such ops are guarded we are allowed to remove a guard on the ops' outputs.

Freeze Module

Freezes attributes and inlines constants and modules. Propagates constants in the graph.
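
This corresponds to PyTorch's module freezing; you can see the same effect with torch.jit.freeze on a small scripted module:

    import torch
    import torch.nn as nn

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(4, 4))

        def forward(self, x):
            return x @ self.weight

    scripted = torch.jit.script(Model()).eval()
    frozen = torch.jit.freeze(scripted)
    print(frozen.graph)  # self.weight is now an inlined constant rather than a prim::GetAttr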

Fuse Linear

Matches the decomposed linear pattern and fuses it back into a single aten::linear. This pass fuses the addmm or matmul + add sequences generated by the JIT back into linear.
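
As a sketch, an equivalent fusion is available as an internal PyTorch pass (torch._C._jit_pass_fuse_linear, whose availability is assumed); running it on a graph that uses aten::addmm shows the rewrite:

    import torch

    @torch.jit.script
    def affine(x, weight, bias):
        # Written explicitly as addmm; the fuse-linear pass rewrites this into aten::linear.
        return torch.addmm(bias, x, weight.t())

    # Internal PyTorch binding; treat its availability as an assumption.
    torch._C._jit_pass_fuse_linear(affine.graph)
    print(affine.graph)  # now contains aten::linear instead of aten::addmm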

Fuse Flatten Linear

TensorRT implicitly flattens inputs to fully connected layers when they are higher than 1D, so when there is an aten::flatten -> aten::linear pattern we remove the aten::flatten.
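
This pattern typically comes from a classifier head that flattens a feature map before a fully connected layer. A small illustrative module (not from the original docs):

    import torch
    import torch.nn as nn

    class Head(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(16 * 4 * 4, 10)

        def forward(self, x):
            # The inlined graph shows an aten::flatten feeding an aten::linear;
            # the lowering pass drops the flatten since TensorRT handles it implicitly.
            return self.fc(torch.flatten(x, 1))

    print(torch.jit.script(Head()).inlined_graph)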

Lower Graph

Given a graph of a method whose first argument is %self, lower it to a graph where all attribute accesses are replaced with explicit inputs to the graph (rather than results of prim::GetAttr executed on %self). Returns a tuple (graph, parameters) where the last module.parameters.size() inputs to the graph are the trainable parameters used in this method. The remaining inputs are the true inputs to the function.
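
PyTorch exposes this lowering as an internal binding (torch._C._jit_pass_lower_graph, also used by the ONNX exporter). A minimal sketch, treating the exact signature as an assumption for your PyTorch version:

    import torch
    import torch.nn as nn

    model = torch.jit.script(nn.Linear(4, 2)).eval()

    # Assumed internal binding and argument order: returns the lowered graph
    # plus the attribute values that became explicit inputs.
    lowered_graph, params = torch._C._jit_pass_lower_graph(model.forward.graph, model._c)
    print(lowered_graph)  # prim::GetAttr on %self is replaced by explicit graph inputs
    print(len(params))    # 2: the linear layer's weight and bias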

Remove Dropout

Removes dropout operators since we are doing inference.
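
For illustration, PyTorch ships a similar pass used by JIT quantization (torch._C._jit_pass_remove_dropout, whose availability and signature are assumed here); it only removes dropout nodes whose training flag is false:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def forward(self, x):
            # training=False makes the dropout an inference-time no-op the pass can drop
            return F.dropout(F.relu(x), p=0.5, training=False)

    model = torch.jit.script(Net()).eval()

    # Internal PyTorch binding; treat its availability and signature as assumptions.
    torch._C._jit_pass_remove_dropout(model._c)
    print(model.graph)  # the aten::dropout node has been removed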

Unpack AddMM

Unpacks aten::addmm into aten::matmul and aten::add_ (with an additional trt::const op to freeze the bias in the TensorRT graph). This lets us reuse the aten::matmul and aten::add_ converters instead of needing a dedicated converter.
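
The rewrite is valid because aten::addmm(bias, mat1, mat2) computes bias + mat1 @ mat2 (with the default beta and alpha of 1). A quick numerical check:

    import torch

    x = torch.randn(3, 4)
    weight = torch.randn(4, 5)
    bias = torch.randn(5)

    fused = torch.addmm(bias, x, weight)       # what aten::addmm computes
    unpacked = torch.matmul(x, weight) + bias  # the matmul + add it is unpacked into
    print(torch.allclose(fused, unpacked))     # True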

Unpack LogSoftmax

Unpacks aten::logsoftmax into aten::softmax and aten::log. This lets us reuse the aten::softmax and aten::log converters instead of needing a dedicated converter.
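
The decomposition follows from log_softmax(x) = log(softmax(x)). A quick numerical check:

    import torch
    import torch.nn.functional as F

    x = torch.randn(2, 5)
    print(torch.allclose(F.log_softmax(x, dim=1),
                         torch.log(F.softmax(x, dim=1))))  # True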