Lowering Phase ¶
The lowering phase is made up of passes, operations that map a graph from a high-level representation to a lower-level one. Each pass does something specific, for instance inlining method calls. The idea is to significantly reduce what the conversion phase needs to handle when actually mapping to TensorRT. We aim for closer to 1->1 op conversion rather than searching for applicable subgraphs, which limits the number of converters and reduces the scope of each one.
You can see the effects of each pass by setting the log level to Level::kGraph.
Passes Used ¶
Eliminate Dead Code ¶
Dead code elimination removes nodes whose results are unused; it checks whether a node has side effects and will not delete it if it does.
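As a minimal illustration of the side-effect rule, the sketch below uses a made-up TorchScript function with two unused results; only the one without side effects is a candidate for removal:

import torch

@torch.jit.script
def dce_example(x: torch.Tensor) -> torch.Tensor:
    unused = x + 1        # result never used and no side effects: eligible for elimination
    print("checked x")    # result also unused, but prim::Print has a side effect, so it is kept
    return x * 2

# The print node is preserved in the graph, while the dead add may already have been pruned
# by TorchScript's own cleanup before you inspect it.
print(dce_example.graph)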
Eliminate Exception Or Pass Pattern ¶
A common pattern in scripted modules is a dimension guard which throws an exception if the input dimension is not what was expected:
%1013 : bool = aten::ne(%1012, %24) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py:248:11
= prim::If(%1013) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py:248:8
block0():
= prim::RaiseException(%23) # ~/.local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py:249:12
-> ()
block1():
-> ()
Since we are resolving all of this at compile time and there are no exceptions in the TensorRT graph, we just remove it.
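For reference, a sketch of how such a guard arises from ordinary Python. The module and names here are hypothetical, but scripting a raise inside an if produces the prim::If / prim::RaiseException pattern shown above:

import torch

class GuardedModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dimension guard: becomes a prim::If with a prim::RaiseException block when scripted
        if x.dim() != 4:
            raise ValueError("expected a 4D input")
        return torch.relu(x)

scripted = torch.jit.script(GuardedModule())
print(scripted.graph)  # contains the exception-or-pass pattern that lowering removes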
Eliminate Redundant Guards ¶
Eliminates redundant guards for ops whose outputs are fully determined by their inputs, i.e. if the inputs to such ops are guarded we are allowed to remove the guard on the ops’ outputs.
Freeze Module ¶
Freezes attributes and inlines constants and modules. Propagates constants in the graph.
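The standalone torch.jit.freeze API exposes similar behavior if you want to see the effect in isolation; a minimal sketch with a made-up model (the lowering phase performs its own freezing internally rather than this exact call):

import torch

model = torch.jit.script(torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()))
model.eval()                      # freezing requires inference mode
frozen = torch.jit.freeze(model)  # attributes and submodules inlined, constants propagated
print(frozen.graph)               # prim::GetAttr chains are gone; weights appear as constants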
Fuse Linear ¶
Matches the decomposed aten::linear pattern (the addmm or matmul + add generated by the JIT) and fuses it back into a single aten::linear op.
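The numerical identity being matched is straightforward; a small sketch (tensor names and shapes are illustrative) showing that the addmm form and a single linear call agree:

import torch

x = torch.randn(2, 3)
weight = torch.randn(4, 3)
bias = torch.randn(4)

decomposed = torch.addmm(bias, x, weight.t())          # addmm form the JIT may emit
fused = torch.nn.functional.linear(x, weight, bias)    # the single linear op the pass restores
assert torch.allclose(decomposed, fused)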
Fuse Flatten Linear ¶
TensorRT implicitly flattens input layers into fully connected layers when they are higher than 1D. So when there is an aten::flatten -> aten::linear pattern we remove the aten::flatten.
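A sketch of the pattern in question (module name and sizes are hypothetical); the flatten before the linear is what this pass drops, since TensorRT handles the flattening itself:

import torch

class FlattenThenLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(64, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # aten::flatten -> aten::linear: the flatten is removed during lowering
        return self.fc(torch.flatten(x, 1))

scripted = torch.jit.script(FlattenThenLinear())
print(scripted.graph)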
Lower Graph ¶
Given a graph of a method whose first argument is %self, lower it to a graph where all attribute accesses are replaced with explicit inputs to the graph (rather than results of prim::GetAttr executed on %self). Returns a tuple (graph, parameters) where the last module.parameters.size() inputs to the graph are the trainable parameters used in this method. The remaining inputs are the true inputs to the function.
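To see the attribute accesses this pass eliminates, you can print the graph of any scripted module that reads a parameter through self; a minimal sketch with a made-up module:

import torch

class WithWeight(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight

scripted = torch.jit.script(WithWeight())
# Before lowering, the parameter is fetched via prim::GetAttr on %self;
# the Lower Graph pass replaces such accesses with explicit graph inputs.
print(scripted.graph)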
Remove Dropout ¶
Removes dropout operators since we are doing inference.
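This is sound because dropout is the identity at inference time, as a quick check shows (the tensor here is arbitrary):

import torch

drop = torch.nn.Dropout(p=0.5)
drop.eval()                      # inference: dropout passes its input through unchanged
x = torch.randn(3, 3)
assert torch.equal(drop(x), x)   # so the operator can simply be removed from the graph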
Unpack AddMM ¶
Unpacks aten::addmm into aten::matmul and aten::add_ (with an additional trt::const op to freeze the bias in the TensorRT graph). This lets us reuse the aten::matmul and aten::add_ converters instead of needing a dedicated converter.
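The rewrite relies on the identity addmm(bias, mat1, mat2) = matmul(mat1, mat2) + bias (with the default alpha and beta of 1); a quick numerical check with arbitrary shapes:

import torch

bias = torch.randn(4)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 4)

packed = torch.addmm(bias, mat1, mat2)
unpacked = torch.matmul(mat1, mat2) + bias   # the form the lowering pass emits
assert torch.allclose(packed, unpacked)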
Unpack LogSoftmax ¶
Unpacks aten::logsoftmax into aten::softmax and aten::log. This lets us reuse the aten::softmax and aten::log converters instead of needing a dedicated converter.
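Likewise this rests on log_softmax(x) = log(softmax(x)); a quick check with an arbitrary tensor:

import torch

x = torch.randn(2, 5)
packed = torch.log_softmax(x, dim=1)
unpacked = torch.log(torch.softmax(x, dim=1))   # the form the lowering pass emits
assert torch.allclose(packed, unpacked, atol=1e-6)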