Intel® Neural Compressor supports an automatic quantization tuning flow that converts quantizable layers to INT8, lets users control the trade-off between model accuracy and performance, and implements the latest quantization algorithms from the research community.

Visit the Intel® Neural Compressor online documentation at: intel.github.io/neural-compressor.
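The INT8 conversion mentioned above can be illustrated with the standard affine (asymmetric) quantization scheme. This is a simplified NumPy sketch of the underlying arithmetic, not Neural Compressor's internal implementation:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine quantization of float32 values to unsigned INT8 (0..255)."""
    x_min, x_max = float(x.min()), float(x.max())
    # Map the observed float range onto 256 integer levels;
    # fall back to 1.0 for constant tensors to avoid division by zero.
    scale = (x_max - x_min) / 255.0 or 1.0
    zero_point = round(-x_min / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values from the INT8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize(q, scale, zp)  # close to x, within one quantization step
```

Automatic tuning then amounts to choosing, per layer, whether such an INT8 representation keeps the model within the user's accuracy criterion.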


Quantize from presets

Choose a predefined quantization configuration for your model.

or

Quantize using wizard

Create a new quantization configuration.
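The wizard walks through the same settings that a configuration file captures. A hypothetical sketch of such a quantization configuration (field names follow Neural Compressor's legacy YAML schema; model name and all values are purely illustrative):

```yaml
# Illustrative quantization configuration (values are examples, not defaults)
model:
  name: resnet50
  framework: tensorflow
quantization:
  approach: post_training_static_quant
  calibration:
    sampling_size: 100
tuning:
  accuracy_criterion:
    relative: 0.01        # accept at most 1% relative accuracy loss
  exit_policy:
    timeout: 0            # tune until the accuracy criterion is met
```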