SuperGradients
Introduction
Welcome to SuperGradients, a free, open-source training library for PyTorch-based deep learning models. SuperGradients lets you train or fine-tune state-of-the-art (SOTA) pre-trained models for the most commonly applied computer vision tasks, all with a single training library. It currently supports object detection, image classification, and semantic segmentation for images and videos.
Why use SuperGradients?
Built-in SOTA Models
Easily load and fine-tune production-ready, pre-trained SOTA models that incorporate best practices and validated hyper-parameters for achieving best-in-class accuracy.
Easily Reproduce our Results
Why do all the grind work if we've already done it for you? Leverage tested and proven recipes and code examples for a wide range of computer vision models, developed by our team of deep learning experts. Configure your own hyperparameters for training, dataset, and architecture, or use our plug-and-play ones.
Production Readiness and Ease of Integration
All SuperGradients models are production-ready in the sense that they are compatible with deployment tools such as TensorRT (Nvidia) and OpenVINO (Intel) and can be easily taken into production. With a few lines of code you can integrate the models into your codebase.

Documentation
Check SuperGradients Docs for full documentation, user guide, and examples.
Getting Started
Quick Start Notebook - Classification
Get started with our quick start notebook for image classification on Google Colab, an easy way to begin using free GPU hardware.
Quick Start Notebook - Semantic Segmentation
Get started with our quick start notebook for semantic segmentation on Google Colab, an easy way to begin using free GPU hardware.
Transfer Learning
Transfer Learning with SG Notebook - Semantic Segmentation
Learn more about SuperGradients' transfer learning (fine-tuning) capabilities with our example notebook on Google Colab, which fine-tunes a Cityscapes-pretrained RegSeg48 model on a subset of the Supervisely dataset. It is an easy-to-use tutorial that runs on free GPU hardware.
Knowledge Distillation Training
Knowledge Distillation Training Quick Start with SG Notebook - ResNet18 example
Knowledge distillation is a training technique that uses a large model (the teacher) to improve the performance of a smaller model (the student). Learn more about SuperGradients' knowledge distillation training with our example notebook on Google Colab, which uses a pre-trained BEiT base teacher and a ResNet18 student on CIFAR10. It is an easy-to-use tutorial that runs on free GPU hardware.
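To make the teacher/student idea concrete, here is a minimal, framework-free sketch of the classic distillation loss from Hinton et al. (2015) for a single example. It assumes raw logits given as plain Python lists and is not necessarily the loss SuperGradients implements; `distillation_loss`, `temperature`, and `alpha` are illustrative names and values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Hypothetical Hinton-style KD loss: a weighted sum of the hard-label
    cross-entropy and the temperature-softened KL divergence to the teacher."""
    # Hard-label term: cross-entropy of the student against the ground truth.
    hard_loss = -math.log(softmax(student_logits)[label])
    # Soft-label term: push the student's softened distribution toward the teacher's.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft term's gradient magnitude comparable across temperatures.
    soft_loss = kl_divergence(p_teacher, p_student) * temperature ** 2
    return alpha * hard_loss + (1 - alpha) * soft_loss


if __name__ == "__main__":
    student = [2.0, 1.0, 0.1]
    teacher = [3.0, 0.5, -1.0]
    print(distillation_loss(student, teacher, label=0))
```

When the student already matches the teacher exactly, the soft term vanishes and only the weighted cross-entropy remains; `alpha` trades off fitting the labels against imitating the teacher.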
Installation Methods
Prerequisites
General requirements
Python 3.7, 3.8 or 3.9 installed.
torch>=1.9.0 (see https://pytorch.org/get-started/locally/ for installation instructions)
The Python packages specified in requirements.txt.
To train on Nvidia GPUs
cuDNN >= 8.1.x
Nvidia Driver with CUDA >= 11.2 support (≥460.x)
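A quick way to check the Python and torch requirements above before installing. This is a hypothetical helper script using only the standard library: `parse_version`, `torch_meets_minimum`, and `python_is_supported` are illustrative names, and the version parsing is deliberately simple (it ignores pre-release ordering). cuDNN and driver versions are not checked.

```python
import re
import sys

# Requirements copied from the list above.
SUPPORTED_PYTHON = {(3, 7), (3, 8), (3, 9)}
MIN_TORCH = (1, 9, 0)


def parse_version(version):
    """Convert a version string such as '1.10.1+cu113' into (1, 10, 1)."""
    core = version.split("+")[0]  # drop local build tags like '+cu113'
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)  # keep leading digits, e.g. '0rc1' -> 0
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def torch_meets_minimum(torch_version, minimum=MIN_TORCH):
    """True if torch_version >= minimum; shorter versions are padded with zeros."""
    found = parse_version(torch_version)
    length = max(len(found), len(minimum))
    found += (0,) * (length - len(found))
    minimum += (0,) * (length - len(minimum))
    return found >= minimum


def python_is_supported(version_info=sys.version_info):
    """True if the (major, minor) Python version is one of the supported ones."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    try:
        import torch
        print("torch >= 1.9.0:", torch_meets_minimum(torch.__version__))
    except ImportError:
        print("torch is not installed")
```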
Quick Installation
Install using GitHub
pip install git+https://github.com/Deci-AI/super-gradients.git@stable
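After installing, you can confirm that the package is importable from the active environment. A minimal sketch, assuming the package's import name is `super_gradients`; `is_installed` is an illustrative helper built on the standard library's `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'super_gradients' is assumed to be the import name of the package installed above.
    print("super_gradients installed:", is_installed("super_gradients"))
```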
Computer Vision Models - Pretrained Checkpoints
Pretrained Classification PyTorch Checkpoints
| Model | Dataset | Resolution | Top-1 | Top-5 | Latency (HW)* T4 | Latency (Production)** T4 | Latency (HW)* Jetson Xavier NX | Latency (Production)** Jetson Xavier NX | Latency Cascade Lake |
|---|---|---|---|---|---|---|---|---|---|
| ViT base | ImageNet21K | 224x224 | 84.15 | - | 4.46ms | 4.60ms | - * | - | 57.22ms |
| ViT large | ImageNet21K | 224x224 | 85.64 | - | 12.81ms | 13.19ms | - * | - | 187.22ms |
| BEiT | ImageNet21K | 224x224 | - | - | - | - | - * | - | - |
| EfficientNet B0 | ImageNet | 224x224 | 77.62 | 93.49 | 0.93ms | 1.38ms | - * | - | 3.44ms |
| RegNet Y200 | ImageNet | 224x224 | 70.88 | 89.35 | 0.63ms | 1.08ms | 2.16ms | 2.47ms | 2.06ms |
| RegNet Y400 | ImageNet | 224x224 | 74.74 | 91.46 | 0.80ms | 1.25ms | 2.62ms | 2.91ms | 2.87ms |
| RegNet Y600 | ImageNet | 224x224 | 76.18 | 92.34 | 0.77ms | 1.22ms | 2.64ms | 2.93ms | 2.39ms |
| RegNet Y800 | ImageNet | 224x224 | 77.07 | 93.26 | 0.74ms | 1.19ms | 2.77ms | 3.04ms | 2.81ms |
| ResNet 18 | ImageNet | 224x224 | 70.6 | 89.64 | 0.52ms | 0.95ms | 2.01ms | 2.30ms | 4.56ms |
| ResNet 34 | ImageNet | 224x224 | 74.13 | 91.7 | 0.92ms | 1.34ms | 3.57ms | 3.87ms | 7.64ms |
| ResNet 50 | ImageNet | 224x224 | 81.91 | 93.0 | 1.03ms | 1.44ms | 4.78ms | 5.10ms | 9.25ms |
| MobileNet V3_large-150 epochs | ImageNet | 224x224 | 73.79 | 91.54 | 0.67ms | 1.11ms | 2.42ms | 2.71ms | 1.76ms |
| MobileNet V3_large-300 epochs | ImageNet | 224x224 | 74.52 | 91.92 | 0.67ms | 1.11ms | 2.42ms | 2.71ms | 1.76ms |
| MobileNet V3_small | ImageNet | 224x224 | 67.45 | 87.47 | 0.55ms | 0.96ms | 2.01ms * | 2.35ms | 1.06ms |
| MobileNet V2_w1 | ImageNet | 224x224 | 73.08 | 91.1 | 0.46ms | 0.89ms | 1.65ms * | 1.90ms | 1.56ms |
NOTE:
Latency (HW)* - Hardware performance (not including IO)
Latency (Production)** - Production Performance (including IO)
Performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
Performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
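The hardware and production latency columns can be combined into rough derived figures. A back-of-the-envelope sketch, assuming the gap between production and hardware latency approximates the IO cost, and that batch-size-1 requests are processed one after another with no pipelining or concurrency. The helper names are illustrative, and this is not how the reported numbers were measured.

```python
def io_overhead_ms(hw_latency_ms, production_latency_ms):
    """Approximate IO cost: production latency (with IO) minus hardware latency (without IO)."""
    return production_latency_ms - hw_latency_ms


def throughput_per_second(latency_ms, batch_size=1):
    """Images per second if batches are processed strictly back to back."""
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    # ResNet 18 on T4, values taken from the classification table.
    hw, prod = 0.52, 0.95
    print(f"IO overhead: {io_overhead_ms(hw, prod):.2f} ms")  # IO overhead: 0.43 ms
    print(f"Production throughput: {throughput_per_second(prod):.0f} images/s")  # 1053 images/s
```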
Pretrained Object Detection PyTorch Checkpoints
| Model | Dataset | Resolution | mAP (val) | Latency (HW)* T4 | Latency (Production)** T4 | Latency (HW)* Jetson Xavier NX | Latency (Production)** Jetson Xavier NX | Latency Cascade Lake |
|---|---|---|---|---|---|---|---|---|
| SSD lite MobileNet v2 | COCO | 320x320 | 21.5 | 0.77ms | 1.40ms | 5.28ms | 6.44ms | 4.13ms |
| SSD lite MobileNet v1 | COCO | 320x320 | 24.3 | 1.55ms | 2.84ms | 8.07ms | 9.14ms | 22.76ms |
| YOLOX nano | COCO | 640x640 | 26.77 | 2.47ms | 4.09ms | 11.49ms | 12.97ms | - |
| YOLOX tiny | COCO | 640x640 | 37.18 | 3.16ms | 4.61ms | 15.23ms | 19.24ms | - |
| YOLOX small | COCO | 640x640 | 40.47 | 3.58ms | 4.94ms | 18.88ms | 22.48ms | - |
| YOLOX medium | COCO | 640x640 | 46.4 | 6.40ms | 7.65ms | 39.22ms | 44.5ms | - |
| YOLOX large | COCO | 640x640 | 49.25 | 10.07ms | 11.12ms | 68.73ms | 77.01ms | - |
NOTE:
Latency (HW)* - Hardware performance (not including IO)
Latency (Production)** - Production Performance (including IO)
Latency performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
Latency performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
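Detection mAP is built on the intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch of that matching step, assuming the mAP column is the standard COCO metric, which averages AP over IoU thresholds from 0.50 to 0.95. A full mAP evaluation also ranks predictions by confidence and integrates precision over recall, which is omitted here; the helper names are illustrative.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style mAP averages AP over these ten IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def is_match(pred_box, gt_box, threshold):
    """A prediction counts as a true positive at `threshold` if its IoU reaches it."""
    return box_iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
```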
Pretrained Semantic Segmentation PyTorch Checkpoints
| Model | Dataset | Resolution | mIoU | Latency T4 (batch size 1) | Latency T4 (batch size 1, including IO) |
|---|---|---|---|---|---|
| DDRNet 23 | Cityscapes | 1024x2048 | 80.26 | 7.62ms | 25.94ms |
| DDRNet 23 slim | Cityscapes | 1024x2048 | 78.01 | 3.56ms | 22.80ms |
| STDC 1-Seg50 | Cityscapes | 512x1024 | 75.07 | 2.83ms | 12.57ms |
| STDC 1-Seg75 | Cityscapes | 768x1536 | 77.8 | 5.71ms | 26.70ms |
| STDC 2-Seg50 | Cityscapes | 512x1024 | 75.79 | 3.74ms | 13.89ms |
| STDC 2-Seg75 | Cityscapes | 768x1536 | 78.93 | 7.35ms | 28.18ms |
| RegSeg (exp48) | Cityscapes | 1024x2048 | 78.15 | 13.09ms | 41.88ms |
| Larger RegSeg (exp53) | Cityscapes | 1024x2048 | 79.2 | 24.82ms | 51.87ms |
| ShelfNet LW 34 | COCO Segmentation (21 classes from PASCAL including background) | 512x512 | 65.1 | - | - |
NOTE: Performance measured on T4 GPU with TensorRT, using FP16 precision and batch size 1. The first latency column excludes IO; the second includes it.
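The mIoU column averages per-class intersection-over-union. A minimal sketch, assuming per-pixel integer class labels flattened into lists and a confusion matrix accumulated over the whole evaluation set before averaging. Dataset-specific details such as Cityscapes ignore labels are left out, and `confusion_matrix` and `mean_iou` are illustrative names.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts pixels whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(confusion):
    """Average of TP / (TP + FP + FN) over classes that appear in labels or predictions."""
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # pixels of class c predicted as another class
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(cm)  # [[1, 1], [0, 2]]
    print(mean_iou(cm))  # (1/2 + 2/3) / 2 = 7/12
```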
Implemented Model Architectures
Image Classification
DenseNet - Densely Connected Convolutional Networks https://arxiv.org/pdf/1608.06993.pdf
DPN - Dual Path Networks https://arxiv.org/pdf/1707.01629
MobileNet - Efficient Convolutional Neural Networks for Mobile Vision Applications https://arxiv.org/pdf/1704.04861
PNASNet - Progressive Neural Architecture Search Networks https://arxiv.org/pdf/1712.00559
RepVGG - Making VGG-style ConvNets Great Again https://arxiv.org/pdf/2101.03697.pdf
ResNet - Deep Residual Learning for Image Recognition https://arxiv.org/pdf/1512.03385
ResNeXt - Aggregated Residual Transformations for Deep Neural Networks https://arxiv.org/pdf/1611.05431
SENet - Squeeze-and-Excitation Networks https://arxiv.org/pdf/1709.01507
ShuffleNet v2 - Practical Guidelines for Efficient CNN Architecture Design https://arxiv.org/pdf/1807.11164
VGG - Very Deep Convolutional Networks for Large-scale Image Recognition https://arxiv.org/pdf/1409.1556
Object Detection
Semantic Segmentation
DDRNet (Deep Dual-resolution Networks) - https://arxiv.org/pdf/2101.06085.pdf
LadderNet - Multi-path networks based on U-Net for medical image segmentation https://arxiv.org/pdf/1810.07810
RegSeg - Rethink Dilated Convolution for Real-time Semantic Segmentation https://arxiv.org/pdf/2111.09957
STDC - Rethinking BiSeNet For Real-time Semantic Segmentation https://arxiv.org/pdf/2104.13188
Contributing
To learn about making a contribution to SuperGradients, please see our Contribution page.
Citation
If you use the SuperGradients library or benchmarks in your research, please cite the SuperGradients deep learning training library.
Community
If you want to be part of SuperGradients' growing community, hear about news and updates, get help, request advanced features, or file a bug report, we would love to welcome you aboard!
Slack is the place to ask questions about SuperGradients and get support. Click here to join our Slack.
To report a bug, file an issue on GitHub.
You can also join the community mailing list to ask questions about the project and receive announcements.
For a short meeting with the SuperGradients PM, use this link and choose your preferred time.
License
This project is released under the Apache 2.0 license.
Deci Lab
Deci Lab supports all common frameworks and hardware, from Intel CPUs to Nvidia GPUs and Jetsons.
You can enjoy immediate improvement in throughput, latency, and memory with the Deci Lab. It optimizes deep learning models using best-of-breed technologies, such as quantization and graph compilers.
Get a complete benchmark of your models’ performance on different hardware and batch sizes in a single interface. Invite co-workers to collaborate on models and communicate your progress.
Sign up for Deci Lab for free here