Training package
super_gradients.training module
- class super_gradients.training.DetectionDataSet(root: str, list_file: str, img_size: int = 416, batch_size: int = 16, augment: bool = False, dataset_hyper_params: Optional[dict] = None, cache_labels: bool = False, cache_images: bool = False, sample_loading_method: str = 'default', collate_fn: Optional[Callable] = None, target_extension: str = '.txt', labels_offset: int = 0, class_inclusion_list=None, all_classes_list=None)[source]
Bases: Generic[torch.utils.data.dataset.T_co]
- static sample_post_process(image)[source]
- sample_post_process - Normalizes and reorders the image to 3 x img_size x img_size
- param image
The image to post-process
- return
The normalized, channel-first image
- static sample_loader(sample_path: str)[source]
- sample_loader - Loads a COCO dataset image from the given path
- param sample_path
The path to the sample image
- return
The loaded image
- static target_loader(target_path: str, class_inclusion_list=None, all_classes_list=None)[source]
- coco_target_loader - Loads a detection target from the given path
- Parameters
target_path – str, path to the target file
all_classes_list – list(str) containing all the class names, or None when subclassing is disabled
class_inclusion_list – list(str) containing the subclass names, or None when subclassing is disabled
- static target_transform(target, ratio, w, h, pad=None)[source]
- Parameters
target – the target labels to transform
ratio – the resize ratio applied to the image
w – image width
h – image height
pad – padding offsets applied to the image
- Returns
the transformed target
- static augment_hsv(img, hgain=0.5, sgain=0.5, vgain=0.5)[source]
- Parameters
img – the image to augment in place
hgain – random gain range for the hue channel
sgain – random gain range for the saturation channel
vgain – random gain range for the value channel
- Returns
- static letterbox(img, new_shape=(416, 416), color=(128, 128, 128), auto=True, scaleFill=False, scaleup=True, interp=3) → tuple[source]
letterbox - Resizes an image to a 32-pixel-multiple rectangle
- Parameters
img – the image to resize
new_shape – the desired output shape
color – the padding color
auto – when True, pad to the minimal 32-pixel-multiple rectangle
scaleFill – when True, stretch to new_shape without padding
scaleup – when True, allow scaling up (otherwise only scale down)
interp – interpolation method
- Returns
- random_perspective(img, targets=(), degrees=10, translate=0.1, scale=0.1, shear=10, border=0, perspective=0)[source]
Randomly transforms images and labels using a perspective transform
- class super_gradients.training.TestDatasetInterface(trainset, dataset_params={}, classes=None)[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.DatasetInterface
- get_data_loaders(batch_size_factor=1, num_workers=8, train_batch_size=None, val_batch_size=None, distributed_sampler=False)[source]
Get self.train_loader, self.test_loader, self.classes.
If the data loaders haven’t been initialized yet, build them first.
- Parameters
kwargs – all arguments are passed to build_data_loaders.
- class super_gradients.training.SgModel(experiment_name: str, device: Optional[str] = None, multi_gpu: Union[super_gradients.training.sg_model.sg_model.MultiGPUMode, str] = <MultiGPUMode.OFF: 'Off'>, model_checkpoints_location: str = 'local', overwrite_local_checkpoint: bool = True, ckpt_name: str = 'ckpt_latest.pth', post_prediction_callback: Optional[super_gradients.training.utils.detection_utils.DetectionPostPredictionCallback] = None, ckpt_root_dir=None)[source]
Bases:
object
SuperGradient Model - Base Class for Sg Models
- train(max_epochs: int, initial_epoch: int, save_model: bool)[source]
The main function used for training, hyper-parameter updating, logging, etc.
- test(epoch : int, idx : int, save : bool):
Returns the test loss, accuracy and runtime
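The typical workflow chains the methods documented below: construct an SgModel, connect a dataset interface, build a network, then train. A minimal sketch (the architecture name “resnet18”, the Accuracy metric import and the dataset_params keys are illustrative assumptions, not guaranteed by this page):

from super_gradients.training import SgModel
from super_gradients.training.datasets import ClassificationTestDatasetInterface
from super_gradients.training.metrics import Accuracy  # assumed export of the metrics module

# create the model wrapper for an experiment
model = SgModel(experiment_name="my_experiment")

# connect a dataset interface (here the small built-in test dataset)
dataset = ClassificationTestDatasetInterface(dataset_params={"batch_size": 16})
model.connect_dataset_interface(dataset, data_loader_num_workers=4)

# build the network from a named architecture in models/ALL_ARCHITECTURES
model.build_model("resnet18")

# train with a training_params dict (see train() below for all supported keys)
model.train(training_params={
    "max_epochs": 2,
    "lr_mode": "step",
    "lr_updates": [1],
    "lr_decay_factor": 0.1,
    "initial_lr": 0.1,
    "loss": "cross_entropy",
    "optimizer": "SGD",
    "optimizer_params": {"momentum": 0.9},
    "train_metrics_list": [Accuracy()],
    "valid_metrics_list": [Accuracy()],
    "loss_logging_items_names": ["Loss"],
    "metric_to_watch": "Accuracy",
    "greater_metric_to_watch_is_better": True,
})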
- connect_dataset_interface(dataset_interface: super_gradients.training.datasets.dataset_interfaces.dataset_interface.DatasetInterface, data_loader_num_workers: int = 8)[source]
- Parameters
dataset_interface – DatasetInterface object
data_loader_num_workers – the number of worker threads used to initialize the data loaders of the dataset being connected
- build_model(architecture: Union[str, torch.nn.modules.module.Module], arch_params={}, checkpoint_params={}, *args, **kwargs)[source]
- Parameters
architecture – Defines the network’s architecture from models/ALL_ARCHITECTURES
arch_params – Architecture H.P. e.g.: block, num_blocks, num_classes, etc.
checkpoint_params –
Dictionary-like object with the following key:value pairs (a sketch follows below):
load_checkpoint: whether to load a pre-trained checkpoint
strict_load: see the StrictLoad class documentation for details
source_ckpt_folder_name: folder name to load the checkpoint from (self.experiment_name if none is given)
load_weights_only: loads only the weights from the checkpoint and resets the training params
load_backbone: loads the provided checkpoint into self.net.backbone instead of self.net
external_checkpoint_path: the path to an external checkpoint to be loaded. Can be absolute or relative (ie: path/to/checkpoint.pth). If provided, the checkpoint will be loaded even when the load_checkpoint flag is not set.
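For example, a hedged sketch of the two common loading flows (the architecture name and checkpoint path are placeholders):

from super_gradients.training import SgModel, StrictLoad

model = SgModel(experiment_name="my_experiment")

# resume the latest checkpoint of this experiment, matching weights by
# layer shape instead of by state_dict key names
model.build_model(
    architecture="resnet18",  # placeholder entry from models/ALL_ARCHITECTURES
    checkpoint_params={
        "load_checkpoint": True,
        "strict_load": StrictLoad.NO_KEY_MATCHING,
        "load_weights_only": True,  # discard optimizer/training state
    },
)

# or point directly at a checkpoint file; load_checkpoint is then implied
model.build_model(
    architecture="resnet18",
    checkpoint_params={"external_checkpoint_path": "path/to/checkpoint.pth"},
)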
- backward_step(loss: torch.Tensor, epoch: int, batch_idx: int, context: super_gradients.training.utils.callbacks.PhaseContext, *args, **kwargs)[source]
Run backprop on the loss and perform an optimizer step
- Parameters
loss – the value computed by the loss function
epoch – the epoch the training is currently on
batch_idx – the iteration number inside the current epoch
context – the current phase context
- Returns
- save_checkpoint(optimizer=None, epoch: Optional[int] = None, validation_results_tuple: Optional[tuple] = None, context: Optional[super_gradients.training.utils.callbacks.PhaseContext] = None)[source]
Save the current state dict as latest (always), best (if metric was improved), epoch# (if determined in training params)
- train(training_params: dict = {})[source]
train - Trains the Model
- IMPORTANT NOTE: Additional batch items can optionally be returned by the data loaders as a third element of the tuple, in the form of a dictionary. The phase context will hold each such item under an attribute named after its key, so the items can be accessed through phase callbacks.
- param training_params
max_epochs : int
Number of epochs to run training.
lr_updates : list(int)
List of fixed epoch numbers to perform learning rate updates when lr_mode=’step’.
lr_decay_factor : float
Decay factor to apply to the learning rate at each update when lr_mode=’step’.
lr_mode : str
Learning rate scheduling policy, one of [‘step’,’poly’,’cosine’,’function’]. ‘step’ refers to constant updates at epoch numbers passed through lr_updates. ‘cosine’ refers to the Cosine Annealing policy as described in https://arxiv.org/abs/1608.03983. ‘poly’ refers to polynomial decrease, i.e. at each iteration self.lr = self.initial_lr * pow((1.0 - (current_iter / max_iter)), 0.9). ‘function’ refers to a user-defined learning rate scheduling function passed through lr_schedule_function.
lr_schedule_function : Union[callable,None]
Learning rate scheduling function to be used when lr_mode is ‘function’.
lr_warmup_epochs : int (default=0)
Number of epochs for learning rate warm up - see https://arxiv.org/pdf/1706.02677.pdf (Section 2.2).
cosine_final_lr_ratio : float (default=0.01)
Final learning rate ratio (only relevant when lr_mode=’cosine’). The cosine schedule starts from initial_lr and reaches initial_lr * cosine_final_lr_ratio in the last epoch.
initial_lr : float
Initial learning rate.
loss : Union[nn.module, str]
Loss function for training. One of SuperGradients’ built-in options:
“cross_entropy”: LabelSmoothingCrossEntropyLoss, “mse”: MSELoss, “r_squared_loss”: RSquaredLoss, “detection_loss”: YoLoV3DetectionLoss, “shelfnet_ohem_loss”: ShelfNetOHEMLoss, “shelfnet_se_loss”: ShelfNetSemanticEncodingLoss, “yolo_v5_loss”: YoLoV5DetectionLoss, “ssd_loss”: SSDLoss,
or a user-defined nn.Module loss function.
IMPORTANT: forward(…) should return a (loss, loss_items) tuple where loss is the tensor used for backprop (i.e. what your original loss function returns) and loss_items is a tensor of shape (n_items) holding values computed during the forward pass that we wish to log over the entire epoch. For example, the loss itself should always be logged. Another example is a scenario where the computed loss is the sum of a few components we would like to log; these become entries in loss_items (see the sketch after this parameter list).
When training, set the loss_logging_items_names parameter in train_params to a list of strings of length n_items whose i-th element is the name of the i-th entry in loss_items. Each item will then be logged, rendered on tensorboard and “watched” (i.e. used for saving model checkpoints).
Since running logs will save the loss_items in some internal state, it is recommended to detach loss_items from their computational graph for memory efficiency.
optimizer : Union[str, torch.optim.Optimizer]
Optimization algorithm. One of [‘Adam’,’SGD’,’RMSProp’] corresponding to the torch.optim implementations, or any object that implements torch.optim.Optimizer.
criterion_params : dict
Loss function parameters.
optimizer_params : dict
When optimizer is one of [‘Adam’,’SGD’,’RMSProp’], it will be initialized with optimizer_params (see https://pytorch.org/docs/stable/optim.html for the full list of parameters for each optimizer).
train_metrics_list : list(torchmetrics.Metric)
Metrics to log during training. For more information on torchmetrics see https://torchmetrics.rtfd.io/en/latest/.
valid_metrics_list : list(torchmetrics.Metric)
Metrics to log during validation/testing. For more information on torchmetrics see https://torchmetrics.rtfd.io/en/latest/.
loss_logging_items_names : list(str)
The list of names/titles for the outputs returned from the loss function’s forward pass (reminder: the loss function should return the tuple (loss, loss_items)). These names will be used for logging the values.
metric_to_watch : str (default=”Accuracy”)
The metric according to which the model checkpoint will be saved. It can be set to any of the following:
a metric name (str) of one of the metric objects from valid_metrics_list
a “metric_name” if some metric in valid_metrics_list has an attribute component_names, a list referring to the names of each entry in the output metric (a torch tensor of size n)
one of “loss_logging_items_names”, i.e. an item returned during the loss function’s forward pass.
At the end of each epoch, if a new best metric_to_watch value is achieved, the model’s checkpoint is saved in YOUR_PYTHON_PATH/checkpoints/ckpt_best.pth
greater_metric_to_watch_is_better : bool
- When choosing a checkpoint to save, the best model is the one that maximizes metric_to_watch when this parameter is set to True, and the one that minimizes it otherwise.
ema : bool (default=False)
Whether to use Model Exponential Moving Average (see the ema implementation in https://github.com/rwightman/pytorch-image-models)
batch_accumulate : int (default=1)
Number of batches to accumulate before every backward pass.
ema_params : dict
Parameters for the ema model.
zero_weight_decay_on_bias_and_bn : bool (default=False)
Whether to set weight decay to zero for bias and batch-normalization parameters (ignored when the passed optimizer has already been initialized).
load_opt_params : bool (default=True)
Whether to load the optimizer’s parameters as well when loading a model’s checkpoint.
run_validation_freq : int (default=1)
- The frequency at which validation is performed during training (i.e. validation runs every run_validation_freq epochs).
save_model : bool (default=True)
Whether to save the model checkpoints.
silent_mode : bool
Silences the printouts.
mixed_precision : bool
Whether to use mixed precision or not.
save_ckpt_epoch_list : list(int) (default=[])
List of fixed epoch indices the user wishes to save checkpoints in.
average_best_models : bool (default=False)
If set, a snapshot dictionary file and the average model will be saved / updated at every epoch and evaluated only when training is completed. The snapshot file will only be deleted upon completing the training. The snapshot dict will be managed on cpu.
precise_bn : bool (default=False)
Whether to use precise_bn calculation during the training.
precise_bn_batch_size : int (default=None)
The effective batch size over which to compute the batch norm statistics. For example, when training a model on 8 gpus with a batch of 128 on each gpu, a good rule of thumb would be 8192 (ie: effective_batch_size * num_gpus = batch_per_gpu * num_gpus * num_gpus). If precise_bn_batch_size is not provided in the training_params, this heuristic is used.
seed : int (default=42)
Random seed to be set for torch, numpy, and random. When using DDP, each process will have its seed set to seed + rank.
log_installed_packages : bool (default=False)
- When set, the list of all installed packages (and their versions) will be written to the tensorboard and logfile (useful when trying to reproduce results).
dataset_statistics : bool (default=False)
Enable a statistical analysis of the dataset. If set to True, the dataset will be analyzed and a report will be added to the tensorboard along with some sample images from the dataset. Currently only detection datasets are supported for analysis.
save_full_train_log : bool (default=False)
- When set, a full log (of all super_gradients modules, including uncaught exceptions from any other module) of the training will be saved in the checkpoint directory under full_train_log.log
sg_logger : Union[AbstractSGLogger, str] (default=base_sg_logger)
Define the SGLogger object for this training process. The SGLogger handles all disk writes, logs, TensorBoard, remote logging and remote storage. By overriding the default base_sg_logger, you can change the storage location, support external monitoring and logging or support remote storage.
sg_logger_params : dict
SGLogger parameters
clip_grad_norm : float
Defines a maximal L2 norm of the gradients. Gradients whose norm exceeds this value will be clipped.
- Returns
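To make the (loss, loss_items) contract and the watched-metric mechanics above concrete, here is a hedged sketch of a custom loss together with a matching training_params dict; TwoPartLoss and all hyper-parameter values are illustrative, not library code:

import torch
from torch import nn

class TwoPartLoss(nn.Module):
    # hypothetical loss illustrating the (loss, loss_items) contract
    def __init__(self):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()

    def forward(self, preds, target):
        ce = self.ce(preds, target)
        l2 = 1e-4 * preds.pow(2).mean()  # an extra component we want logged
        loss = ce + l2                   # the tensor used for backprop
        # loss_items: a detached tensor of shape (n_items), logged over the epoch
        loss_items = torch.stack((loss, ce, l2)).detach()
        return loss, loss_items

train_params = {
    "max_epochs": 50,
    "initial_lr": 0.1,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.01,
    "loss": TwoPartLoss(),
    # one name per entry of loss_items, in order
    "loss_logging_items_names": ["Loss", "CE", "L2"],
    # checkpoints are saved according to the watched item
    "metric_to_watch": "Loss",
    "greater_metric_to_watch_is_better": False,
    "optimizer": "SGD",
    "optimizer_params": {"momentum": 0.9, "weight_decay": 1e-4},
}
# model.train(training_params=train_params)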
- predict(inputs, targets=None, half=False, normalize=False, verbose=False, move_outputs_to_cpu=True)[source]
A fast predictor for a batch of inputs
- Parameters
inputs – torch.tensor or numpy.array, a batch of inputs
targets – torch.tensor of corresponding labels; if none are given, accuracy will not be computed
verbose – bool print the results to screen
normalize – bool If true, normalizes the tensor according to the dataloader’s normalization values
half – Performs half precision evaluation
move_outputs_to_cpu – Moves the results from the GPU to the CPU
- Returns
outputs, acc, net_time, gross_time – the network’s predictions, the accuracy calculation, the forward pass net time, and the function gross time
- compute_model_runtime(input_dims: Optional[tuple] = None, batch_sizes: Union[tuple, list, int] = (1, 8, 16, 32, 64), verbose: bool = True)[source]
Compute the “atomic” inference time and throughput. Atomic refers to calculating the forward pass independently, discarding effects such as data augmentation, data upload to device, multi-gpu distribution etc.
- Parameters
input_dims – tuple, shape of a basic input to the network (without the batch dimension), e.g. (3, 224, 224); if None, an input from the test loader is used
batch_sizes – int or list, batch sizes for the latency calculation
verbose – bool, prints results to screen
- Returns
log: dict Latency and throughput for each tested batch size
- re_build_model(arch_params={})[source]
- arch_params : dict
Architecture H.P. e.g.: block, num_blocks, num_classes, etc.
- Returns
- update_architecture(structure)[source]
- architecture : str
Defines the network’s architecture according to the options in models/all_architectures
- load_checkpoint : bool
Loads a checkpoint according to experiment_name
- arch_params : dict
Architecture H.P. e.g.: block, num_blocks, num_classes, etc.
- Returns
- test(test_loader: Optional[torch.utils.data.dataloader.DataLoader] = None, loss: Optional[torch.nn.modules.loss._Loss] = None, silent_mode: bool = False, test_metrics_list=None, loss_logging_items_names=None, metrics_progress_verbose=False, test_phase_callbacks=None, use_ema_net=True) → tuple[source]
Evaluates the model on given dataloader and metrics.
- Parameters
test_loader – dataloader to perform test on.
test_metrics_list – (list(torchmetrics.Metric)) metrics list for evaluation.
silent_mode – (bool) controls verbosity
metrics_progress_verbose – (bool) controls the verbosity of metrics progress (default=False). Slows down the program.
use_ema_net – (bool) whether to perform the test on self.ema_model.ema (when self.ema_model.ema exists; otherwise self.net will be tested) (default=True)
- Returns
results tuple (tuple) containing the loss items and metric values.
- All of the above arguments override SgModel’s corresponding attributes when not None; evaluation is then run on self.test_loader with self.test_metrics.
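A short usage sketch of these override semantics (model is an SgModel with a built network; the loader and metric are placeholders, and the exact torchmetrics constructor arguments depend on the installed version):

import torch
from torch.utils.data import DataLoader, TensorDataset
from torchmetrics import Accuracy

loader = DataLoader(TensorDataset(torch.randn(8, 3, 32, 32),
                                  torch.randint(0, 10, (8,))),
                    batch_size=4)

# the passed arguments override self.test_loader / self.test_metrics for
# this call only; the result is a tuple of loss items and metric values
results = model.test(test_loader=loader, test_metrics_list=[Accuracy()])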
- evaluate(data_loader: torch.utils.data.dataloader.DataLoader, metrics: torchmetrics.collections.MetricCollection, evaluation_type: super_gradients.training.sg_model.sg_model.EvaluationType, epoch: Optional[int] = None, silent_mode: bool = False, metrics_progress_verbose: bool = False)[source]
Evaluates the model on given dataloader and metrics.
- Parameters
data_loader – dataloader to perform evaluation on
metrics – (MetricCollection) metrics for evaluation
evaluation_type – (EvaluationType) controls which phase callbacks will be used (for example, on batch end, when evaluation_type=EvaluationType.VALIDATION the Phase.VALIDATION_BATCH_END callbacks will be triggered)
epoch – (int) epoch idx
silent_mode – (bool) controls verbosity
metrics_progress_verbose – (bool) controls the verbosity of metrics progress (default=False). Slows down the program significantly.
- Returns
results tuple (tuple) containing the loss items and metric values.
- instantiate_net(architecture: Union[torch.nn.modules.module.Module, type, str], arch_params: dict, checkpoint_params: dict, *args, **kwargs) → tuple[source]
- Instantiates nn.Module according to architecture and arch_params, and handles pretrained weights and the required module manipulation (i.e. head replacement).
- Parameters
architecture – String, torch.nn.Module or uninstantiated SgModule class describing the network’s architecture.
arch_params – Architecture’s parameters passed to the network’s constructor.
checkpoint_params – checkpoint-loading-related parameters dictionary with a ‘pretrained_weights’ key, such that its value is a string describing the dataset of the pretrained weights (for example “imagenet”).
- Returns
the instantiated network, i.e. torch.nn.Module, and architecture_class (None when architecture is not a str)
- class super_gradients.training.MultiGPUMode(value)[source]
Bases: str, enum.Enum
- OFF - Single GPU Mode / CPU Mode
- DATA_PARALLEL - Multiple GPUs, Synchronous
- DISTRIBUTED_DATA_PARALLEL - Multiple GPUs, Asynchronous
- OFF = 'Off'
- DATA_PARALLEL = 'DP'
- DISTRIBUTED_DATA_PARALLEL = 'DDP'
- AUTO = 'AUTO'
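For instance, per the SgModel signature above, both the enum member and its string value are accepted (a minimal sketch):

from super_gradients.training import SgModel, MultiGPUMode

model = SgModel(experiment_name="ddp_run",
                multi_gpu=MultiGPUMode.DISTRIBUTED_DATA_PARALLEL)
model = SgModel(experiment_name="dp_run", multi_gpu="DP")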
- class super_gradients.training.SegmentationTestDatasetInterface(dataset_params={}, image_size=512, batch_size=4)[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.TestDatasetInterface
- class super_gradients.training.DetectionTestDatasetInterface(dataset_params={}, image_size=320, batch_size=4)[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.TestDatasetInterface
- class super_gradients.training.ClassificationTestDatasetInterface(dataset_params={}, image_size=32, batch_size=5, classes=None)[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.TestDatasetInterface
- class super_gradients.training.StrictLoad(value)[source]
Bases:
enum.Enum
- Wrapper for adding more functionality to the strict parameter of torch’s load_state_dict().
- Attributes:
OFF - Native torch “strict=False” behaviour. See the nn.Module.load_state_dict() documentation for more details.
ON - Native torch “strict=True” behaviour. See the nn.Module.load_state_dict() documentation for more details.
NO_KEY_MATCHING - Allows the usage of SuperGradients’ adapt_checkpoint function, which loads a checkpoint by matching each layer’s shapes (and bypasses the strict matching of layer names, ie: disregards the state_dict key matching).
- OFF = False
- ON = True
- NO_KEY_MATCHING = 'no_key_matching'
super_gradients.training.datasets module
- class super_gradients.training.datasets.ListDataset(root, file, sample_loader: Callable = <function default_loader>, target_loader: Optional[Callable] = None, collate_fn: Optional[Callable] = None, sample_extensions: tuple = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'), sample_transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, target_extension='.npy')[source]
Bases: Generic[torch.utils.data.dataset.T_co]
- ListDataset - A PyTorch Vision Data Set extension that receives a file with the FULL PATH to each of the samples.
The assumption is that for every sample there is a *matching target* in the same path but with a different extension, i.e.:
- for the sample paths (that appear in the list file):
/root/dataset/class_x/sample1.png /root/dataset/class_y/sample123.png
- the matching label paths (that DO NOT appear in the list file):
/root/dataset/class_x/sample1.ext /root/dataset/class_y/sample123.ext
- class super_gradients.training.datasets.DirectoryDataSet(root: str, samples_sub_directory: str, targets_sub_directory: str, target_extension: str, sample_loader: Callable = <function default_loader>, target_loader: Optional[Callable] = None, collate_fn: Optional[Callable] = None, sample_extensions: tuple = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif', '.tiff', '.webp'), sample_transform: Optional[Callable] = None, target_transform: Optional[Callable] = None)[source]
Bases: Generic[torch.utils.data.dataset.T_co]
- DirectoryDataSet - A PyTorch Vision Data Set extension that receives a root dir and two separate sub-directories:
a sub-directory for samples
a sub-directory for targets
- class super_gradients.training.datasets.DetectionDataSet(root: str, list_file: str, img_size: int = 416, batch_size: int = 16, augment: bool = False, dataset_hyper_params: Optional[dict] = None, cache_labels: bool = False, cache_images: bool = False, sample_loading_method: str = 'default', collate_fn: Optional[Callable] = None, target_extension: str = '.txt', labels_offset: int = 0, class_inclusion_list=None, all_classes_list=None)[source]
Identical to super_gradients.training.DetectionDataSet, documented above.
- class super_gradients.training.datasets.COCODetectionDataSet(*args, **kwargs)[source]
Bases: Generic[torch.utils.data.dataset.T_co]
COCODetectionDataSet - Detection Data Set Class for the COCO Data Set
- class super_gradients.training.datasets.SegmentationDataSet(root: str, list_file: str = None, samples_sub_directory: str = None, targets_sub_directory: str = None, img_size: int = 608, crop_size: int = 512, batch_size: int = 16, augment: bool = False, dataset_hyper_params: dict = None, cache_labels: bool = False, cache_images: bool = False, sample_loader: Callable = None, target_loader: Callable = None, collate_fn: Callable = None, target_extension: str = '.png', image_mask_transforms: torchvision.transforms.transforms.Compose = None, image_mask_transforms_aug: torchvision.transforms.transforms.Compose = None)[source]
Bases: Generic[torch.utils.data.dataset.T_co]
- static sample_loader(sample_path: str) → PIL.Image[source]
- sample_loader - Loads a dataset image from path using PIL
- param sample_path
The path to the sample image
- return
The loaded Image
- static sample_transform(image)[source]
sample_transform - Transforms the sample image
- param image
The input image to transform
- return
The transformed image
- class super_gradients.training.datasets.PascalVOC2012SegmentationDataSet(sample_suffix=None, target_suffix=None, *args, **kwargs)[source]
Bases: Generic[torch.utils.data.dataset.T_co]
PascalVOC2012SegmentationDataSet - Segmentation Data Set Class for the Pascal VOC 2012 Data Set
- class super_gradients.training.datasets.PascalAUG2012SegmentationDataSet(*args, **kwargs)[source]
Bases: Generic[torch.utils.data.dataset.T_co]
PascalAUG2012SegmentationDataSet - Segmentation Data Set Class for the Pascal AUG 2012 Data Set
- class super_gradients.training.datasets.CoCoSegmentationDataSet(dataset_classes_inclusion_tuples_list: Optional[list] = None, *args, **kwargs)[source]
Bases: Generic[torch.utils.data.dataset.T_co]
CoCoSegmentationDataSet - Segmentation Data Set Class for the COCO 2017 Segmentation Data Set
- target_loader(mask_metadata_tuple) → PIL.Image[source]
- Parameters
mask_metadata_tuple – A tuple of (coco_image_id, original_image_height, original_image_width)
- Returns
The mask image created from the array
- class super_gradients.training.datasets.TestDatasetInterface(trainset, dataset_params={}, classes=None)[source]
Identical to super_gradients.training.TestDatasetInterface, documented above.
- class super_gradients.training.datasets.DatasetInterface(dataset_params={}, train_loader=None, val_loader=None, test_loader=None, classes=None)[source]
Bases:
object
DatasetInterface - This class manages all of the “communication” the Model has with the Data Sets
- build_data_loaders(batch_size_factor=1, num_workers=8, train_batch_size=None, val_batch_size=None, test_batch_size=None, distributed_sampler: bool = False)[source]
Define the train, val (and optionally test) loaders. The method deals separately with distributed training and standard (non-distributed, or parallel) training; in the case of distributed training we need to rely on distributed samplers.
- Parameters
batch_size_factor – int, factor to multiply the batch size by (usually for multi gpu)
num_workers – int, number of workers (parallel processes) for the dataloaders
train_batch_size – int, batch size for the train loader; if None, taken from dataset_params
val_batch_size – int, batch size for the val loader; if None, taken from dataset_params
test_batch_size – int, batch size for the test loader; if None, taken from dataset_params
distributed_sampler – boolean flag for distributed training mode
- Returns
train_loader, val_loader, classes: list of classes
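A small sketch following the signature and the return described in the docstring above (the dataset_params key is an assumption):

from super_gradients.training.datasets import ClassificationTestDatasetInterface

dataset = ClassificationTestDatasetInterface(dataset_params={"batch_size": 16}, image_size=32)
train_loader, val_loader, classes = dataset.build_data_loaders(num_workers=4,
                                                               train_batch_size=16)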
- class super_gradients.training.datasets.Cifar10DatasetInterface(dataset_params={})[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.LibraryDatasetInterface
- class super_gradients.training.datasets.CoCoSegmentationDatasetInterface(dataset_params=None, cache_labels: bool = False, cache_images: bool = False, dataset_classes_inclusion_tuples_list: Optional[list] = None)[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.CoCoDataSetInterfaceBase
- class super_gradients.training.datasets.CoCoDetectionDatasetInterface(dataset_params=None, cache_labels=False, cache_images=False, train_list_file='train2017.txt', val_list_file='val2017.txt')[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.CoCoDataSetInterfaceBase
- class super_gradients.training.datasets.CoCo2014DetectionDatasetInterface(dataset_params=None, cache_labels=False, cache_images=False, train_list_file='train2014.txt', val_list_file='val2014.txt')[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.CoCoDetectionDatasetInterface
- class super_gradients.training.datasets.PascalVOC2012SegmentationDataSetInterface(dataset_params=None, cache_labels=False, cache_images=False)[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.DatasetInterface
- class super_gradients.training.datasets.PascalAUG2012SegmentationDataSetInterface(dataset_params=None, cache_labels=False, cache_images=False)[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.DatasetInterface
- class super_gradients.training.datasets.TestYoloDetectionDatasetInterface(dataset_params={}, input_dims=(3, 32, 32), batch_size=5)[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.DatasetInterface
Note: the output size is (batch_size, 6) in the test, while in real training the size of axis 0 can vary (the number of bounding boxes)
- class super_gradients.training.datasets.DetectionTestDatasetInterface(dataset_params={}, image_size=320, batch_size=4)[source]
Identical to super_gradients.training.DetectionTestDatasetInterface, documented above.
- class super_gradients.training.datasets.ClassificationTestDatasetInterface(dataset_params={}, image_size=32, batch_size=5, classes=None)[source]
Identical to super_gradients.training.ClassificationTestDatasetInterface, documented above.
- class super_gradients.training.datasets.SegmentationTestDatasetInterface(dataset_params={}, image_size=512, batch_size=4)[source]
Identical to super_gradients.training.SegmentationTestDatasetInterface, documented above.
- class super_gradients.training.datasets.ImageNetDatasetInterface(dataset_params={}, data_dir='/data/Imagenet')[source]
Bases:
super_gradients.training.datasets.dataset_interfaces.dataset_interface.DatasetInterface
super_gradients.training.exceptions module
super_gradients.training.legacy module
super_gradients.training.losses module
- class super_gradients.training.losses.FocalLoss(loss_fcn: torch.nn.modules.loss.BCEWithLogitsLoss, gamma=1.5, alpha=0.25)[source]
Bases:
torch.nn.modules.loss._Loss
Wraps focal loss around existing loss_fcn(), i.e. criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)
- reduction: str
- forward(pred, true)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class super_gradients.training.losses.LabelSmoothingCrossEntropyLoss(weight=None, ignore_index=- 100, reduction='mean', smooth_eps=None, smooth_dist=None, from_logits=True)[source]
Bases:
torch.nn.modules.loss.CrossEntropyLoss
CrossEntropyLoss - with the ability to receive a distribution as targets, and optional label smoothing
- forward(input, target, smooth_dist=None)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- ignore_index: int
- label_smoothing: float
- class super_gradients.training.losses.ShelfNetOHEMLoss(threshold: float = 0.7, mining_percent: float = 0.0001, ignore_lb: int = 255)[source]
Bases:
super_gradients.training.losses.ohem_ce_loss.OhemCELoss
- forward(predictions_list: list, targets)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- reduction: str
- class super_gradients.training.losses.ShelfNetSemanticEncodingLoss(se_weight=0.2, nclass=21, aux_weight=0.4, weight=None, ignore_index=- 1)[source]
Bases:
torch.nn.modules.loss.CrossEntropyLoss
2D Cross Entropy Loss with Auxiliary Loss
- forward(logits, labels)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- ignore_index: int
- label_smoothing: float
- class super_gradients.training.losses.YoLoV3DetectionLoss(model: torch.nn.modules.module.Module, cls_pw: float = 1.0, obj_pw: float = 1.0, giou: float = 3.54, obj: float = 64.3, cls: float = 37.4)[source]
Bases:
torch.nn.modules.loss._Loss
YoLoV3DetectionLoss - Loss Class for Object Detection
- forward(model_output, targets)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- reduction: str
- class super_gradients.training.losses.YoLoV5DetectionLoss(anchors: super_gradients.training.utils.detection_utils.Anchors, cls_pos_weight: Union[float, List[float]] = 1.0, obj_pos_weight: float = 1.0, obj_loss_gain: float = 1.0, box_loss_gain: float = 0.05, cls_loss_gain: float = 0.5, focal_loss_gamma: float = 0.0, cls_objectness_weights: Optional[Union[List[float], torch.Tensor]] = None, anchor_threshold=4.0)[source]
Bases:
torch.nn.modules.loss._Loss
Calculate YOLO V5 loss: L = L_objectness + L_boxes + L_classification
- forward(model_output, targets)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- build_targets(predictions: List[torch.Tensor], targets: torch.Tensor) → Tuple[List[torch.Tensor], List[torch.Tensor], List[Tuple[torch.Tensor]], List[torch.Tensor]][source]
- Assign targets to anchors for use in the L_boxes & L_classification calculation:
each target can be assigned to several anchors: all anchors that are within [1/self.anchor_threshold, self.anchor_threshold] times the target size range;
each anchor can be assigned to several targets
- Parameters
predictions – Yolo predictions
targets – ground truth targets
- Returns
each of 4 outputs contains one element for each Yolo output, correspondences are raveled over the whole batch and all anchors:
classes of the targets;
boxes of the targets;
image id in a batch, anchor id, grid y, grid x coordinates;
anchor sizes.
All the above can be indexed in parallel to get the selected correspondences
- compute_loss(predictions: List[torch.Tensor], targets: torch.Tensor, giou_loss_ratio: float = 1.0) → Tuple[torch.Tensor, torch.Tensor][source]
L = L_objectness + L_boxes + L_classification, where:
L_boxes and L_classification are calculated only between anchors and the targets that suit them;
L_objectness is calculated on all anchors.
- L_classification:
for anchors that have suitable ground truths in their grid locations, add BCEs to force max probability for each GT class in a multi-label way. Coef: self.cls_loss_gain
- L_boxes:
for anchors that have suitable ground truths in their grid locations, add (1 - IoU), where IoU is between the predicted box and each GT box, to force maximum IoU. Coef: self.box_loss_gain
- L_objectness:
for each anchor, add BCE to force a prediction of (1 - giou_loss_ratio) + giou_loss_ratio * IoU, where IoU is between the predicted box and a random GT in it. Coef: self.obj_loss_gain; the loss from each YOLO grid is additionally multiplied by balance = [4.0, 1.0, 0.4] to balance the contributions coming from different numbers of grid cells.
- Parameters
predictions – output from all Yolo levels, each of shape [Batch x Num_Anchors x GridSizeY x GridSizeX x (4 + 1 + Num_classes)]
targets – [Num_targets x (4 + 2)], values on dim 1 are: image id in a batch, class, box x y w h
giou_loss_ratio – a coefficient in L_objectness defining what should be predicted as objectness in a cell with a target; can be a value in the [IoU, 1] range
- Returns
loss, all losses separately in a detached tensor
- reduction: str
- class super_gradients.training.losses.RSquaredLoss(size_average=None, reduce=None, reduction: str = 'mean')[source]
Bases:
torch.nn.modules.loss._Loss
- forward(output, target)[source]
Computes the R-squared of the output and target values
- Parameters
output – Tensor / Numpy / List, the prediction
target – Tensor / Numpy / List, the corresponding labels
- reduction: str
- class super_gradients.training.losses.SSDLoss(dboxes: super_gradients.training.utils.ssd_utils.DefaultBoxes, alpha: float = 1.0)[source]
Bases:
torch.nn.modules.loss._Loss
Implements the loss as the sum of the following: 1. Confidence Loss: all labels, with hard negative mining; 2. Localization Loss: only on positive labels
- match_dboxes(targets)[source]
Converts ground truth boxes into a tensor with the same size as dboxes. Each gt bbox is matched to every destination box that overlaps it by more than 0.5 IoU, so some gt bboxes can be duplicated across several destination boxes.
- Parameters
targets – a tensor containing the boxes for a single image; shape [num_boxes, 5] (x,y,w,h,label)
- Returns
two tensors: boxes - shaped like dboxes, [4, num_dboxes] (x,y,w,h); labels - shape [num_dboxes]
- forward(predictions, targets)[source]
- Compute the loss
- Parameters
predictions – predictions tensor coming from the network; shape [N, num_classes+4, num_dboxes] where the first four items are (x,y,w,h) and the rest are class confidences
targets – targets for the batch; [num_targets, 6] (index in batch, label, x, y, w, h)
- reduction: str
- class super_gradients.training.losses.BCEDiceLoss(loss_weights=[0.5, 0.5], logits=True)[source]
Bases:
torch.nn.modules.module.Module
Binary Cross Entropy + Dice Loss
Weighted average of BCE and Dice loss
- loss_weights
list of size 2 such that loss_weights[0], loss_weights[1] are the weights for BCE, Dice respectively.
- forward(input: torch.Tensor, target: torch.Tensor) → torch.Tensor[source]
- Parameters
input – the network’s raw output, shaped (N,1,H,W)
target – ground truth, shaped (N,H,W)
- training: bool
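As an illustration of the weighted-average idea, an independent minimal sketch (not SuperGradients’ exact implementation; the smoothing term is an added assumption):

import torch
from torch import nn

class BCEDiceSketch(nn.Module):
    # illustrative weighted BCE + Dice loss for binary masks
    def __init__(self, loss_weights=(0.5, 0.5), smooth=1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()  # the logits=True case
        self.w_bce, self.w_dice = loss_weights
        self.smooth = smooth

    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # input: (N,1,H,W) raw logits; target: (N,H,W) ground truth in {0,1}
        target = target.unsqueeze(1).float()
        bce = self.bce(input, target)
        probs = torch.sigmoid(input)
        intersection = (probs * target).sum()
        dice = 1 - (2 * intersection + self.smooth) / (probs.sum() + target.sum() + self.smooth)
        return self.w_bce * bce + self.w_dice * dice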
super_gradients.training.metrics module
super_gradients.training.models module
super_gradients.training.sg_model module
- class super_gradients.training.sg_model.SgModel(experiment_name: str, device: Optional[str] = None, multi_gpu: Union[super_gradients.training.sg_model.sg_model.MultiGPUMode, str] = <MultiGPUMode.OFF: 'Off'>, model_checkpoints_location: str = 'local', overwrite_local_checkpoint: bool = True, ckpt_name: str = 'ckpt_latest.pth', post_prediction_callback: Optional[super_gradients.training.utils.detection_utils.DetectionPostPredictionCallback] = None, ckpt_root_dir=None)[source]
Identical to super_gradients.training.SgModel, documented above.
- class super_gradients.training.sg_model.MultiGPUMode(value)[source]
Identical to super_gradients.training.MultiGPUMode, documented above.
- class super_gradients.training.sg_model.StrictLoad(value)[source]
Identical to super_gradients.training.StrictLoad, documented above.
super_gradients.training.utils module
- class super_gradients.training.utils.Timer(device: str)[source]
Bases:
object
A class to measure time, handling both GPU & CPU processes. Returns time in milliseconds.
- class super_gradients.training.utils.WrappedModel(module)[source]
Bases:
torch.nn.modules.module.Module
- forward(x)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- super_gradients.training.utils.convert_to_tensor(array)[source]
Converts numpy arrays and lists to Torch tensors before calculating losses
- Parameters
array – torch.tensor / Numpy array / List
- super_gradients.training.utils.get_param(params, name, default_val=None)[source]
Retrieves a param from a parameter object/dict. If the parameter does not exist, returns default_val. If default_val is a dictionary and a value is found in the params, the function returns the default_val dictionary with its internal values overridden by the found value, e.g.:
default_opt_params = {‘lr’:0.1, ‘momentum’:0.99, ‘alpha’:0.001} training_params = {‘optimizer_params’: {‘lr’:0.0001}, ‘batch’: 32, …. } get_param(training_params, name=’optimizer_params’, default_val=default_opt_params) will return {‘lr’:0.0001, ‘momentum’:0.99, ‘alpha’:0.001}
- Parameters
params – an object (typically HpmStruct) or a dict holding the params
name – name of the searched parameter
default_val – assumed to be the same type as the value searched in the params
- Returns
the found value, or default if not found
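Reproducing the documented example as runnable code:

from super_gradients.training.utils import get_param

default_opt_params = {"lr": 0.1, "momentum": 0.99, "alpha": 0.001}
training_params = {"optimizer_params": {"lr": 0.0001}, "batch": 32}

# dict default: found values override the defaults, the rest are kept
get_param(training_params, name="optimizer_params", default_val=default_opt_params)
# -> {"lr": 0.0001, "momentum": 0.99, "alpha": 0.001}

# missing key: the default is returned as-is
get_param(training_params, name="ema", default_val=False)  # -> False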
- super_gradients.training.utils.tensor_container_to_device(obj: Union[torch.Tensor, tuple, list, dict], device: str, non_blocking=True)[source]
- Recursively sends compounded objects to device (sending all tensors to device and maintaining structure)
- Parameters
obj – the object to send to device (list / tuple / tensor / dict)
device – device to send the tensors to
non_blocking – used for DistributedDataParallel
- Returns
an object with the same structure (tensors, lists, tuples) with the device pointers (like the return value of Tensor.to(device))
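For example, a minimal sketch:

import torch
from super_gradients.training.utils import tensor_container_to_device

batch = {
    "image": torch.zeros(4, 3, 32, 32),
    "targets": [torch.zeros(4, 5), torch.zeros(4)],
}
# the structure is preserved; every tensor inside is moved to the device
device = "cuda" if torch.cuda.is_available() else "cpu"
batch_on_device = tensor_container_to_device(batch, device=device)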
- super_gradients.training.utils.adapt_state_dict_to_fit_model_layer_names(model_state_dict: dict, source_ckpt: dict, exclude: list = [])[source]
Given a model state dict and a source checkpoint, the method tries to correct the keys in model_state_dict to fit the checkpoint so the weights can be properly loaded into the model. If unsuccessful, returns None.
- param model_state_dict
the model state_dict
- param source_ckpt
checkpoint dict
- param exclude
optional list of excluded layers
- return
renamed checkpoint dict (if possible)
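A hedged usage sketch (model is any existing nn.Module; the checkpoint path is a placeholder):

import torch
from super_gradients.training.utils import adapt_state_dict_to_fit_model_layer_names

ckpt = torch.load("path/to/checkpoint.pth", map_location="cpu")
renamed_ckpt = adapt_state_dict_to_fit_model_layer_names(model.state_dict(), ckpt)
if renamed_ckpt is None:
    print("the checkpoint layers could not be matched to the model")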
- super_gradients.training.utils.raise_informative_runtime_error(state_dict, checkpoint, exception_msg)[source]
Given a model state dict and a source checkpoint, the method calls “adapt_state_dict_to_fit_model_layer_names” and enhances the exception_msg if loading the checkpoint dict via the conversion method is possible
- super_gradients.training.utils.random_seed(is_ddp, device, seed)[source]
Sets the random seed of numpy, torch and random.
When using ddp, a seed will be set for each process according to its local rank, derived from the device number.
- Parameters
is_ddp – bool, will set a different random seed for each process when using ddp
device – ‘cuda’, ‘cpu’, ‘cuda:<device_number>’
seed – int, random seed to be set