---
title: Augmentations
keywords: fastai
sidebar: home_sidebar
summary: "Utilities for creating augmentation pipelines mentioned in popular self-supervised learning papers."
description: "Utilities for creating augmentation pipelines mentioned in popular self-supervised learning papers."
nb_path: "nbs/01 - augmentations.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class GaussianBlur[source]

GaussianBlur(kernel_size:Tuple[int, int], sigma:Tuple[float, float], border_type:str='reflect', return_transform:bool=False, same_on_batch:bool=False, p:float=0.5) :: AugmentationBase2D

Randomly apply Gaussian blur to a tensor image or a batch of tensor images.

Args:

- kernel_size (Tuple[int, int]): the size of the kernel.
- sigma (Tuple[float, float]): the standard deviation of the kernel.
- border_type (str): the padding mode to be applied before convolving. The expected modes are: 'constant', 'reflect', 'replicate' or 'circular'. Default: 'reflect'.
- return_transform (bool): if True, return the matrix describing the transformation applied to each input tensor. If False and the input is a tuple, the applied transformation won't be concatenated.
- same_on_batch (bool): apply the same transformation across the batch. Default: False.
- p (float): probability of applying the transformation. Default: 0.5.

Shape:

- Input: :math:`(C, H, W)` or :math:`(B, C, H, W)`, Optional: :math:`(B, 3, 3)`
- Output: :math:`(B, C, H, W)`

Note: Input tensor must be float and normalized into [0, 1] for best differentiability support. Additionally, this function accepts another transformation tensor (:math:`(B, 3, 3)`); the applied transformation will then be merged into the input transformation tensor and returned.

Examples:

>>> rng = torch.manual_seed(0)
>>> input = torch.rand(1, 1, 5, 5)
>>> blur = GaussianBlur((3, 3), (0.1, 2.0), p=1.)
>>> blur(input)
tensor([[[[0.6699, 0.4645, 0.3193, 0.1741, 0.1955],
          [0.5422, 0.6657, 0.6261, 0.6527, 0.5195],
          [0.3826, 0.2638, 0.1902, 0.1620, 0.2141],
          [0.6329, 0.6732, 0.5634, 0.4037, 0.2049],
          [0.8307, 0.6753, 0.7147, 0.5768, 0.7097]]]])
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class RandomGaussianBlur[source]

RandomGaussianBlur(p=0.5, s=(8, 32), same_on_batch=False, **kwargs) :: RandTransform

Randomly apply Gaussian blur with probability p, with blur strength sampled from s

{% endraw %} {% raw %}
{% endraw %}
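The control flow of a probability-gated blur like the one above can be sketched in plain Python. This is a minimal illustration, not the library's implementation; `random_gaussian_blur` and `blur_fn` are hypothetical names, and the odd-kernel adjustment is an assumption based on Gaussian kernels requiring odd sizes:

```python
import random

def random_gaussian_blur(img, blur_fn, p=0.5, s=(8, 32)):
    """Sketch of a RandomGaussianBlur-style transform (hypothetical helper):
    with probability p, blur the image with an odd kernel size drawn from
    the range s; otherwise return the input unchanged."""
    if random.random() >= p:
        return img
    k = random.randint(s[0], s[1])
    k += (k + 1) % 2  # Gaussian kernels need an odd size; bump even draws by one
    return blur_fn(img, kernel_size=k)
```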

Why kornia, torchvision or fastai?

These libraries are preferred over others for their batch-transform capabilities. Applying transforms on whole batches lets us use GPUs, giving a speedup of roughly 10-20x depending on the input image size.

Kornia

Kornia accepts a same_on_batch argument. If it's set to False, the augmentation is randomized independently for each element of a batch.

The main arguments:

- jitter: whether to apply color jitter
- bw: whether to apply grayscale
- blur: whether to apply blur
- resize_scale: (min, max) scales for random resized crop
- resize_ratio: (min, max) aspect ratios for random resized crop
- s: scalar for color jitter strength
- blur_s: (min, max) or a single int for the blur kernel size

Their corresponding probabilities: flip_p, jitter_p, bw_p, blur_p.

Kornia augmentation implementations have two additional parameters compared to TorchVision: return_transform and same_on_batch. The former returns the matrix describing the geometric transformation that was applied (so it can be undone), while the latter controls the randomness across a batched transformation. To enable these behaviours, simply set the flags to True.
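What same_on_batch controls can be shown with a small pure-Python sketch of the parameter sampling step; `sample_sigmas` is a hypothetical helper, not kornia's internal API:

```python
import random

def sample_sigmas(batch_size, sigma_range=(0.1, 2.0), same_on_batch=False):
    """Sketch of same_on_batch semantics: either sample one blur sigma and
    share it across the whole batch, or sample one sigma per batch element."""
    lo, hi = sigma_range
    if same_on_batch:
        sigma = random.uniform(lo, hi)
        return [sigma] * batch_size  # identical parameters for every element
    return [random.uniform(lo, hi) for _ in range(batch_size)]
```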

Recommendation: Even though the defaults work very well on many benchmark datasets, it's always better to try different values and visualize your dataset before going further with training.

{% raw %}

get_kornia_batch_augs[source]

get_kornia_batch_augs(size, rotate=True, jitter=True, bw=True, blur=True, resize_scale=(0.2, 1.0), resize_ratio=(0.75, 1.3333333333333333), rotate_deg=30, s=0.6, blur_s=(4, 32), same_on_batch=False, flip_p=0.5, jitter_p=0.3, bw_p=0.3, blur_p=0.3, stats=([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), cuda=True, xtra_tfms=[])

Input batch augmentations implemented in kornia

{% endraw %} {% raw %}
{% endraw %}

Overall, kornia's RandomResizedCrop looks more zoomed in. This might be related to the sampling function used for scale.

{% raw %}
aug, n = get_kornia_batch_augs(336, resize_scale=(0.2,1), stats=imagenet_stats, cuda=False, same_on_batch=False), 5
fig,ax = plt.subplots(n,2,figsize=(8,n*4))
for i in range(n): 
    show_image(t1,ax=ax[i][0])
    show_image(aug.decode(aug(t1)).clamp(0,1)[0], ax=ax[i][1])
{% endraw %}

GPU batch transforms are ~10x-20x faster than CPU, depending on image size. Larger image sizes benefit more from the GPU.

{% raw %}
xb = torch.stack([t1]*32)
aug = get_kornia_batch_augs(336, resize_scale=(0.75,1), stats=imagenet_stats, cuda=False)
{% endraw %} {% raw %}
%%timeit
out = aug(xb)
430 ms ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
{% endraw %} {% raw %}
if torch.cuda.is_available():
    xb = xb.to(default_device())
    aug = get_kornia_batch_augs(336, resize_scale=(0.75,1), stats=imagenet_stats)
{% endraw %} {% raw %}
%%timeit 
if torch.cuda.is_available():
    out = aug(xb) # ignore: GPU warmup
    torch.cuda.synchronize()
26.9 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %}

same_on_batch=False

{% raw %}
%%timeit
if torch.cuda.is_available():
    out = aug(xb)
    torch.cuda.synchronize()
26.7 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %} {% raw %}
if torch.cuda.is_available():
    xb = xb.to(default_device())
    aug = get_kornia_batch_augs(336, resize_scale=(0.75,1), same_on_batch=True, stats=imagenet_stats)
{% endraw %}

same_on_batch=True

{% raw %}
%%timeit
if torch.cuda.is_available():
    out = aug(xb)
    torch.cuda.synchronize()
17.5 ms ± 503 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{% endraw %}

Torchvision

Torchvision doesn't have a same_on_batch parameter, and it also doesn't support jitter_p.
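A jitter_p-style knob can still be emulated by wrapping a transform in a probability gate (similar in spirit to torchvision's transforms.RandomApply). Below is a minimal pure-Python sketch; the class name and structure here are illustrative, not the library's code:

```python
import random

class RandomApply:
    """Minimal probability wrapper: apply the wrapped transform with
    probability p, otherwise pass the input through unchanged."""
    def __init__(self, tfm, p=0.3):
        self.tfm, self.p = tfm, p

    def __call__(self, x):
        return self.tfm(x) if random.random() < self.p else x
```

For example, wrapping a color-jitter transform as `RandomApply(jitter_tfm, p=0.3)` reproduces the effect of a jitter_p argument.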

{% raw %}

get_torchvision_batch_augs[source]

get_torchvision_batch_augs(size, rotate=True, jitter=True, bw=True, blur=True, resize_scale=(0.2, 1.0), resize_ratio=(0.75, 1.3333333333333333), rotate_deg=30, s=0.6, blur_s=(4, 32), flip_p=0.5, bw_p=0.3, blur_p=0.3, stats=([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), cuda=True, xtra_tfms=[])

Input batch augmentations implemented in torchvision

{% endraw %} {% raw %}
{% endraw %} {% raw %}
aug, n = get_torchvision_batch_augs(336, resize_scale=(0.2, 1), stats=imagenet_stats, cuda=False), 5
fig,ax = plt.subplots(n,2,figsize=(8,n*4))
for i in range(n): 
    show_image(t1,ax=ax[i][0])
    show_image(aug.decode(aug(t1)).clamp(0,1)[0], ax=ax[i][1])
{% endraw %}

Torchvision is slightly faster than kornia with same_on_batch=False.

{% raw %}
xb = torch.stack([t1]*32)
aug = get_torchvision_batch_augs(336, resize_scale=(0.75,1), stats=imagenet_stats, cuda=False)
{% endraw %} {% raw %}
%%timeit
out = aug(xb)
910 ms ± 113 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
{% endraw %} {% raw %}
if torch.cuda.is_available():
    xb = xb.to(default_device())
    aug = get_torchvision_batch_augs(336, resize_scale=(0.75,1), stats=imagenet_stats)
{% endraw %} {% raw %}
%%timeit
if torch.cuda.is_available():
    out = aug(xb)
    torch.cuda.synchronize()
21.4 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %}

Fastai

In fastai, a few of the batch transforms are named differently, which is why it is not the first choice here. There may also be implementation differences, for better or worse. In general, though, fastai provides fast and accurate batch transforms through a composition function called setup_aug_tfms.

Here, max_lighting controls the color jitter magnitude.

Fastai is about as fast as the combination of kornia and torchvision, but it should be noted that RandomResizedCropGPU applies the same crop to all elements (which is probably fine), similar to torchvision, and that color jittering is implemented as 4 separate transforms.

{% raw %}

get_fastai_batch_augs[source]

get_fastai_batch_augs(size, rotate=True, jitter=True, bw=True, blur=True, min_scale=0.2, resize_ratio=(0.75, 1.3333333333333333), max_lighting=0.2, rotate_deg=30, s=0.6, blur_s=(8, 32), same_on_batch=False, flip_p=0.5, jitter_p=0.3, bw_p=0.3, blur_p=0.3, stats=([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), cuda=True, xtra_tfms=[])

Input batch augmentations implemented in fastai

{% endraw %} {% raw %}
{% endraw %} {% raw %}
aug, n = get_fastai_batch_augs(336, min_scale=0.2, stats=imagenet_stats, cuda=False), 5
fig,ax = plt.subplots(n,2,figsize=(8,n*4))
for i in range(n): 
    show_image(t1,ax=ax[i][0])
    show_image(aug.decode(aug(t1)).clamp(0,1)[0], ax=ax[i][1])
{% endraw %} {% raw %}
xb = (torch.stack([t1]*32))
aug = get_fastai_batch_augs(336, min_scale=0.75, stats=imagenet_stats, cuda=False)
{% endraw %} {% raw %}
%%timeit
out = aug(xb)
1.07 s ± 157 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
{% endraw %} {% raw %}
if torch.cuda.is_available():
    xb = xb.to(default_device())
    aug = get_fastai_batch_augs(336, min_scale=0.75, stats=imagenet_stats)
{% endraw %} {% raw %}
%%timeit
if torch.cuda.is_available():
    out = aug(xb)
    torch.cuda.synchronize()
29 ms ± 527 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
{% endraw %}

Kornia + Torchvision + Fastai

Here we use RandomResizedCrop from torchvision and keep the remaining augmentations the same as in kornia. This gives the best of both worlds: fast and diverse augmentations.

Also, Rotate from fastai is used for reflection padding.

{% raw %}

get_batch_augs[source]

get_batch_augs(size, rotate=True, jitter=True, bw=True, blur=True, resize_scale=(0.2, 1.0), resize_ratio=(0.75, 1.3333333333333333), rotate_deg=30, s=0.6, blur_s=(4, 32), same_on_batch=False, flip_p=0.5, rotate_p=0.3, jitter_p=0.3, bw_p=0.3, blur_p=0.3, stats=([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), cuda=True, xtra_tfms=[])

Input batch augmentations implemented in tv+kornia+fastai

{% endraw %} {% raw %}
{% endraw %} {% raw %}
aug, n = get_batch_augs(336, resize_scale=(0.2, 1), stats=imagenet_stats,cuda=False), 5
fig,ax = plt.subplots(n,2,figsize=(8,n*4))
for i in range(n): 
    show_image(t1,ax=ax[i][0])
    show_image(aug.decode(aug(t1)).clamp(0,1)[0], ax=ax[i][1])
{% endraw %}

Torchvision is slightly faster than kornia with same_on_batch=True.

{% raw %}
xb = (torch.stack([t1]*32))
aug = get_batch_augs(336, resize_scale=(0.75,1), stats=imagenet_stats, cuda=False)
{% endraw %} {% raw %}
%%timeit
out = aug(xb)
255 ms ± 55.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
{% endraw %} {% raw %}
if torch.cuda.is_available():
    xb = xb.to(default_device())
    aug = get_batch_augs(336, resize_scale=(0.75,1), stats=imagenet_stats)
{% endraw %} {% raw %}
%%timeit
if torch.cuda.is_available():
    out = aug(xb)
    torch.cuda.synchronize()
19 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
{% endraw %}

Adding Extra tfms

You can add any batch transform by simply passing it in a list to xtra_tfms.

{% raw %}
aug, n = get_batch_augs(336, resize_scale=(0.2, 1),  stats=imagenet_stats, cuda=False, xtra_tfms=[RandomErasing(p=1.)]), 5
fig,ax = plt.subplots(n,2,figsize=(8,n*4))
for i in range(n): 
    show_image(t1,ax=ax[i][0])
    show_image(aug.decode(aug(t1)).clamp(0,1)[0], ax=ax[i][1])
{% endraw %}

Utilities

{% raw %}

get_multi_aug_pipelines[source]

get_multi_aug_pipelines(n, size, rotate=True, jitter=True, bw=True, blur=True, resize_scale=(0.2, 1.0), resize_ratio=(0.75, 1.3333333333333333), rotate_deg=30, s=0.6, blur_s=(4, 32), same_on_batch=False, flip_p=0.5, rotate_p=0.3, jitter_p=0.3, bw_p=0.3, blur_p=0.3, stats=([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), cuda=True, xtra_tfms=[])

{% endraw %} {% raw %}
{% endraw %} {% raw %}

assert_aug_pipelines[source]

assert_aug_pipelines(aug_pipelines:List[Pipeline])

{% endraw %} {% raw %}
{% endraw %} {% raw %}
augs = get_multi_aug_pipelines(n=2,size=224)
assert_aug_pipelines(augs)
{% endraw %}