---
title: Fully-Connected Layers Decomposer
keywords: fastai
sidebar: home_sidebar
summary: "Factorize heavy FC layers into smaller ones"
description: "Factorize heavy FC layers into smaller ones"
nb_path: "nbs/06b_fc_decomposer.ipynb"
---
{% raw %}
{% endraw %}

We can factorize large fully-connected layers and replace each one with an approximation made of two smaller layers. The idea is to perform an SVD of the weight matrix, which expresses the original matrix as a product of three matrices: $U \Sigma V^T$, with $\Sigma$ a diagonal matrix whose non-negative diagonal entries are the singular values. We then choose a number $k$ of singular values to keep and truncate $U$, $\Sigma$ and $V^T$ accordingly. The resulting product is an approximation of the initial matrix.
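To make the idea concrete, here is a minimal sketch of how a single `nn.Linear` layer could be factorized this way. The helper name `decompose_linear` and the choice of `k` are illustrative assumptions, not the library's API: the singular values are absorbed into the first factor, and the original bias is carried by the second one.

{% raw %}
```python
import torch
import torch.nn as nn

def decompose_linear(layer: nn.Linear, k: int) -> nn.Sequential:
    # Hypothetical helper: rank-k truncated SVD of the weight matrix.
    # layer.weight has shape (out_features, in_features) and W = U @ diag(S) @ Vh
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    U_k, S_k, Vh_k = U[:, :k], S[:k], Vh[:k, :]

    first  = nn.Linear(layer.in_features, k, bias=False)
    second = nn.Linear(k, layer.out_features, bias=layer.bias is not None)

    first.weight.data  = torch.diag(S_k) @ Vh_k   # (k, in_features): singular values absorbed here
    second.weight.data = U_k                      # (out_features, k)
    if layer.bias is not None:
        second.bias.data = layer.bias.data        # original bias moves to the second layer
    return nn.Sequential(first, second)
```
{% endraw %}

With `k` well below `min(in_features, out_features)`, the two factors together store `k * (in_features + out_features)` weights instead of `in_features * out_features`.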


{% raw %}
 
{% endraw %} {% raw %}
{% endraw %} {% raw %}
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")

def label_func(f): return f[0].isupper()
{% endraw %} {% raw %}
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(64))
{% endraw %} {% raw %}
learn = Learner(dls, vgg16_bn(num_classes=2), metrics=accuracy)
{% endraw %} {% raw %}
learn.fit_one_cycle(3)
| epoch | train_loss | valid_loss | accuracy | time  |
|-------|------------|------------|----------|-------|
| 0     | 0.886644   | 0.652151   | 0.685386 | 00:22 |
| 1     | 0.692583   | 0.627857   | 0.685386 | 00:21 |
| 2     | 0.646516   | 0.622866   | 0.685386 | 00:22 |
{% endraw %} {% raw %}

class FC_Decomposer[source]

FC_Decomposer()

{% endraw %} {% raw %}
{% endraw %} {% raw %}
fc = FC_Decomposer()
{% endraw %} {% raw %}
new_model = fc.decompose(learn.model)
{% endraw %} {% raw %}
new_model
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): ReLU(inplace=True)
    (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (12): ReLU(inplace=True)
    (13): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (14): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (16): ReLU(inplace=True)
    (17): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (19): ReLU(inplace=True)
    (20): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (21): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (26): ReLU(inplace=True)
    (27): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (28): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (29): ReLU(inplace=True)
    (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (31): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (32): ReLU(inplace=True)
    (33): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (35): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (36): ReLU(inplace=True)
    (37): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (38): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (39): ReLU(inplace=True)
    (40): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (42): ReLU(inplace=True)
    (43): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Sequential(
      (0): Linear(in_features=25088, out_features=2048, bias=False)
      (1): Linear(in_features=2048, out_features=4096, bias=True)
    )
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Sequential(
      (0): Linear(in_features=4096, out_features=2048, bias=False)
      (1): Linear(in_features=2048, out_features=4096, bias=True)
    )
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Sequential(
      (0): Linear(in_features=4096, out_features=1, bias=False)
      (1): Linear(in_features=1, out_features=2, bias=True)
    )
  )
)
{% endraw %}

We can now compare the number of parameters before and after the decomposition.
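The `count_parameters` helper used below simply sums the sizes of the model's parameter tensors; a minimal sketch (not necessarily the library's exact implementation) could look like this:

{% raw %}
```python
def count_parameters(model):
    # Total number of parameters stored in the model
    return sum(p.numel() for p in model.parameters())
```
{% endraw %}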

{% raw %}
count_parameters(learn.model)
134277186
{% endraw %} {% raw %}
count_parameters(new_model)
91281476
{% endraw %}

This represents a decrease of roughly 43M parameters!

Now this is an approximation, so it isn't lossless, and we should expect a performance drop that grows as we keep fewer singular values. Here we have:

{% raw %}
new_learn = Learner(dls, new_model, metrics=accuracy)
{% endraw %} {% raw %}
new_learn.validate()
(#2) [0.6868855357170105,0.6853856444358826]
{% endraw %}
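To get a feeling for how the choice of the rank affects fidelity, we can measure the relative reconstruction error of the first FC layer's weight matrix for a few values of `k`. This is an illustrative sketch (the loop and the chosen ranks are ours, not part of the library):

{% raw %}
```python
import torch

W = learn.model.classifier[0].weight.data            # original Linear(25088 -> 4096) weight
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

for k in (2048, 1024, 256, 64):
    W_k = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]    # rank-k approximation of W
    err = (torch.linalg.norm(W - W_k) / torch.linalg.norm(W)).item()
    print(f"k={k:4d}  relative Frobenius error: {err:.3f}")
```
{% endraw %}

The smaller `k` is, the larger this error, and the bigger the accuracy drop we should expect from the decomposed model.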