---
title: Pruner
keywords: fastai
sidebar: home_sidebar
summary: "Remove useless filters to recreate a dense network"
description: "Remove useless filters to recreate a dense network"
nb_path: "nbs/02a_pruner.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
 
{% endraw %}

{% include important.html content='The Pruner method currently works on fully feed-forward ConvNets, e.g. VGG16. Support for residual connections, e.g. ResNets, is under development.' %}

When our network has filters that contain only zero values, there is an additional step we can take: those zero filters can be physically removed from the network, giving us a new, dense architecture.

This is done by re-expressing each layer with a reduced number of filters, matching its count of non-zero filters. However, removing a filter from a layer also removes the activation map it would have produced, which every filter of the next layer was expecting as input. So we must not only remove the filter itself, but also the corresponding kernel in each filter of the next layer (see Fig. below).

*(Figure: removing a filter in one layer also removes the corresponding kernel in each filter of the following layer.)*
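To make the operation concrete, here is a minimal sketch in plain PyTorch (not the fasterai API) of how two consecutive `Conv2d` layers could be shrunk. The helper `prune_conv_pair` is hypothetical: it drops the all-zero filters of the first layer and the matching input kernels of the second.

```python
import torch.nn as nn

def prune_conv_pair(conv1, conv2):
    "Drop the all-zero filters of `conv1` and the matching input kernels of `conv2`."
    # A filter counts as zero when every one of its weights is zero.
    keep = conv1.weight.data.abs().sum(dim=(1, 2, 3)) != 0        # one flag per filter
    new_conv1 = nn.Conv2d(conv1.in_channels, int(keep.sum()), conv1.kernel_size,
                          conv1.stride, conv1.padding, bias=conv1.bias is not None)
    new_conv1.weight.data = conv1.weight.data[keep].clone()
    if conv1.bias is not None:
        new_conv1.bias.data = conv1.bias.data[keep].clone()
    # The next layer now receives fewer activation maps, so its kernels
    # along the removed input channels are deleted as well.
    new_conv2 = nn.Conv2d(int(keep.sum()), conv2.out_channels, conv2.kernel_size,
                          conv2.stride, conv2.padding, bias=conv2.bias is not None)
    new_conv2.weight.data = conv2.weight.data[:, keep].clone()
    if conv2.bias is not None:
        new_conv2.bias.data = conv2.bias.data.clone()
    return new_conv1, new_conv2

# Zero out half of the first layer's filters, then prune the pair.
c1, c2 = nn.Conv2d(3, 8, 3), nn.Conv2d(8, 16, 3)
c1.weight.data[::2] = 0.
p1, p2 = prune_conv_pair(c1, c2)
p1.weight.shape, p2.weight.shape   # torch.Size([4, 3, 3, 3]), torch.Size([16, 4, 3, 3])
```

This is, conceptually, the operation that the Pruner automates across the whole network; the real implementation also has to deal with biases and BatchNorm layers, which this sketch ignores.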

Let's illustrate this with an example:

{% raw %}
# Imports assumed from the notebook's hidden cells: fastai vision + fasterai's sparse module
from fastai.vision.all import *
from fasterai.sparse.all import *

path = untar_data(URLs.PETS)

files = get_image_files(path/"images")

def label_func(f): return f[0].isupper()
{% endraw %} {% raw %}
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(64))
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class Pruner[source]

Pruner()

{% endraw %} {% raw %}
{% endraw %} {% raw %}
learn = Learner(dls, vgg16_bn(num_classes=2), metrics=accuracy)
{% endraw %} {% raw %}
count_parameters(learn.model)
134277186
{% endraw %}
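For reference, a parameter-counting helper like `count_parameters` is typically a one-liner; a possible definition (an assumption, the actual fasterai helper may differ) is:

```python
# Possible definition of a parameter counter (assumption, not fasterai's exact code).
def count_parameters(model):
    return sum(p.numel() for p in model.parameters())
```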

Our initial model, a VGG16, has more than 134 million parameters. Let's see what happens when we make it sparse at the filter level.

{% raw %}
sp_cb = SparsifyCallback(end_sparsity=50, granularity='filter', method='local', criteria=large_final, sched_func=sched_onecycle)
{% endraw %} {% raw %}
learn.fit_one_cycle(3, 3e-4, cbs=sp_cb)
Pruning of filter until a sparsity of 50%
Saving Weights at epoch 0
| epoch | train_loss | valid_loss | accuracy | time |
|------:|-----------:|-----------:|---------:|-----:|
| 0 | 0.897482 | 0.611214 | 0.698241 | 00:14 |
| 1 | 0.658607 | 0.561114 | 0.706360 | 00:13 |
| 2 | 0.555238 | 0.527486 | 0.718539 | 00:13 |
Sparsity at the end of epoch 0: 10.43%
Sparsity at the end of epoch 1: 48.29%
Sparsity at the end of epoch 2: 50.00%
Final Sparsity: 50.00
{% endraw %} {% raw %}
count_parameters(learn.model)
134277186
{% endraw %}

The total number of parameters hasn't changed! That's because we only replaced values with zeroes, giving a sparse model, but the parameters are still physically there.
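We can verify that the filters really are zeroed out by counting them ourselves; a quick sanity check in plain PyTorch (assuming the standard `Conv2d` weight layout) looks like this:

```python
import torch.nn as nn

# Count the filters whose weights are entirely zero, over all Conv2d layers.
zero, total = 0, 0
for m in learn.model.modules():
    if isinstance(m, nn.Conv2d):
        per_filter = m.weight.data.abs().sum(dim=(1, 2, 3))   # one value per filter
        zero  += int((per_filter == 0).sum())
        total += m.out_channels
print(f"{zero}/{total} filters are entirely zero ({100 * zero / total:.1f}%)")
```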

The Pruner will take care of removing those useless filters.

{% raw %}
pruner = Pruner()
pruned_model = pruner.prune_model(learn.model)
{% endraw %}

Done! Let's check whether the performance is still the same.

{% raw %}
pruned_learn = Learner(dls, pruned_model.cuda(), metrics=accuracy)
{% endraw %} {% raw %}
pruned_learn.validate()
(#2) [0.5265399813652039,0.7212449312210083]
{% endraw %} {% raw %}
count_parameters(pruned_learn.model)
71858210
{% endraw %}

We now have roughly 72 million parameters, approximately half of the initial count, just as we asked!
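As a final check, the compression ratio can be computed directly from the two counts (reusing `count_parameters` from above):

```python
orig, pruned = count_parameters(learn.model), count_parameters(pruned_learn.model)
print(f"Parameters removed: {100 * (1 - pruned / orig):.1f}%")   # about 46% fewer parameters
```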