--- title: examples.multilabel_classification keywords: fastai sidebar: home_sidebar summary: "This is an example of how to use blurr for multilabel classification tasks" description: "This is an example of how to use blurr for multilabel classification tasks" nb_path: "nbs/99a_examples-multilabel.ipynb" ---
{% raw %}
{% endraw %} {% raw %}
 
{% endraw %} {% raw %}
{% endraw %} {% raw %}
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Using GPU #1: GeForce GTX 1080 Ti
{% endraw %}

Let's start by building our DataBlock.

{% raw %}
raw_data = datasets.load_dataset('civil_comments', split='train[:1%]') 
len(raw_data)
Using custom data configuration default
Reusing dataset civil_comments (/home/wgilliam/.cache/huggingface/datasets/civil_comments/default/0.9.0/98bdc73fc77a117cf5d17c9977e278c8023c64177a3ed9e0c49f7a5bdf10a47b)
18049
{% endraw %} {% raw %}
toxic_df = pd.DataFrame(raw_data, columns=list(raw_data.features.keys()))
toxic_df.head()
| | text | toxicity | severe_toxicity | obscene | threat | insult | identity_attack | sexual_explicit |
|---|---|---|---|---|---|---|---|---|
| 0 | This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done! | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.00000 | 0.000000 | 0.0 |
| 1 | Thank you!! This would make my life a lot less anxiety-inducing. Keep it up, and don't let anyone get in your way! | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.00000 | 0.000000 | 0.0 |
| 2 | This is such an urgent design problem; kudos to you for taking it on. Very impressive! | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.00000 | 0.000000 | 0.0 |
| 3 | Is this something I'll be able to install on my site? When will you be releasing it? | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.00000 | 0.000000 | 0.0 |
| 4 | haha you guys are a bunch of losers. | 0.893617 | 0.021277 | 0.0 | 0.0 | 0.87234 | 0.021277 | 0.0 |
{% endraw %} {% raw %}
lbl_cols = list(toxic_df.columns[2:]); lbl_cols
['severe_toxicity',
 'obscene',
 'threat',
 'insult',
 'identity_attack',
 'sexual_explicit']
{% endraw %} {% raw %}
toxic_df = toxic_df.round({col: 0 for col in lbl_cols})
toxic_df = toxic_df.convert_dtypes()

toxic_df.head()
| | text | toxicity | severe_toxicity | obscene | threat | insult | identity_attack | sexual_explicit |
|---|---|---|---|---|---|---|---|---|
| 0 | This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done! | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Thank you!! This would make my life a lot less anxiety-inducing. Keep it up, and don't let anyone get in your way! | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | This is such an urgent design problem; kudos to you for taking it on. Very impressive! | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Is this something I'll be able to install on my site? When will you be releasing it? | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | haha you guys are a bunch of losers. | 0.893617 | 0 | 0 | 0 | 1 | 0 | 0 |
{% endraw %}

For our huggingface model, let's use the distilled version of RoBERTa. This should allow us to train the model on bigger mini-batches without much performance loss. Even on my 1080Ti, I should be able to train all the parameters (which isn't possible with the roberta-base model).

{% raw %}
model_cls = AutoModelForSequenceClassification

pretrained_model_name = "distilroberta-base"
config = AutoConfig.from_pretrained(pretrained_model_name)
config.num_labels = len(lbl_cols)

hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, 
                                                                  model_cls=model_cls, 
                                                                  config=config)

print(hf_arch)
print(type(hf_config))
print(type(hf_tokenizer))
print(type(hf_model))
roberta
<class 'transformers.models.roberta.configuration_roberta.RobertaConfig'>
<class 'transformers.models.roberta.tokenization_roberta_fast.RobertaTokenizerFast'>
<class 'transformers.models.roberta.modeling_roberta.RobertaForSequenceClassification'>
{% endraw %}

Note how we have to set num_labels on the config to the number of labels we are predicting. Given that our labels are already one-hot encoded across the label columns, we use a MultiCategoryBlock with encoded=True and vocab equal to the columns holding our 1s and 0s.

{% raw %}
blocks = (
    HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model), 
    MultiCategoryBlock(encoded=True, vocab=lbl_cols)
)

dblock = DataBlock(blocks=blocks, 
                   get_x=ColReader('text'), get_y=ColReader(lbl_cols), 
                   splitter=RandomSplitter())
{% endraw %} {% raw %}
dls = dblock.dataloaders(toxic_df, bs=16)
{% endraw %} {% raw %}
b = dls.one_batch()
len(b), b[0]['input_ids'].shape, b[1].shape
(2, torch.Size([16, 391]), torch.Size([16, 6]))
{% endraw %}
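
Before moving on, we can also decode a batch back into text and targets as a sanity check. blurr patches fastai's show_batch for this; here's a minimal sketch (the dataloaders=dls keyword mirrors the learner=learn pattern used with show_results later, and the exact signature may vary by blurr version):

{% raw %}
# sanity check: decode and display a couple of examples from the DataLoaders
# (sketch; keyword arguments may differ across blurr versions)
dls.show_batch(dataloaders=dls, max_n=2)
{% endraw %}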

With our DataLoaders built, we can now build our Learner and train. We'll use mixed precision so that we can train with bigger batches.

{% raw %}
model = HF_BaseModelWrapper(hf_model)

learn = Learner(dls, 
                model,
                opt_func=partial(Adam),
                loss_func=BCEWithLogitsLossFlat(),
                metrics=[partial(accuracy_multi, thresh=0.2)],
                cbs=[HF_BaseModelCallback],
                splitter=hf_splitter).to_fp16()

learn.loss_func.thresh = 0.2   # threshold used when decoding the sigmoid outputs into predicted labels
learn.create_opt()             # -> will create your layer groups based on your "splitter" function
learn.freeze()
{% endraw %} {% raw %}
learn.blurr_summary()
HF_BaseModelWrapper (Input shape: 16 x 391)
============================================================================
Layer (type)         Output Shape         Param #    Trainable 
============================================================================
                     16 x 391 x 768      
Embedding                                 38603520   False     
Embedding                                 394752     False     
Embedding                                 768        False     
LayerNorm                                 1536       True      
Dropout                                                        
Linear                                    590592     False     
Linear                                    590592     False     
Linear                                    590592     False     
Dropout                                                        
Linear                                    590592     False     
LayerNorm                                 1536       True      
Dropout                                                        
____________________________________________________________________________
                     16 x 391 x 3072     
Linear                                    2362368    False     
____________________________________________________________________________
                     16 x 391 x 768      
Linear                                    2360064    False     
LayerNorm                                 1536       True      
Dropout                                                        
Linear                                    590592     False     
Linear                                    590592     False     
Linear                                    590592     False     
Dropout                                                        
Linear                                    590592     False     
LayerNorm                                 1536       True      
Dropout                                                        
____________________________________________________________________________
                     16 x 391 x 3072     
Linear                                    2362368    False     
____________________________________________________________________________
                     16 x 391 x 768      
Linear                                    2360064    False     
LayerNorm                                 1536       True      
Dropout                                                        
Linear                                    590592     False     
Linear                                    590592     False     
Linear                                    590592     False     
Dropout                                                        
Linear                                    590592     False     
LayerNorm                                 1536       True      
Dropout                                                        
____________________________________________________________________________
                     16 x 391 x 3072     
Linear                                    2362368    False     
____________________________________________________________________________
                     16 x 391 x 768      
Linear                                    2360064    False     
LayerNorm                                 1536       True      
Dropout                                                        
Linear                                    590592     False     
Linear                                    590592     False     
Linear                                    590592     False     
Dropout                                                        
Linear                                    590592     False     
LayerNorm                                 1536       True      
Dropout                                                        
____________________________________________________________________________
                     16 x 391 x 3072     
Linear                                    2362368    False     
____________________________________________________________________________
                     16 x 391 x 768      
Linear                                    2360064    False     
LayerNorm                                 1536       True      
Dropout                                                        
Linear                                    590592     False     
Linear                                    590592     False     
Linear                                    590592     False     
Dropout                                                        
Linear                                    590592     False     
LayerNorm                                 1536       True      
Dropout                                                        
____________________________________________________________________________
                     16 x 391 x 3072     
Linear                                    2362368    False     
____________________________________________________________________________
                     16 x 391 x 768      
Linear                                    2360064    False     
LayerNorm                                 1536       True      
Dropout                                                        
Linear                                    590592     False     
Linear                                    590592     False     
Linear                                    590592     False     
Dropout                                                        
Linear                                    590592     False     
LayerNorm                                 1536       True      
Dropout                                                        
____________________________________________________________________________
                     16 x 391 x 3072     
Linear                                    2362368    False     
____________________________________________________________________________
                     16 x 391 x 768      
Linear                                    2360064    False     
LayerNorm                                 1536       True      
Dropout                                                        
Linear                                    590592     True      
Dropout                                                        
____________________________________________________________________________
                     16 x 6              
Linear                                    4614       True      
____________________________________________________________________________

Total params: 82,123,014
Total trainable params: 615,174
Total non-trainable params: 81,507,840

Optimizer used: functools.partial(<function Adam at 0x7f95683e75f0>)
Loss function: FlattenedLoss of BCEWithLogitsLoss()

Model frozen up to parameter group #2

Callbacks:
  - TrainEvalCallback
  - Recorder
  - ProgressCallback
  - HF_BaseModelCallback
  - MixedPrecision
{% endraw %} {% raw %}
preds = model(b[0])
preds.logits.shape, preds
(torch.Size([16, 6]),
 SequenceClassifierOutput(loss=None, logits=tensor([[ 0.0114,  0.1325, -0.0049,  0.1052, -0.0232, -0.1914],
         [ 0.0098,  0.1451,  0.0054,  0.1219, -0.0074, -0.1830],
         [ 0.0122,  0.1300, -0.0190,  0.1171, -0.0060, -0.1865],
         [-0.0075,  0.1320, -0.0012,  0.0957, -0.0075, -0.1961],
         [ 0.0153,  0.1442, -0.0174,  0.1096, -0.0131, -0.1983],
         [-0.0057,  0.1351, -0.0210,  0.0939, -0.0210, -0.2005],
         [ 0.0163,  0.1267,  0.0042,  0.1046, -0.0060, -0.1871],
         [ 0.0083,  0.1442, -0.0202,  0.0991, -0.0207, -0.1946],
         [-0.0009,  0.1344, -0.0088,  0.1121, -0.0121, -0.1806],
         [ 0.0007,  0.1304, -0.0037,  0.0946, -0.0195, -0.1752],
         [-0.0047,  0.1336, -0.0049,  0.1088, -0.0237, -0.1841],
         [-0.0017,  0.1403, -0.0106,  0.1027, -0.0141, -0.1969],
         [-0.0013,  0.1422, -0.0262,  0.1222, -0.0240, -0.1945],
         [ 0.0206,  0.1353, -0.0160,  0.1183, -0.0022, -0.1927],
         [-0.0035,  0.1304, -0.0038,  0.1053, -0.0144, -0.1860],
         [ 0.0009,  0.1479, -0.0224,  0.1191, -0.0173, -0.2104]],
        device='cuda:1', grad_fn=<AddmmBackward>), hidden_states=None, attentions=None))
{% endraw %} {% raw %}
learn.lr_find(suggestions=True)
SuggestedLRs(lr_min=0.010000000149011612, lr_steep=0.0010000000474974513)
{% endraw %} {% raw %}
learn.fit_one_cycle(1, lr_max=1e-2)
| epoch | train_loss | valid_loss | accuracy_multi | time |
|---|---|---|---|---|
| 0 | 0.035955 | 0.034553 | 0.993119 | 01:03 |
{% endraw %} {% raw %}
learn.unfreeze()
learn.lr_find(suggestions=True, start_lr=1e-12, end_lr=1e-5)
{% endraw %} {% raw %}
learn.fit_one_cycle(2, lr_max=slice(1e-10, 4e-9))
| epoch | train_loss | valid_loss | accuracy_multi | time |
|---|---|---|---|---|
| 0 | 0.037899 | 0.034553 | 0.993119 | 01:41 |
| 1 | 0.034563 | 0.034553 | 0.993119 | 01:41 |
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2)
| | text | target |
|---|---|---|
| 0 | I always find the Star Wars/Star Trek debate funny because they seem so different to me that I have a hard time understanding how the two can be compared. Star Trek was created as a series of books and an episodic TV show, meaning that it was meant to have discreet stories told and wrapped up in a short time. Star Wars, on the other hand, was written as a series of movies with a far-reaching storyline. Star Trek is meant to be set in our own future, complete with our history and built upon our understanding of the real universe. Star Wars, however, exists "a long time ago, in a galaxy far, far away." That opens SW up to potentially involve elements that would be too fantastical in a universe based on our own, no matter how futuristic. There are many other reasons, but fundamentally, comparing SW and ST does not seem like an appropriate comparison. (Also, if you insist on comparing them, it would probably be best to be done by someone with a reasonable degree of knowledge of both properties.) | [] |
| 1 | Let's see, Trump has spent the least money and gained the most for it of any major candidate. That could apply to a person I want to lead a nation with a debt problem. He is mostly spending his own money rather than taking it from special interests. Also good. He has been vastly successful in business in this country and around the world. Good. He is widely known as a killer negotiator, certainly a good thing in politics both foreign and domestic. If he is in a room with Vladimir Putin, he won't be hitting the reset button or coming out of the room in second place on any topic of discussion. Very good. He's a bit unpredictable, maybe even a loose cannon at times, but that's what everyone thought about Reagan; could actually be of use in many situations. Says what he thinks, but also knows when not to spill the beans. And will never, ever leave Americans under fire without help.\n\nOn the other hand, Hillary can tell a lie with the best of them, and would be a "steady hand."\n\nEasy pick. | [] |
{% endraw %} {% raw %}
learn.loss_func.thresh = 0.02  # lower the decoding threshold so lower-probability labels still get surfaced at inference
{% endraw %} {% raw %}
comment = """
Those damned affluent white people should only eat their own food, like cod cakes and boiled potatoes. 
No enchiladas for them!
"""
learn.blurr_predict(comment)
[(((#1) ['insult'],),
  (#1) [tensor([False, False, False,  True, False, False])],
  (#1) [tensor([1.1301e-05, 4.5208e-03, 2.5514e-04, 2.7585e-02, 3.3766e-03, 1.5307e-03])])]
{% endraw %}
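
blurr_predict returns, for each input, the decoded labels, the booleanized predictions, and the per-label probabilities. To see which probability belongs to which label, we can zip the probabilities with lbl_cols. A minimal sketch that assumes the nesting shown in the output above (the exact structure may differ across blurr versions):

{% raw %}
# pair each label with its predicted probability (sketch; assumes the return structure shown above)
probs = learn.blurr_predict(comment)[0][2][0]
for lbl, p in zip(lbl_cols, probs.tolist()):
    print(f'{lbl}: {p:.4f}')
{% endraw %}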

Cleanup
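
Once we're done experimenting, we can release the learner and the cached GPU memory. A minimal sketch (assuming we simply want to free the GPU for other work):

{% raw %}
# release the model/learner objects and return cached memory to the GPU (sketch)
import gc

del learn, model, hf_model
gc.collect()
torch.cuda.empty_cache()
{% endraw %}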