---
title: examples.multilabel_classification
keywords: fastai
sidebar: home_sidebar
summary: "This is an example of how to use blurr for multilabel classification tasks"
description: "This is an example of how to use blurr for multilabel classification tasks"
nb_path: "nbs/99a_examples-multilabel.ipynb"
---
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Let's start by building our `DataBlock`.
raw_data = datasets.load_dataset('civil_comments', split='train[:1%]')
len(raw_data)
toxic_df = pd.DataFrame(raw_data, columns=list(raw_data.features.keys()))
toxic_df.head()
lbl_cols = list(toxic_df.columns[2:]); lbl_cols
toxic_df = toxic_df.round({col: 0 for col in lbl_cols})
toxic_df = toxic_df.convert_dtypes()
toxic_df.head()
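Before building the `DataBlock`, it can help to see how often each label actually occurs; the toxicity labels in this dataset are quite sparse. This is just a quick sanity check with plain pandas, nothing blurr-specific.
# Illustration only: how many examples carry each label after rounding?
toxic_df[lbl_cols].sum().sort_values(ascending=False)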
For our Hugging Face model, let's use the distilled version of RoBERTa. This should allow us to train the model on bigger mini-batches without much performance loss. Even on my 1080Ti, I should be able to train all the parameters (which isn't possible with the `roberta-base` model).
task = HF_TASKS_ALL.SequenceClassification
pretrained_model_name = "distilroberta-base"
config = AutoConfig.from_pretrained(pretrained_model_name)
config.num_labels = len(lbl_cols)
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name,
task=task,
config=config)
print(hf_arch)
print(type(hf_config))
print(type(hf_tokenizer))
print(type(hf_model))
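As a rough sanity check on why the distilled model fits bigger mini-batches, we can count its parameters (`distilroberta-base` has 6 transformer layers versus 12 for `roberta-base`). The exact number below depends on the classification head we just configured; this is an illustration only.
# Illustration only: rough size of the model we just loaded
total_params = sum(p.numel() for p in hf_model.parameters())
print(f'{total_params:,} parameters')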
Note how we have to configure `num_labels` to the number of labels we are predicting. Given that our labels are already encoded, we use a `MultiCategoryBlock` with `encoded=True` and `vocab` equal to the columns with our 1's and 0's.
blocks = (
HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model),
MultiCategoryBlock(encoded=True, vocab=lbl_cols)
)
dblock = DataBlock(blocks=blocks,
get_x=ColReader('text'), get_y=ColReader(lbl_cols),
splitter=RandomSplitter())
dls = dblock.dataloaders(toxic_df, bs=16)
b = dls.one_batch()
len(b), b[0]['input_ids'].shape, b[1].shape
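The targets come back as one-hot encoded tensors in the same order as `lbl_cols`. As a quick illustration (plain Python, not a blurr API), we can map the first target in the batch back to its label names.
# Illustration only: which labels are "on" for the first example in the batch?
first_target = b[1][0]
[lbl for lbl, flag in zip(lbl_cols, first_target.tolist()) if flag == 1]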
With our `DataLoaders` built, we can now build our `Learner` and train. We'll use mixed precision so we can train with bigger batches.
model = HF_BaseModelWrapper(hf_model)
learn = Learner(dls,
model,
opt_func=partial(Adam),
loss_func=BCEWithLogitsLossFlat(),
metrics=[partial(accuracy_multi, thresh=0.2)],
cbs=[HF_BaseModelCallback],
splitter=hf_splitter).to_fp16()
learn.loss_func.thresh = 0.2
learn.create_opt() # -> will create your layer groups based on your "splitter" function
learn.freeze()
learn.blurr_summary()
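Since `learn.freeze()` leaves only the last layer group trainable, a quick way to verify what will actually get updated in the first stage of training is to count trainable parameters. This is plain PyTorch, shown here as a sketch.
# Illustration only: trainable vs. total parameters after freezing
trainable = sum(p.numel() for p in learn.model.parameters() if p.requires_grad)
total = sum(p.numel() for p in learn.model.parameters())
print(f'trainable: {trainable:,} / total: {total:,}')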
preds = model(b[0])
preds.logits.shape, preds
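The raw model outputs are logits; since this is a multilabel problem, per-label probabilities come from a sigmoid rather than a softmax. A quick illustrative check:
# Illustration only: sigmoid turns the logits into independent per-label probabilities
torch.sigmoid(preds.logits)[:2]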
learn.lr_find(suggestions=True)
learn.fit_one_cycle(1, lr_max=1e-2)
learn.unfreeze()
learn.lr_find(suggestions=True, start_lr=1e-12, end_lr=1e-5)
learn.fit_one_cycle(2, lr_max=slice(1e-10, 4e-9))
learn.show_results(learner=learn, max_n=2)
learn.loss_func.thresh = 0.02
comment = """
Those damned affluent white people should only eat their own food, like cod cakes and boiled potatoes.
No enchiladas for them!
"""
learn.blurr_predict(comment)
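If you want to see the per-label probabilities behind that prediction, one way (a sketch that uses the underlying Hugging Face objects directly rather than blurr's API, and assumes the model and inputs end up on the same device) is to run the comment through the tokenizer and model yourself and apply a sigmoid:
# Illustration only: manual per-label probabilities via the raw transformers objects
hf_model.eval()
inputs = hf_tokenizer(comment, return_tensors='pt', truncation=True)
inputs = {k: v.to(learn.dls.device) for k, v in inputs.items()}
with torch.no_grad():
    probs = torch.sigmoid(hf_model(**inputs).logits)[0]
sorted(zip(lbl_cols, probs.tolist()), key=lambda x: -x[1])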