---
title: modeling.core
keywords: fastai
sidebar: home_sidebar
summary: "This module contains core custom models, loss functions, and a default layer group splitter for use in applying discriminative learning rates to your huggingface models trained via fastai"
description: "This module contains core custom models, loss functions, and a default layer group splitter for use in applying discriminative learning rates to your huggingface models trained via fastai"
nb_path: "nbs/02_modeling-core.ipynb"
---
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Note that HF_BaseModelWrapper
includes some nifty code for passing in only the inputs your model actually needs, since not all transformer architectures require/use the same information.
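To see the idea, here is a minimal sketch (not blurr's actual implementation; SimpleModelWrapper is a hypothetical name) of a wrapper that inspects the wrapped model's forward signature and passes along only the arguments that architecture accepts, e.g. DistilBERT takes no token_type_ids while BERT does.

import inspect
from torch import nn

class SimpleModelWrapper(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, hf_model):
        super().__init__()
        self.hf_model = hf_model
        # cache the argument names the wrapped model's forward() accepts
        self.valid_args = set(inspect.signature(hf_model.forward).parameters.keys())

    def forward(self, inputs):
        # drop keys (e.g. 'token_type_ids') the architecture doesn't use
        filtered = {k: v for k, v in inputs.items() if k in self.valid_args}
        return self.hf_model(**filtered)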
We use a Callback
to handle what is returned from the huggingface model ... "the huggingface model will return a tuple in outputs, with the actual predictions and some additional activations (should we want to use them in some regularization scheme)" - from the fastai Transformers Tutorial
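As a rough sketch of what such a callback does (illustrative only, not the actual HF_BaseModelCallback), it can grab the logits out of the output tuple after the forward pass so that the loss function and metrics see a single prediction tensor:

from fastai.callback.core import Callback

class KeepLogitsCallback(Callback):  # hypothetical name, for illustration only
    def after_pred(self):
        # huggingface models return (logits, *extra_activations); fastai's
        # loss functions expect `pred` to be just the prediction tensor
        if isinstance(self.pred, (tuple, list)):
            self.learn.pred = self.pred[0]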
path = untar_data(URLs.IMDB_SAMPLE)
imdb_df = pd.read_csv(path/'texts.csv')
imdb_df.head()
task = HF_TASKS_AUTO.SequenceClassification
pretrained_model_name = "roberta-base" # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name, task=task)
blocks = (HF_TextBlock(hf_arch=hf_arch, hf_tokenizer=hf_tokenizer, padding='max_length'), CategoryBlock)
dblock = DataBlock(blocks=blocks,
                   get_x=ColReader('text'), get_y=ColReader('label'),
                   splitter=ColSplitter(col='is_valid'))
dls = dblock.dataloaders(imdb_df, bs=4)
dls.show_batch(max_n=2)
model = HF_BaseModelWrapper(hf_model)
learn = Learner(dls,
                model,
                opt_func=partial(Adam),
                loss_func=CrossEntropyLossFlat(),
                metrics=[accuracy],
                cbs=[HF_BaseModelCallback],
                splitter=hf_splitter)
learn.create_opt() # -> will create your layer groups based on your "splitter" function
learn.freeze()
.to_fp16()
requires a GPU, so it had to be removed for the tests to run on GitHub. Let's check that we can get predictions.
b = dls.one_batch()
learn.model(b[0])
We have to create our own summary
methods above because fastai assumes each input is represented by a single tensor. But in the case of huggingface transformers, a single sequence is represented by multiple tensors (stored in a dictionary).
The change needed to make this work is so minor that the fastai library can/will hopefully be updated to support this use case.
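For example, inspecting a batch shows that the inputs arrive as a dictionary of tensors rather than a single tensor (the exact keys and shapes depend on your tokenizer, architecture, and padding settings):

b = dls.one_batch()
print(type(b[0]))                    # a dict-like container of tensors
print(b[0]['input_ids'].shape)       # e.g. torch.Size([4, 512]) with bs=4 and padding='max_length'
print(b[0]['attention_mask'].shape)  # same shape as input_ids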
print(len(learn.opt.param_groups))
learn.lr_find(suggestions=True)
learn.fit_one_cycle(3, lr_max=1e-3)
learn.show_results(max_n=2)
As with summary
, we need to replace fastai's Learner.predict
method with the one above, which is able to work with inputs represented by multiple tensors in a dictionary.
learn.blurr_predict('I really liked the movie')
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-6, 1e-3))
learn.recorder.plot_loss()
learn.show_results(max_n=2)
learn.blurr_predict("This was a really good movie")
learn.blurr_predict("Acting was so bad it was almost funny.")
learn.export(fname='seq_class_learn_export.pkl')
inf_learn = load_learner(fname='seq_class_learn_export.pkl')
inf_learn.blurr_predict("This movie should not be seen by anyone!!!!")
The tests below ensure that the core training code above works for all pretrained sequence classification models available in huggingface. These tests are excluded from the CI workflow because of how long they would take to run and how much data they would require downloading.
Note: Feel free to modify the code below to test whatever pretrained classification models you are working with ... and if any of your pretrained sequence classification models fail, please submit a GitHub issue (or a PR if you'd like to fix it yourself).
try: del learn; torch.cuda.empty_cache()
except: pass
BLURR_MODEL_HELPER.get_models(task='SequenceClassification')
pretrained_model_names = [
'albert-base-v1',
'facebook/bart-base',
'bert-base-uncased',
'camembert-base',
'distilbert-base-uncased',
'monologg/electra-small-finetuned-imdb',
'flaubert/flaubert_small_cased',
'allenai/longformer-base-4096',
'google/mobilebert-uncased',
'roberta-base',
'xlm-mlm-en-2048',
'xlm-roberta-base',
'xlnet-base-cased'
]
path = untar_data(URLs.IMDB_SAMPLE)
model_path = Path('models')
imdb_df = pd.read_csv(path/'texts.csv')
#hide_output
task = HF_TASKS_AUTO.SequenceClassification
bsz = 2
test_results = []
for model_name in pretrained_model_names:
    error = None
    print(f'=== {model_name} ===\n')

    hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(model_name,
                                                                                   task=task,
                                                                                   config_kwargs={'num_labels': 2})
    print(f'architecture:\t{hf_arch}\ntokenizer:\t{type(hf_tokenizer).__name__}\nmodel:\t\t{type(hf_model).__name__}\n')

    blocks = (HF_TextBlock(hf_arch=hf_arch, hf_tokenizer=hf_tokenizer, max_length=128, padding='max_length'),
              CategoryBlock)

    dblock = DataBlock(blocks=blocks,
                       get_x=ColReader('text'),
                       get_y=ColReader('label'),
                       splitter=ColSplitter(col='is_valid'))

    dls = dblock.dataloaders(imdb_df, bs=bsz)

    model = HF_BaseModelWrapper(hf_model)

    learn = Learner(dls,
                    model,
                    opt_func=partial(Adam),
                    loss_func=CrossEntropyLossFlat(),
                    metrics=[accuracy],
                    cbs=[HF_BaseModelCallback],
                    splitter=hf_splitter)

    learn.create_opt()  # -> will create your layer groups based on your "splitter" function
    learn.freeze()

    b = dls.one_batch()

    try:
        print('*** TESTING DataLoaders ***')
        test_eq(len(b), bsz)
        test_eq(len(b[0]['input_ids']), bsz)
        test_eq(b[0]['input_ids'].shape, torch.Size([bsz, 128]))
        test_eq(len(b[1]), bsz)

        print('*** TESTING One pass through the model ***')
        preds = learn.model(b[0])
        test_eq(len(preds[0]), bsz)
        test_eq(preds[0].shape, torch.Size([bsz, 2]))

        print('*** TESTING Training/Results ***')
        learn.fit_one_cycle(1, lr_max=1e-3)

        test_results.append((hf_arch, type(hf_tokenizer).__name__, type(hf_model).__name__, 'PASSED', ''))
        learn.show_results(max_n=2)

    except Exception as err:
        test_results.append((hf_arch, type(hf_tokenizer).__name__, type(hf_model).__name__, 'FAILED', err))

    finally:
        # cleanup
        del learn; torch.cuda.empty_cache()
raw_data = nlp.load_dataset('civil_comments', split='train[:1%]')
len(raw_data)
toxic_df = pd.DataFrame(raw_data, columns=list(raw_data.features.keys()))
toxic_df.head()
lbl_cols = list(toxic_df.columns[2:]); lbl_cols
toxic_df = toxic_df.round({col: 0 for col in lbl_cols})
toxic_df = toxic_df.convert_dtypes()
toxic_df.head()
task = HF_TASKS_AUTO.SequenceClassification
pretrained_model_name = "roberta-base" # "distilbert-base-uncased" "bert-base-uncased"
config = AutoConfig.from_pretrained(pretrained_model_name)
config.num_labels = len(lbl_cols)
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name,
                                                                               task=task,
                                                                               config=config)
Note how we have to set num_labels
to the number of labels we are predicting. Given that our labels are already encoded, we use a MultiCategoryBlock
with encoded=True
and vocab
equal to the columns holding our 1's and 0's.
blocks = (
    HF_TextBlock(hf_arch=hf_arch, hf_tokenizer=hf_tokenizer),
    MultiCategoryBlock(encoded=True, vocab=lbl_cols)
)
dblock = DataBlock(blocks=blocks,
                   get_x=ColReader('text'), get_y=ColReader(lbl_cols),
                   splitter=RandomSplitter())
dls = dblock.dataloaders(toxic_df, bs=4)
b = dls.one_batch()
len(b), b[0]['input_ids'].shape, b[1].shape
dls.show_batch(max_n=2)
model = HF_BaseModelWrapper(hf_model)
learn = Learner(dls,
                model,
                opt_func=partial(Adam),
                loss_func=BCEWithLogitsLossFlat(),
                metrics=[partial(accuracy_multi, thresh=0.2)],
                cbs=[HF_BaseModelCallback],
                splitter=hf_splitter)
learn.loss_func.thresh = 0.2
learn.create_opt() # -> will create your layer groups based on your "splitter" function
learn.freeze()
Since we're doing multi-label classification, we adjust our loss function to use binary cross-entropy and our metrics to use the multi-label friendly version of accuracy.
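As a small, self-contained illustration of why the threshold matters (the numbers below are made up), each label is scored independently and counts as predicted only when its sigmoid-activated logit exceeds thresh:

import torch
from fastai.metrics import accuracy_multi

logits  = torch.tensor([[ 2.0, -1.5, 0.5, -3.0]])  # one example, 4 labels
targets = torch.tensor([[ 1.,   0.,  1.,   0.]])
print(accuracy_multi(logits, targets, thresh=0.2, sigmoid=True))  # all 4 labels correct -> 1.0
print(accuracy_multi(logits, targets, thresh=0.7, sigmoid=True))  # sigmoid(0.5)≈0.62 misses the 3rd label -> 0.75

Lowering the threshold makes the model more willing to flag a label, which is why we drop it further below when predicting on rare toxic labels.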
preds = model(b[0])
preds[0].shape
learn.lr_find(suggestions=True)
learn.fit_one_cycle(3, lr_max=3e-3)
learn.show_results(max_n=2)
learn.loss_func.thresh = 0.02
comment = """
Those damned affluent white people should only eat their own food, like cod cakes and boiled potatoes.
No enchiladas for them!
"""
learn.blurr_predict(comment)