--- title: modeling.core keywords: fastai sidebar: home_sidebar summary: "This module contains core custom models, loss functions, and a default layer group splitter for use in applying discriminative learning rates to your huggingface models trained via fastai" description: "This module contains core custom models, loss functions, and a default layer group splitter for use in applying discriminative learning rates to your huggingface models trained via fastai" nb_path: "nbs/02_modeling-core.ipynb" ---
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Note that HF_BaseModelWrapper includes some nifty code for passing in only the arguments your model needs, since not all transformer architectures require/use the same inputs.
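Conceptually, the wrapper can figure out what to pass by inspecting the wrapped model's forward signature. Here's a minimal illustration of the idea (a sketch only, not blurr's actual implementation):
import inspect

def filter_model_kwargs(hf_model, inputs):
    # keep only the keys the model's forward() actually accepts
    accepted = set(inspect.signature(hf_model.forward).parameters.keys())
    return {k: v for k, v in inputs.items() if k in accepted}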
If you want to let your huggingface model calculate the loss for you, make sure you include the labels argument in your inputs and use HF_PreCalculatedLoss as your loss function. Even though we don't really need a loss function per se, we still have to provide a custom loss class/function for fastai to function properly (e.g., one with decodes and activation methods). Why? Because these methods get called in methods like show_results to get the actual predictions.
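To make that concrete, here is a minimal sketch of what such a loss class looks like (an illustration only, not blurr's actual HF_PreCalculatedLoss):
import torch.nn.functional as F

class PreCalculatedLossSketch:
    def __call__(self, inp, targ, **kwargs): return inp       # assumes the model-computed loss is handed in here
    def activation(self, out): return F.softmax(out, dim=-1)  # logits -> probabilities
    def decodes(self, out): return out.argmax(dim=-1)         # probabilities -> predicted class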
We use a Callback for handling what is returned from the huggingface model. The return type is [ModelOutput](https://huggingface.co/transformers/main_classes/output.html#transformers.file_utils.ModelOutput), which makes it easy to return all the goodies we asked for.
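Conceptually, the callback's job looks something like this (a sketch only using fastai's Callback, not blurr's exact HF_BaseModelCallback):
class ModelOutputCallbackSketch(Callback):
    def after_pred(self):
        outputs = self.pred  # assumed to be a transformers ModelOutput (dict-like)
        # stash everything besides the logits for later inspection ...
        self.learn.blurr_model_outputs = {k: v for k, v in outputs.items() if k != 'logits'}
        # ... and hand fastai just the logits as the prediction
        self.learn.pred = outputs.logits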
Note that your Learner's loss will be set for you only if the huggingface model returns one and you are using the HF_PreCalculatedLoss loss function.
Also note that anything else you asked the model to return (for example, the last hidden state) will be available to you via the blurr_model_outputs property attached to your Learner. For example, assuming you are using BERT for a classification task: if you have told your HF_BaseModelWrapper instance to return attentions, you'd be able to access them via learn.blurr_model_outputs['attentions'].
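In code, that might look something like this (hedged: the exact keyword argument name may differ by blurr version, so check HF_BaseModelWrapper's signature):
model = HF_BaseModelWrapper(hf_model, output_attentions=True)  # assumed flag; verify in your version
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(),
                cbs=[HF_BaseModelCallback], splitter=hf_splitter)
# after a forward pass (training or prediction) ...
attns = learn.blurr_model_outputs['attentions']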
path = untar_data(URLs.IMDB_SAMPLE)
imdb_df = pd.read_csv(path/'texts.csv')
imdb_df.head()
task = HF_TASKS_AUTO.SequenceClassification
pretrained_model_name = "roberta-base" # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name, task=task)
blocks = (HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model), CategoryBlock)
dblock = DataBlock(blocks=blocks, get_x=ColReader('text'), get_y=ColReader('label'), splitter=ColSplitter())
dls = dblock.dataloaders(imdb_df, bs=4)
dls.show_batch(dataloaders=dls, max_n=2)
model = HF_BaseModelWrapper(hf_model)
learn = Learner(dls,
model,
opt_func=Adam,
loss_func=CrossEntropyLossFlat(),
metrics=[accuracy],
cbs=[HF_BaseModelCallback],
splitter=hf_splitter)
learn.create_opt() # -> will create your layer groups based on your "splitter" function
learn.freeze()
Note: .to_fp16() requires a GPU, so it had to be removed for the tests to run on GitHub. Let's check that we can get predictions.
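If you do have a GPU available, you can re-enable mixed precision yourself:
if torch.cuda.is_available(): learn = learn.to_fp16()  # safe to skip on CPU-only runners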
b = dls.one_batch()
learn.model(b[0])
We have to create our own summary method above because fastai's version only works when inputs are represented by a single tensor, whereas with huggingface transformers a single sequence is represented by multiple tensors (in a dictionary).
The change required to make this work is so minor that hopefully the fastai library can/will be updated to support this use case.
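Assuming your blurr version patches the replacement onto Learner as blurr_summary, you can call it just like fastai's summary:
learn.blurr_summary()  # blurr's dictionary-aware stand-in for learn.summary()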
print(len(learn.opt.param_groups))
learn.lr_find(suggestions=True)
learn.fit_one_cycle(1, lr_max=1e-3)
learn.show_results(learner=learn, max_n=2, trunc_at=500)
Same as with summary, we need to replace fastai's Learner.predict method with the one above, which can work with inputs that are represented by multiple tensors included in a dictionary.
learn.blurr_predict('I really liked the movie')
learn.blurr_predict(['I really liked the movie', 'I really hated the movie'])
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-7, 1e-4))
learn.recorder.plot_loss()
learn.show_results(learner=learn, max_n=2, trunc_at=500)
learn.blurr_predict("This was a really good movie")
learn.blurr_predict("Acting was so bad it was almost funny.")
export_fname = 'seq_class_learn_export'
learn.export(fname=f'{export_fname}.pkl')
inf_learn = load_learner(fname=f'{export_fname}.pkl')
inf_learn.blurr_predict("This movie should not be seen by anyone!!!!")
Much of the inspiration for the code below comes from Zach Mueller's excellent fastinference library, and in many places I simply adapted his code to work with blurr and the various huggingface transformers tasks.
learn.blurr_to_onnx(export_fname, quantize=True)
onnx_inf = blurrONNX(export_fname)
onnx_inf.predict(['I really liked the movie'])
%timeit inf_learn.blurr_predict(['I really liked the movie', 'I hated everything in it'])
%timeit onnx_inf.predict(['I really liked the movie', 'I hated everything in it'])
onnx_inf = blurrONNX(export_fname, use_quant_version=True)
onnx_inf.predict(['I hated everything in it'])
%timeit inf_learn.blurr_predict(['I really liked the movie', 'I hated everything in it'])
%timeit onnx_inf.predict(['I really liked the movie', 'I hated everything in it'])
The tests below ensure that the core training code above works for all pretrained sequence classification models available in huggingface. These tests are excluded from the CI workflow because of how long they would take to run and the amount of data that would have to be downloaded.
Note: Feel free to modify the code below to test whatever pretrained classification models you are working with ... and if any of your pretrained sequence classification models fail, please submit a GitHub issue (or a PR if you'd like to fix it yourself).
try: del learn; torch.cuda.empty_cache()
except: pass
BLURR_MODEL_HELPER.get_models(task='SequenceClassification')
pretrained_model_names = [
'albert-base-v1',
'facebook/bart-base',
'bert-base-uncased',
'camembert-base',
'microsoft/deberta-base',
'distilbert-base-uncased',
'monologg/electra-small-finetuned-imdb',
'flaubert/flaubert_small_cased',
'funnel-transformer/small-base',
# 'sshleifer/tiny-gpt2', # works, but requires setting pad_token manually
'allenai/longformer-base-4096',
'google/mobilebert-uncased',
# 'openai-gpt', # works, but requires setting pad_token manually
# 'google/reformer-enwik8', # TODO
'roberta-base',
'squeezebert/squeezebert-uncased',
'xlm-mlm-en-2048',
'xlm-roberta-base',
'xlnet-base-cased'
]
path = untar_data(URLs.IMDB_SAMPLE)
model_path = Path('models')
imdb_df = pd.read_csv(path/'texts.csv')
task = HF_TASKS_AUTO.SequenceClassification
bsz = 2
seq_sz = 128
test_results = []
for model_name in pretrained_model_names:
    error = None
    print(f'=== {model_name} ===\n')

    hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(model_name,
                                                                                   task=task,
                                                                                   config_kwargs={'num_labels': 2})
    print(f'architecture:\t{hf_arch}\ntokenizer:\t{type(hf_tokenizer).__name__}\nmodel:\t\t{type(hf_model).__name__}\n')

    blocks = (HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model, max_length=seq_sz, padding='max_length'),
              CategoryBlock)

    dblock = DataBlock(blocks=blocks,
                       get_x=ColReader('text'),
                       get_y=ColReader('label'),
                       splitter=ColSplitter(col='is_valid'))

    dls = dblock.dataloaders(imdb_df, bs=bsz)

    model = HF_BaseModelWrapper(hf_model)
    learn = Learner(dls,
                    model,
                    opt_func=Adam,
                    loss_func=CrossEntropyLossFlat(),
                    metrics=[accuracy],
                    cbs=[HF_BaseModelCallback],
                    splitter=hf_splitter).to_fp16()

    learn.create_opt()  # -> will create your layer groups based on your "splitter" function
    learn.freeze()

    b = dls.one_batch()

    try:
        print('*** TESTING DataLoaders ***')
        test_eq(len(b), bsz)
        test_eq(len(b[0]['input_ids']), bsz)
        test_eq(b[0]['input_ids'].shape, torch.Size([bsz, seq_sz]))
        test_eq(len(b[1]), bsz)

        # print('*** TESTING One pass through the model ***')
        # preds = learn.model(b[0])
        # test_eq(len(preds[0]), bsz)
        # test_eq(preds[0].shape, torch.Size([bsz, 2]))

        print('*** TESTING Training/Results ***')
        learn.fit_one_cycle(1, lr_max=1e-3)

        test_results.append((hf_arch, type(hf_tokenizer).__name__, type(hf_model).__name__, 'PASSED', ''))
        learn.show_results(learner=learn, max_n=2, trunc_at=250)
    except Exception as err:
        test_results.append((hf_arch, type(hf_tokenizer).__name__, type(hf_model).__name__, 'FAILED', err))
    finally:
        # cleanup
        del learn; torch.cuda.empty_cache()