--- title: modeling.core keywords: fastai sidebar: home_sidebar summary: "This module contains core custom models, loss functions, and a default layer group splitter for use in applying discriminative learning rates to your huggingface models trained via fastai" description: "This module contains core custom models, loss functions, and a default layer group splitter for use in applying discriminative learning rates to your huggingface models trained via fastai" nb_path: "nbs/02_modeling-core.ipynb" ---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Using GPU #1: GeForce GTX 1080 Ti
{% endraw %}

Base splitter, model wrapper, and model callback

{% raw %}
{% endraw %} {% raw %}

hf_splitter[source]

hf_splitter(m)

Splits the huggingface model based on various model architecture conventions
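
To make the idea of "layer groups" concrete, here is a minimal, framework-free sketch of splitting a model's parameters into groups by name. The grouping rule (embeddings, encoder, everything else) and the helper name `split_by_prefix` are assumptions made for illustration; they are not hf_splitter's actual implementation.

```python
from typing import Dict, List

def split_by_prefix(param_names: List[str]) -> List[List[str]]:
    """Group parameter names into [embeddings, encoder, head] (hypothetical rule)."""
    groups: Dict[str, List[str]] = {"embeddings": [], "encoder": [], "head": []}
    for name in param_names:
        if ".embeddings." in name:
            groups["embeddings"].append(name)
        elif ".encoder." in name:
            groups["encoder"].append(name)
        else:
            groups["head"].append(name)  # classifier, pooler, and anything else
    # drop empty groups so an optimizer would only see groups that hold parameters
    return [g for g in groups.values() if g]

# parameter names in the style of a RoBERTa classifier (illustrative only)
names = [
    "roberta.embeddings.word_embeddings.weight",
    "roberta.encoder.layer.0.attention.self.query.weight",
    "classifier.dense.weight",
]
layer_groups = split_by_prefix(names)
print(len(layer_groups))
```

Each group can then be given its own learning rate, typically smaller for the early layers and larger for the task head.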

{% endraw %} {% raw %}

class HF_BaseModelWrapper[source]

HF_BaseModelWrapper(hf_model, output_hidden_states=False, output_attentions=False, hf_model_kwargs={}) :: Module

Same as nn.Module, but no need for subclasses to call super().__init__

{% endraw %} {% raw %}
{% endraw %}

Note that HF_BaseModelWrapper includes some nifty code for passing in only the things your model needs, as not all transformer architectures require or use the same information.
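
One way to pass a model only what it needs is to inspect its forward signature and drop everything else. The sketch below shows that technique with the standard library; `filter_to_forward_args` and `toy_forward` are hypothetical names, and this is not necessarily how HF_BaseModelWrapper does it internally.

```python
import inspect

def filter_to_forward_args(forward_fn, inputs: dict) -> dict:
    """Keep only the entries of `inputs` that `forward_fn` accepts by name."""
    accepted = set(inspect.signature(forward_fn).parameters)
    return {k: v for k, v in inputs.items() if k in accepted}

# hypothetical forward function standing in for an architecture
# that does not take `token_type_ids`
def toy_forward(input_ids, attention_mask=None, labels=None):
    return len(input_ids)

batch = {
    "input_ids": [101, 2023, 102],
    "attention_mask": [1, 1, 1],
    "token_type_ids": [0, 0, 0],
}
model_inputs = filter_to_forward_args(toy_forward, batch)
print(sorted(model_inputs))
```

Calling `toy_forward(**model_inputs)` now succeeds, whereas `toy_forward(**batch)` would raise a `TypeError` for the unexpected `token_type_ids` argument.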

{% raw %}

class HF_PreCalculatedLoss[source]

HF_PreCalculatedLoss()

{% endraw %} {% raw %}
{% endraw %}

If you want to let your huggingface model calculate the loss for you, make sure you include the labels argument in your inputs and use HF_PreCalculatedLoss as your loss function. Even though we don't really need a loss function per se, we have to provide a custom loss class/function for fastai to function properly (e.g., one with decodes and activation methods). Why? Because these methods are called by methods like show_results to get the actual predictions.
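
The shape of such a loss object can be sketched without any framework. The class below is a hypothetical stand-in (the name `PassThroughLoss` and the use of plain Python lists instead of tensors are assumptions for illustration): calling it returns the loss the model already computed, while `activation` and `decodes` turn raw logits into probabilities and a predicted class.

```python
import math

class PassThroughLoss:
    """Hypothetical stand-in for a loss whose value the model already computed."""

    def __call__(self, precomputed_loss, *targets):
        # nothing to calculate: hand back the loss that came from the model
        return precomputed_loss

    def activation(self, logits):
        # softmax over one row of logits -> class probabilities
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def decodes(self, logits):
        # index of the highest-scoring class
        return max(range(len(logits)), key=lambda i: logits[i])

loss_func = PassThroughLoss()
probs = loss_func.activation([0.2, 2.3])
pred = loss_func.decodes([0.2, 2.3])
print(pred)
```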

{% raw %}

class HF_BaseModelCallback[source]

HF_BaseModelCallback(after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None) :: Callback

Basic class handling tweaks of the training loop by changing a Learner in various events

{% endraw %} {% raw %}
{% endraw %}

We use a Callback for handling what is returned from the huggingface model. The return type is [ModelOutput](https://huggingface.co/transformers/main_classes/output.html#transformers.file_utils.ModelOutput), which makes it easy to return all the goodies we asked for.

Note that your Learner's loss will be set for you only if the huggingface model returns one and you are using the HF_PreCalculatedLoss loss function.

Also note that anything else you asked the model to return (for example, the last hidden state) will be available via the blurr_model_outputs property attached to your Learner. For example, assuming you are using BERT for a classification task, if you have told your HF_BaseModelWrapper instance to return attentions, you'd be able to access them via learn.blurr_model_outputs['attentions'].
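
The behavior described above can be pictured with a small, framework-free sketch. It assumes the model output behaves like a dictionary with `loss` and `logits` keys (as huggingface sequence classification outputs do); the helper name `split_model_outputs` and the plain-Python values are hypothetical and do not reproduce HF_BaseModelCallback's code.

```python
def split_model_outputs(outputs: dict, use_precalculated_loss: bool):
    """Return (pred, loss, extras) from a dict-like model output (hypothetical helper)."""
    pred = outputs["logits"]
    # only take the model's loss when the precalculated-loss path is in use
    loss = outputs.get("loss") if use_precalculated_loss else None
    # everything else stays available for later inspection
    extras = {k: v for k, v in outputs.items() if k not in ("loss", "logits")}
    return pred, loss, extras

outputs = {"loss": 0.31, "logits": [[0.2, 2.3]], "attentions": ["layer0", "layer1"]}
pred, loss, blurr_model_outputs = split_model_outputs(outputs, use_precalculated_loss=True)
print(blurr_model_outputs["attentions"])
```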

Sequence classification

The example below demonstrates how to set up your blurr pipeline for a sequence classification task (e.g., a model that requires a single text input).

{% raw %}
path = untar_data(URLs.IMDB_SAMPLE)
imdb_df = pd.read_csv(path/'texts.csv')
{% endraw %} {% raw %}
imdb_df.head()
label text is_valid
0 negative Un-bleeping-believable! Meg Ryan doesn't even look her usual pert lovable self in this, which normally makes me forgive her shallow ticky acting schtick. Hard to believe she was the producer on this dog. Plus Kevin Kline: what kind of suicide trip has his career been on? Whoosh... Banzai!!! Finally this was directed by the guy who did Big Chill? Must be a replay of Jonestown - hollywood style. Wooofff! False
1 positive This is a extremely well-made film. The acting, script and camera-work are all first-rate. The music is good, too, though it is mostly early in the film, when things are still relatively cheery. There are no really superstars in the cast, though several faces will be familiar. The entire cast does an excellent job with the script.<br /><br />But it is hard to watch, because there is no good end to a situation like the one presented. It is now fashionable to blame the British for setting Hindus and Muslims against each other, and then cruelly separating them into two countries. There is som... False
2 negative Every once in a long while a movie will come along that will be so awful that I feel compelled to warn people. If I labor all my days and I can save but one soul from watching this movie, how great will be my joy.<br /><br />Where to begin my discussion of pain. For starters, there was a musical montage every five minutes. There was no character development. Every character was a stereotype. We had swearing guy, fat guy who eats donuts, goofy foreign guy, etc. The script felt as if it were being written as the movie was being shot. The production value was so incredibly low that it felt li... False
3 positive Name just says it all. I watched this movie with my dad when it came out and having served in Korea he had great admiration for the man. The disappointing thing about this film is that it only concentrate on a short period of the man's life - interestingly enough the man's entire life would have made such an epic bio-pic that it is staggering to imagine the cost for production.<br /><br />Some posters elude to the flawed characteristics about the man, which are cheap shots. The theme of the movie "Duty, Honor, Country" are not just mere words blathered from the lips of a high-brassed offic... False
4 negative This movie succeeds at being one of the most unique movies you've seen. However this comes from the fact that you can't make heads or tails of this mess. It almost seems as a series of challenges set up to determine whether or not you are willing to walk out of the movie and give up the money you just paid. If you don't want to feel slighted you'll sit through this horrible film and develop a real sense of pity for the actors involved, they've all seen better days, but then you realize they actually got paid quite a bit of money to do this and you'll lose pity for them just like you've alr... False
{% endraw %} {% raw %}
model_cls = AutoModelForSequenceClassification

pretrained_model_name = "roberta-base" # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, model_cls=model_cls)
{% endraw %} {% raw %}
blocks = (HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model), CategoryBlock)
dblock = DataBlock(blocks=blocks, get_x=ColReader('text'), get_y=ColReader('label'), splitter=ColSplitter())
{% endraw %} {% raw %}
 
{% endraw %} {% raw %}
dls = dblock.dataloaders(imdb_df, bs=4)
{% endraw %} {% raw %}
dls.show_batch(dataloaders=dls, max_n=2)
text category
0 Raising Victor Vargas: A Review<br /><br />You know, Raising Victor Vargas is like sticking your hands into a big, steaming bowl of oatmeal. It's warm and gooey, but you're not sure if it feels right. Try as I might, no matter how warm and gooey Raising Victor Vargas became I was always aware that something didn't quite feel right. Victor Vargas suffers from a certain overconfidence on the director's part. Apparently, the director thought that the ethnic backdrop of a Latino family on the lower east side, and an idyllic storyline would make the film critic proof. He was right, but it didn't fool me. Raising Victor Vargas is the story about a seventeen-year old boy called, you guessed it, Victor Vargas (Victor Rasuk) who lives his teenage years chasing more skirt than the Rolling Stones could do in all the years they've toured. The movie starts off in `Ugly Fat' Donna's bedroom where Victor is sure to seduce her, but a cry from outside disrupts his plans when his best-friend Harold (Kevin Rivera) comes-a-looking for him. Caught in the attempt by Harold and his sister, Victor Vargas runs off for damage control. Yet even with the embarrassing implication that he's been boffing the homeliest girl in the neighborhood, nothing dissuades young Victor from going off on the hunt for more fresh meat. On a hot, New York City day they make way to the local public swimming pool where Victor's eyes catch a glimpse of the lovely young nymph Judy (Judy Marte), who's not just pretty, but a strong and independent too. The relationship that develops between Victor and Judy becomes the focus of the film. The story also focuses on Victor's family that is comprised of his grandmother or abuelita (Altagracia Guzman), his brother Nino (also played by real life brother to Victor, Silvestre Rasuk) and his sister Vicky (Krystal Rodriguez). The action follows Victor between scenes with Judy and scenes with his family. 
Victor tries to cope with being an oversexed pimp-daddy, his feelings for Judy and his grandmother's conservative Catholic upbringing.<br /><br />The problems that arise from Raising Victor Vargas are a few, but glaring errors. Throughout the film you get to know certain characters like Vicky, Nino, Grandma, negative
1 Many neglect that this isn't just a classic due to the fact that it's the first 3D game, or even the first shoot-'em-up. It's also one of the first stealth games, one of the only(and definitely the first) truly claustrophobic games, and just a pretty well-rounded gaming experience in general. With graphics that are terribly dated today, the game thrusts you into the role of B.J.(don't even *think* I'm going to attempt spelling his last name!), an American P.O.W. caught in an underground bunker. You fight and search your way through tunnels in order to achieve different objectives for the six episodes(but, let's face it, most of them are just an excuse to hand you a weapon, surround you with Nazis and send you out to waste one of the Nazi leaders). The graphics are, as I mentioned before, quite dated and very simple. The least detailed of basically any 3D game released by a professional team of creators. If you can get over that, however(and some would suggest that this simplicity only adds to the effect the game has on you), then you've got one heck of a good shooter/sneaking game. The game play consists of searching for keys, health and ammo, blasting enemies(aforementioned Nazis, and a "boss enemy" per chapter) of varying difficulty(which, of course, grows as you move further in the game), unlocking doors and looking for secret rooms. There is a bonus count after each level is beaten... it goes by how fast you were(basically, if you beat the 'par time', which is the time it took a tester to go through the same level; this can be quite fun to try and beat, and with how difficult the levels are to find your way in, they are even challenging after many play-throughs), how much Nazi gold(treasure) you collected and how many bad guys you killed. Basically, if you got 100% of any of aforementioned, you get a bonus, helping you reach the coveted high score placings. The game (mostly, but not always) allows for two contrastingly different methods of playing... 
stealthily or gunning down anything and everything you see. You can either run or walk, and amongst your weapons is also a knife... running is heard instantly the moment you enter the same room as the guard, as is gunshots. Many guards are found standing with their backs turned to you positive
{% endraw %}

Training

We'll also add in custom summary methods for blurr learners/models that work with dictionary inputs

{% raw %}
model = HF_BaseModelWrapper(hf_model)

learn = Learner(dls, 
                model,
                opt_func=partial(OptimWrapper, opt=torch.optim.Adam),
                loss_func=CrossEntropyLossFlat(),
                metrics=[accuracy],
                cbs=[HF_BaseModelCallback],
                splitter=hf_splitter)

learn.freeze()
{% endraw %}

.to_fp16() requires a GPU, so it had to be removed for the tests to run on GitHub. Let's check that we can get predictions.

{% raw %}

blurr_module_summary[source]

blurr_module_summary(learn, *xb)

Print a summary of model using xb

{% endraw %} {% raw %}
{% endraw %} {% raw %}

Learner.blurr_summary[source]

Learner.blurr_summary()

Print a summary of the model, optimizer and loss function.

{% endraw %} {% raw %}
{% endraw %}

We have to create our own summary methods above because fastai's versions only work where the inputs are represented by a single tensor. But in the case of huggingface transformers, a single sequence is represented by multiple tensors (in a dictionary).

The change to make this work is so minor I think that the fastai library can/will hopefully be updated to support this use case.
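
As a rough picture of the kind of adjustment involved, the sketch below reads the batch size from either a single array-like input or a dictionary of them. The helper name `batch_size_of` is hypothetical, and the `SimpleNamespace` objects are stand-ins that expose only a `.shape` attribute in place of real tensors; this is an assumption about the general shape of the change, not the actual patch.

```python
from types import SimpleNamespace

def batch_size_of(xb):
    """Batch size for a single array-like or a dict of array-likes (hypothetical helper)."""
    # for dictionary inputs, any entry will do since they share the batch dimension
    first = next(iter(xb.values())) if isinstance(xb, dict) else xb
    return first.shape[0]

# stand-ins exposing only `.shape`, in place of real tensors
single = SimpleNamespace(shape=(4, 128))
as_dict = {
    "input_ids": SimpleNamespace(shape=(4, 128)),
    "attention_mask": SimpleNamespace(shape=(4, 128)),
}
print(batch_size_of(single), batch_size_of(as_dict))
```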

{% raw %}
 
{% endraw %} {% raw %}
print(len(learn.opt.param_groups))
3
{% endraw %} {% raw %}
learn.lr_find(suggestions=True)
SuggestedLRs(lr_min=1.58489319801447e-07, lr_steep=0.05754399299621582)
{% endraw %} {% raw %}
learn.fit_one_cycle(1, lr_max=1e-3)
epoch train_loss valid_loss accuracy time
0 0.339034 0.309429 0.900000 00:21
{% endraw %}

Showing results

And here we create a @typedispatched implementation of Learner.show_results.

{% raw %}
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2, trunc_at=500)
text category target
0 The trouble with the book, "Memoirs of a Geisha" is that it had Japanese surfaces but underneath the surfaces it was all an American man's way of thinking. Reading the book is like watching a magnificent ballet with great music, sets, and costumes yet performed by barnyard animals dressed in those costumes—so far from Japanese ways of thinking were the characters.<br /><br />The movie isn't about Japan or real geisha. It is a story about a few American men's mistaken ideas about Japan and geish negative negative
1 <br /><br />I'm sure things didn't exactly go the same way in the real life of Homer Hickam as they did in the film adaptation of his book, Rocket Boys, but the movie "October Sky" (an anagram of the book's title) is good enough to stand alone. I have not read Hickam's memoirs, but I am still able to enjoy and understand their film adaptation. The film, directed by Joe Johnston and written by Lewis Colick, records the story of teenager Homer Hickam (Jake Gyllenhaal), beginning in October of 195 positive positive
{% endraw %} {% raw %}

Learner.blurr_predict[source]

Learner.blurr_predict(items, rm_type_tfms=None)

{% endraw %} {% raw %}
{% endraw %}

Same as with summary, we need to replace fastai's Learner.predict method with the one above which is able to work with inputs that are represented by multiple tensors included in a dictionary.

{% raw %}
learn.blurr_predict('I really liked the movie')
[(('positive',), (#1) [tensor(1)], (#1) [tensor([0.1088, 0.8912])])]
{% endraw %} {% raw %}
learn.blurr_predict(['I really liked the movie', 'I really hated the movie'])
[(('positive',), (#1) [tensor(1)], (#1) [tensor([0.1088, 0.8912])]),
 (('negative',), (#1) [tensor(0)], (#1) [tensor([0.7511, 0.2489])])]
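
Each prediction printed above is a tuple of (decoded labels, class indices, probabilities). A small helper can reduce that to a label and its probability; `summarize_prediction` is a hypothetical name, and the plain-Python lists below are stand-ins that mirror the printed structure rather than real tensors.

```python
def summarize_prediction(pred):
    """Return (label, probability) for one single-target prediction."""
    labels, class_idxs, probs = pred
    idx = int(class_idxs[0])                 # predicted class index
    return labels[0], float(probs[0][idx])   # its label and probability

# plain-Python stand-ins mirroring the structure printed above
preds = [
    (("positive",), [1], [[0.1088, 0.8912]]),
    (("negative",), [0], [[0.7511, 0.2489]]),
]
for p in preds:
    print(summarize_prediction(p))
```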
{% endraw %} {% raw %}
learn.unfreeze()
{% endraw %} {% raw %}
learn.fit_one_cycle(3, lr_max=slice(1e-7, 1e-4))
epoch train_loss valid_loss accuracy time
0 0.251441 0.223398 0.925000 00:33
1 0.155990 0.281743 0.910000 00:33
2 0.095633 0.271915 0.900000 00:33
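
Passing a slice as lr_max gives each layer group its own learning rate. The sketch below assumes the rates are spread in even multiplicative (log-spaced) steps from the start to the stop of the slice, which is how fastai is understood to handle a slice with a start value; `spread_lrs` is an illustrative re-implementation with a hypothetical name, not fastai's code.

```python
def spread_lrs(start: float, stop: float, n_groups: int):
    """Log-spaced learning rates from `start` to `stop`, one per layer group."""
    if n_groups == 1:
        return [stop]
    # constant multiplier between consecutive groups
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step**i for i in range(n_groups)]

# three layer groups, matching slice(1e-7, 1e-4) in the call above
lrs = spread_lrs(1e-7, 1e-4, 3)
print([f"{lr:.2e}" for lr in lrs])
```

The earliest group trains with the smallest rate and the last group (the task head) with the largest.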
{% endraw %} {% raw %}
learn.recorder.plot_loss()
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2, trunc_at=500)
text category target
0 The trouble with the book, "Memoirs of a Geisha" is that it had Japanese surfaces but underneath the surfaces it was all an American man's way of thinking. Reading the book is like watching a magnificent ballet with great music, sets, and costumes yet performed by barnyard animals dressed in those costumes—so far from Japanese ways of thinking were the characters.<br /><br />The movie isn't about Japan or real geisha. It is a story about a few American men's mistaken ideas about Japan and geish negative negative
1 <br /><br />I'm sure things didn't exactly go the same way in the real life of Homer Hickam as they did in the film adaptation of his book, Rocket Boys, but the movie "October Sky" (an anagram of the book's title) is good enough to stand alone. I have not read Hickam's memoirs, but I am still able to enjoy and understand their film adaptation. The film, directed by Joe Johnston and written by Lewis Colick, records the story of teenager Homer Hickam (Jake Gyllenhaal), beginning in October of 195 positive positive
{% endraw %} {% raw %}
learn.blurr_predict("This was a really good movie")
[(('positive',), (#1) [tensor(1)], (#1) [tensor([0.0785, 0.9215])])]
{% endraw %} {% raw %}
learn.blurr_predict("Acting was so bad it was almost funny.")
[(('negative',), (#1) [tensor(0)], (#1) [tensor([0.9484, 0.0516])])]
{% endraw %}

Inference

{% raw %}
export_fname = 'seq_class_learn_export'
{% endraw %}

Using fast.ai Learner.export and load_learner

{% raw %}
learn.export(fname=f'{export_fname}.pkl')
{% endraw %} {% raw %}
inf_learn = load_learner(fname=f'{export_fname}.pkl')
inf_learn.blurr_predict("This movie should not be seen by anyone!!!!")
[(('negative',), (#1) [tensor(0)], (#1) [tensor([0.9217, 0.0783])])]
{% endraw %}

Using ONNX - (TEMPORARILY UNAVAILABLE)

Much of the inspiration for the code below comes from Zach Mueller's excellent fastinference library, and in many places I simply adapted his code to work with blurr and the various huggingface transformers tasks.

{% raw %}
# import onnxruntime as ort
# from onnxruntime.quantization import quantize_dynamic, QuantType
{% endraw %} {% raw %}
# @patch
# def blurr_to_onnx(self:Learner, fname='export', path=None, quantize=False, excluded_input_names=[]):
#     """Export model to `ONNX` format"""
#     if (path == None): path = self.path
        
#     dummy_b = self.dls.one_batch()    
    
#     # inputs
#     for n in excluded_input_names:
#         if (n in dummy_b[0]): del dummy_b[0][n]
            
#     input_names = list(dummy_b[0].keys())
#     dynamic_axes = { n: {0:'batch_size', 1:'sequence'} for n in input_names if n in self.model.hf_model_fwd_args}
    
#     # outputs
#     output_names = [ f'output_{i}' for i in range(len(dummy_b) - self.dls.n_inp) ]
#     for n in output_names: dynamic_axes[n] = { 0:'batch_size' }
    
#     torch.onnx.export(model=self.model, 
#                       args=dummy_b[:self.dls.n_inp],    # everything but the targets
#                       f=self.path/f'{fname}.onnx',      # onnx filename
#                       opset_version=11,                 # required for get errors
#                       input_names=input_names,          # transformer dictionary keys for input
#                       output_names=output_names,        # one for each target
#                       dynamic_axes=dynamic_axes)        # see above
    
#     if (quantize):
#         quant_model_fpath = self.path/f'{fname}-quant.onnx'
#         quant_model = quantize_dynamic(self.path/f'{fname}.onnx', quant_model_fpath, weight_type=QuantType.QUInt8)

#     dls_export = self.dls.new_empty()
#     dls_export.loss_func = self.loss_func
#     dls_export.hf_model_fwd_args = self.model.hf_model_fwd_args # we need this to exclude non-model args in onnx
    
#     torch.save(dls_export, self.path/f'{fname}-dls.pkl', pickle_protocol=2)
{% endraw %} {% raw %}
# learn.blurr_to_onnx(export_fname, quantize=True)
{% endraw %} {% raw %}
# class blurrONNX():
#     def __init__(self, fname='export', path=Path('.'), use_quant_version=False): 
#         self.fname, self.path = fname, path
        
#         onnx_fname = f'{fname}-quant.onnx' if (use_quant_version) else f'{fname}.onnx'
#         self.ort_session = ort.InferenceSession(str(self.path/onnx_fname))
        
#         self.dls = torch.load(f'{self.path}/{fname}-dls.pkl')
#         self.trg_tfms = self.dls.tfms[self.dls.n_inp:]
#         self.tok_is_split_into_words = self.dls.before_batch[0].is_split_into_words
#         self.hf_model_fwd_args = self.dls.hf_model_fwd_args
        
#     def predict(self, items, rm_type_tfms=None):
#         is_split_str = self.tok_is_split_into_words and isinstance(items[0], str)
#         is_df = isinstance(items, pd.DataFrame)

#         if (not is_df and (is_split_str or not is_listy(items))): items = [items]
#         dl = self.dls.test_dl(items, rm_type_tfms=rm_type_tfms, num_workers=0)

#         outs = []
#         for b in dl:
#             xb = b[0]
#             inp = self._to_np(xb)
            
#             # remove any args not found in the transformers forward func
#             for k in list(inp.keys()):
#                 if (k not in self.hf_model_fwd_args): del inp[k]
                    
#             res = self.ort_session.run(None, inp)
#             tensor_res = [ tensor(r) for r in res ]
#             probs = L([ self.dls.loss_func.activation(tr) for tr in tensor_res ])
#             decoded_preds = L([ self.dls.loss_func.decodes(tr) for tr in tensor_res ])

#             for i in range(len(xb['input_ids'])):
#                 item_probs = probs.itemgot(i)
#                 item_dec_preds = decoded_preds.itemgot(i)
#                 item_dec_labels = tuplify([tfm.decode(item_dec_preds[tfm_idx]) 
#                                            for tfm_idx, tfm in enumerate(self.trg_tfms)])

#                 outs.append((item_dec_labels, item_dec_preds, item_probs))
            
#         return outs

#     #----- utility -----
#     def _to_np(self, xb): return { k: v.cpu().numpy() for k,v in xb.items() }
{% endraw %} {% raw %}
# onnx_inf = blurrONNX(export_fname)
{% endraw %} {% raw %}
# onnx_inf.predict(['I really liked the movie'])
{% endraw %} {% raw %}
# %timeit inf_learn.blurr_predict(['I really liked the movie', 'I hated everything in it'])
# %timeit onnx_inf.predict(['I really liked the movie', 'I hated everything in it'])
{% endraw %} {% raw %}
# onnx_inf = blurrONNX(export_fname, use_quant_version=True)
# onnx_inf.predict(['I hated everything in it'])
{% endraw %} {% raw %}
# %timeit inf_learn.blurr_predict(['I really liked the movie', 'I hated everything in it'])
# %timeit onnx_inf.predict(['I really liked the movie', 'I hated everything in it'])
{% endraw %}

Tests

The tests below ensure the core training code above works for all pretrained sequence classification models available in huggingface. These tests are excluded from the CI workflow because of how long they would take to run and the amount of data that would need to be downloaded.

Note: Feel free to modify the code below to test whatever pretrained classification models you are working with. If any of your pretrained sequence classification models fail, please submit a GitHub issue (or a PR if you'd like to fix it yourself).

{% raw %}
try: del learn; torch.cuda.empty_cache()
except: pass
{% endraw %} {% raw %}
[ model_type for model_type in BLURR.get_models(task='SequenceClassification') 
 if (not model_type.__name__.startswith('TF')) ]
[transformers.models.albert.modeling_albert.AlbertForSequenceClassification,
 transformers.models.bart.modeling_bart.BartForSequenceClassification,
 transformers.models.bert.modeling_bert.BertForSequenceClassification,
 transformers.models.big_bird.modeling_big_bird.BigBirdForSequenceClassification,
 transformers.models.ctrl.modeling_ctrl.CTRLForSequenceClassification,
 transformers.models.camembert.modeling_camembert.CamembertForSequenceClassification,
 transformers.models.convbert.modeling_convbert.ConvBertForSequenceClassification,
 transformers.models.deberta.modeling_deberta.DebertaForSequenceClassification,
 transformers.models.deberta_v2.modeling_deberta_v2.DebertaV2ForSequenceClassification,
 transformers.models.distilbert.modeling_distilbert.DistilBertForSequenceClassification,
 transformers.models.electra.modeling_electra.ElectraForSequenceClassification,
 transformers.models.flaubert.modeling_flaubert.FlaubertForSequenceClassification,
 transformers.models.funnel.modeling_funnel.FunnelForSequenceClassification,
 transformers.models.gpt2.modeling_gpt2.GPT2ForSequenceClassification,
 transformers.models.ibert.modeling_ibert.IBertForSequenceClassification,
 transformers.models.led.modeling_led.LEDForSequenceClassification,
 transformers.models.layoutlm.modeling_layoutlm.LayoutLMForSequenceClassification,
 transformers.models.longformer.modeling_longformer.LongformerForSequenceClassification,
 transformers.models.mbart.modeling_mbart.MBartForSequenceClassification,
 transformers.models.mpnet.modeling_mpnet.MPNetForSequenceClassification,
 transformers.models.mobilebert.modeling_mobilebert.MobileBertForSequenceClassification,
 transformers.models.openai.modeling_openai.OpenAIGPTForSequenceClassification,
 transformers.models.reformer.modeling_reformer.ReformerForSequenceClassification,
 transformers.models.roberta.modeling_roberta.RobertaForSequenceClassification,
 transformers.models.squeezebert.modeling_squeezebert.SqueezeBertForSequenceClassification,
 transformers.models.tapas.modeling_tapas.TapasForSequenceClassification,
 transformers.models.transfo_xl.modeling_transfo_xl.TransfoXLForSequenceClassification,
 transformers.models.xlm.modeling_xlm.XLMForSequenceClassification,
 transformers.models.xlm_roberta.modeling_xlm_roberta.XLMRobertaForSequenceClassification,
 transformers.models.xlnet.modeling_xlnet.XLNetForSequenceClassification]
{% endraw %} {% raw %}
pretrained_model_names = [
    'albert-base-v1',
    'facebook/bart-base',
    'bert-base-uncased',
    'sshleifer/tiny-ctrl',
    'camembert-base',
    'microsoft/deberta-base',
    'distilbert-base-uncased',
    'monologg/electra-small-finetuned-imdb',
    'flaubert/flaubert_small_cased', 
    'huggingface/funnel-small-base',
    'gpt2',
    'allenai/led-base-16384',
    'allenai/longformer-base-4096',
    'sshleifer/tiny-mbart', 
    'microsoft/mpnet-base',
    'google/mobilebert-uncased',
    'openai-gpt',
    #'reformer-enwik8',                  (see model card; does not work with/require a tokenizer so no bueno here)
    'roberta-base',
    'squeezebert/squeezebert-uncased',
    #'google/tapas-base',                (requires pip install torch-scatter)
    'transfo-xl-wt103', 
    'xlm-mlm-en-2048',
    'xlm-roberta-base',
    'xlnet-base-cased'
]
{% endraw %} {% raw %}
path = untar_data(URLs.IMDB_SAMPLE)

model_path = Path('models')
imdb_df = pd.read_csv(path/'texts.csv')
{% endraw %} {% raw %}
#hide_output
model_cls = AutoModelForSequenceClassification
bsz = 2
seq_sz = 128

test_results = []
for model_name in pretrained_model_names:
    error=None
    
    print(f'=== {model_name} ===\n')
    
    hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(model_name, 
                                                                      model_cls=model_cls, 
                                                                      config_kwargs={'num_labels': 2})
    
    print(f'architecture:\t{hf_arch}\ntokenizer:\t{type(hf_tokenizer).__name__}\nmodel:\t\t{type(hf_model).__name__}\n')

    # not all architectures include a native pad_token (e.g., gpt2, ctrl, etc...), so we add one here
    if (hf_tokenizer.pad_token is None): 
        hf_tokenizer.add_special_tokens({'pad_token': '<pad>'})  
        hf_config.pad_token_id = hf_tokenizer.get_vocab()['<pad>']
        hf_model.resize_token_embeddings(len(hf_tokenizer))
                    
    blocks = (HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model, max_length=seq_sz, padding='max_length'), 
              CategoryBlock)

    dblock = DataBlock(blocks=blocks, 
                       get_x=ColReader('text'), 
                       get_y=ColReader('label'), 
                       splitter=ColSplitter(col='is_valid'))
    
    dls = dblock.dataloaders(imdb_df, bs=bsz)
    
    model = HF_BaseModelWrapper(hf_model)
    learn = Learner(dls, 
                    model,
                    opt_func=partial(Adam),
                    loss_func=CrossEntropyLossFlat(),
                    metrics=[accuracy],
                    cbs=[HF_BaseModelCallback],
                    splitter=hf_splitter).to_fp16()

    learn.create_opt()             # -> will create your layer groups based on your "splitter" function
    learn.freeze()
    
    b = dls.one_batch()
    
    try:
        print('*** TESTING DataLoaders ***')
        test_eq(len(b), 2)  # a batch is an (inputs, targets) tuple, independent of bsz
        test_eq(len(b[0]['input_ids']), bsz)
        test_eq(b[0]['input_ids'].shape, torch.Size([bsz, seq_sz]))
        test_eq(len(b[1]), bsz)

#         print('*** TESTING One pass through the model ***')
#         preds = learn.model(b[0])
#         test_eq(len(preds[0]), bsz)
#         test_eq(preds[0].shape, torch.Size([bsz, 2]))

        print('*** TESTING Training/Results ***')
        learn.fit_one_cycle(1, lr_max=1e-3)

        test_results.append((hf_arch, type(hf_tokenizer).__name__, type(hf_model).__name__, 'PASSED', ''))
        learn.show_results(learner=learn, max_n=2, trunc_at=250)
    except Exception as err:
        test_results.append((hf_arch, type(hf_tokenizer).__name__, type(hf_model).__name__, 'FAILED', err))
    finally:
        # cleanup
        del learn; torch.cuda.empty_cache()
{% endraw %} {% raw %}
| | arch | tokenizer | model | result | error |
|---|---|---|---|---|---|
| 0 | albert | AlbertTokenizerFast | AlbertForSequenceClassification | PASSED | |
| 1 | bart | BartTokenizerFast | BartForSequenceClassification | PASSED | |
| 2 | bert | BertTokenizerFast | BertForSequenceClassification | PASSED | |
| 3 | ctrl | CTRLTokenizer | CTRLForSequenceClassification | PASSED | |
| 4 | camembert | CamembertTokenizerFast | CamembertForSequenceClassification | PASSED | |
| 5 | deberta | DebertaTokenizer | DebertaForSequenceClassification | PASSED | |
| 6 | distilbert | DistilBertTokenizerFast | DistilBertForSequenceClassification | PASSED | |
| 7 | electra | ElectraTokenizerFast | ElectraForSequenceClassification | PASSED | |
| 8 | flaubert | FlaubertTokenizer | FlaubertForSequenceClassification | PASSED | |
| 9 | funnel | FunnelTokenizerFast | FunnelForSequenceClassification | PASSED | |
| 10 | gpt2 | GPT2TokenizerFast | GPT2ForSequenceClassification | PASSED | |
| 11 | led | LEDTokenizerFast | LEDForSequenceClassification | FAILED | You have to specify either decoder_input_ids or decoder_inputs_embeds |
| 12 | longformer | LongformerTokenizerFast | LongformerForSequenceClassification | PASSED | |
| 13 | mbart | MBartTokenizerFast | MBartForSequenceClassification | PASSED | |
| 14 | mpnet | MPNetTokenizerFast | MPNetForSequenceClassification | PASSED | |
| 15 | mobilebert | MobileBertTokenizerFast | MobileBertForSequenceClassification | PASSED | |
| 16 | openai | OpenAIGPTTokenizerFast | OpenAIGPTForSequenceClassification | PASSED | |
| 17 | roberta | RobertaTokenizerFast | RobertaForSequenceClassification | PASSED | |
| 18 | squeezebert | SqueezeBertTokenizerFast | SqueezeBertForSequenceClassification | PASSED | |
| 19 | transfo_xl | TransfoXLTokenizer | TransfoXLForSequenceClassification | FAILED | Expected object of scalar type Float but got scalar type Half for argument #4 'source' in call to _th_index_copy_ |
| 20 | xlm | XLMTokenizer | XLMForSequenceClassification | PASSED | |
| 21 | xlm_roberta | XLMRobertaTokenizerFast | XLMRobertaForSequenceClassification | PASSED | |
| 22 | xlnet | XLNetTokenizerFast | XLNetForSequenceClassification | PASSED | |
{% endraw %}

Cleanup