---
title: text.modeling.core
keywords: fastai
sidebar: home_sidebar
summary: "This module contains core custom models, loss functions, and a default layer group splitter for use in applying discriminative learning rates to your Hugging Face models trained via fastai"
description: "This module contains core custom models, loss functions, and a default layer group splitter for use in applying discriminative learning rates to your Hugging Face models trained via fastai"
nb_path: "nbs/11_text-modeling-core.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
 
{% endraw %} {% raw %}
{% endraw %} {% raw %}
What we're running with at the time this documentation was generated:
torch: 1.10.1+cu111
fastai: 2.5.6
transformers: 4.16.2
{% endraw %}

Mid-level API

Base splitter, model wrapper, and model callback

{% raw %}
{% endraw %} {% raw %}

blurr_splitter[source]

blurr_splitter(m:Module)

Splits the Hugging Face model based on various model architecture conventions

{% endraw %} {% raw %}

class BaseModelWrapper[source]

BaseModelWrapper(hf_model:PreTrainedModel, output_hidden_states:bool=False, output_attentions:bool=False, hf_model_kwargs={}) :: Module

Same as nn.Module, but no need for subclasses to call super().__init__

|  | Type | Default | Details |
|---|---|---|---|
| hf_model | PreTrainedModel |  | Your Hugging Face model |
| output_hidden_states | bool | False | If True, hidden_states will be returned and accessible from the Learner |
| output_attentions | bool | False | If True, attentions will be returned and accessible from the Learner |
| hf_model_kwargs | dict | None | Any additional keyword arguments you want passed into your model's forward method |
{% endraw %} {% raw %}
{% endraw %}

Note that BaseModelWrapper includes some nifty code that passes along only the inputs your model needs, since not all transformer architectures require/use the same information.
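The gist is to inspect the wrapped model's forward signature and forward only the batch items it accepts. A minimal sketch of the idea (illustrative only, not BLURR's actual implementation; the class name here is hypothetical):

```python
import inspect
import torch.nn as nn


class ForwardFilteringWrapper(nn.Module):
    """Illustrative wrapper that forwards only the batch keys the wrapped
    model's `forward` method actually accepts."""

    def __init__(self, hf_model):
        super().__init__()
        self.hf_model = hf_model
        # inspect the model's forward signature once, up front
        self.valid_args = set(inspect.signature(hf_model.forward).parameters.keys())

    def forward(self, batch: dict):
        # drop anything the model can't use (e.g., token_type_ids for DistilBERT/RoBERTa)
        model_inputs = {k: v for k, v in batch.items() if k in self.valid_args}
        return self.hf_model(**model_inputs)
```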

{% raw %}

class BaseModelCallback[source]

BaseModelCallback(after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None) :: Callback

Basic class handling tweaks of the training loop by changing a Learner in various events

{% endraw %} {% raw %}
{% endraw %}

We use a Callback for handling the ModelOutput returned by Hugging Face transformers. It allows us to associate anything we want from that object to our Learner.

Note that your Learner's loss will be set for you only if the Hugging Face model returns one and you are using the PreCalculatedLoss loss function.

Also note that anything else you asked the model to return (for example, the last hidden state) will be available to you via the blurr_model_outputs property attached to your Learner. For example, assuming you are using BERT for a classification task, if you have told your BaseModelWrapper instance to return attentions, you'd be able to access them via learn.blurr_model_outputs['attentions'].
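As a minimal sketch of what that looks like (reusing the dls and hf_model objects created in the example below):

```python
# ask the wrapper to return attentions in addition to the logits
model = BaseModelWrapper(hf_model, output_attentions=True)

learn = Learner(
    dls,
    model,
    loss_func=PreCalculatedCrossEntropyLoss(),
    cbs=[BaseModelCallback],
    splitter=blurr_splitter,
)

learn.fit_one_cycle(1, lr_max=1e-3)

# after a forward pass, the extra outputs you asked for are available on the Learner
attentions = learn.blurr_model_outputs["attentions"]
```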

Example

Below we demonstrate how to set up your pipeline for a sequence classification task (e.g., a model that requires a single text input) using the mid-, high-, and low-level APIs.

{% raw %}
raw_datasets = load_dataset("imdb", split=["train", "test"])
raw_datasets[0] = raw_datasets[0].add_column("is_valid", [False] * len(raw_datasets[0]))
raw_datasets[1] = raw_datasets[1].add_column("is_valid", [True] * len(raw_datasets[1]))

final_ds = concatenate_datasets([raw_datasets[0].shuffle().select(range(1000)), raw_datasets[1].shuffle().select(range(200))])
imdb_df = pd.DataFrame(final_ds)
imdb_df.head()
Reusing dataset imdb (/home/wgilliam/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)
Loading cached shuffled indices for dataset at /home/wgilliam/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1/cache-d7fa5c9b748166d2.arrow
Loading cached shuffled indices for dataset at /home/wgilliam/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1/cache-c5f1241ea667d1dc.arrow
|  | text | label | is_valid |
|---|---|---|---|
| 0 | Ah, the sex-and-gore movie. It's too bad they don't make these anymore (unless you live in Japan). But if they all turned out like this, that is not a bad thing.<br /><br />The movie basically consists of the two lovely vampires picking up "johns" along a country road, taking them home to their castle, having crazy sex with them, and then eating them (except the first victim, who they keep around for no particular reason). Things are complicated when a woman camping with her husband becomes too curious about these mysterious women she keeps seeing. It gets real ugly from here. By the end, ... | 0 | False |
| 1 | This is the film in Antonioni's middle period that most critics dismiss quickly, as a 'flawed' look at 60s American youth culture/politics. For what it's worth, I found it more touching and memorable than his more acclaimed films like L'AVVENTURA, perhaps because he shows more emotion & empathy here than anywhere else. The story is simple, but it is used as a frame for Antonioni's brilliant observations of, and critique on American consumerist culture, student life, the counter-culture, and the whole anti-establishment, anti-war backlash that was so prominent then. <br /><br />Even from a ... | 1 | False |
| 2 | This is the definitive movie version of Hamlet. Branagh cuts nothing, but there are no wasted moments. | 1 | False |
| 3 | i thought it was pretty interesting my social studies/language arts teacher was the police chief guy that was holding the microphone on the water barrel part =D i was excited my teacher is in some commercials he was in a gas/coffee/phone/play station commercial its nice seeing him on TV he was also on everybody hates Chris except he always get the small part la la why do we have to right 10 lines thats so stupid -_- i think I'm done never mind I'm still not done what is this a joke? why do we have to go all the way to line ten... really what's the point of it??!! i will just right random w... | 1 | False |
| 4 | What a dog of a movie. Noni Hazelhurst's performance is quite good, but it sits amidst a jungle of abhorrent scriptwriting, mediocre direction and wooden acting from the bulk of the cast. Many of the characters are woefully miscast, particularly the ever overrated Colin Friels.<br /><br />Very little works in this pretentious garbage. Much of the "character development" is done through a silly, angst-ridden voice over and frequently completely contradicts the behaviour of characters on-screen. In fact, it's hard to even figure out who the voice overs are talking about because they describe... | 0 | False |
{% endraw %} {% raw %}
labels = raw_datasets[0].features["label"].names
labels
['neg', 'pos']
{% endraw %} {% raw %}
model_cls = AutoModelForSequenceClassification

pretrained_model_name = "distilroberta-base"  # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = NLP.get_hf_objects(pretrained_model_name, model_cls=model_cls)
{% endraw %} {% raw %}
set_seed()
blocks = (TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model, batch_tokenize_kwargs={"labels": labels}), CategoryBlock)
dblock = DataBlock(blocks=blocks, get_x=ColReader("text"), get_y=ColReader("label"), splitter=RandomSplitter(seed=42))
{% endraw %} {% raw %}
dls = dblock.dataloaders(imdb_df, bs=4)
{% endraw %} {% raw %}
dls.show_batch(dataloaders=dls, max_n=2, trunc_at=500)
|  | text | target |
|---|---|---|
| 0 | With Iphigenia, Mikhali Cacoyannis is perhaps the first film director to have successfully brought the feel of ancient Greek theatre to the screen. His own screenplay, an adaptation of Euripides' tragedy, was far from easy, compared to that of the other two films of the trilogy he directed. The story has been very carefully deconstructed from Euripides' version and placed in a logical, strictly chronological framework, better conforming to the modern methods of cinematic story-telling. Cacoyann | pos |
| 1 | This comment does contain spoilers!!<br /><br />There are few actors that have an intangible to them. That innate quality which is an amalgamation of charisma, panache and swagger. It's the quality that can separate good actors from the truly great. I think George Clooney has it and so does Jack Nicholson. You can look at Clooney's subtle touches in scenes like his one word good-bye to Andy Garcia in Ocean's 11 when they just utter each other's name disdainfully. "Terry." "Danny." You can pick | pos |
{% endraw %}

Training

.to_fp16() requires a GPU, so it was removed so that the tests can run on GitHub. Let's check that we can get predictions.
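If you do have a GPU available, you could add mixed precision back after building the Learner below, e.g.:

```python
# assumes a CUDA-capable GPU and the `learn` object created in the next cell
learn = learn.to_fp16()
```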

{% raw %}
set_seed()

model = BaseModelWrapper(hf_model)

learn = Learner(
    dls,
    model,
    opt_func=partial(OptimWrapper, opt=torch.optim.Adam),
    loss_func=PreCalculatedCrossEntropyLoss(),  # CrossEntropyLossFlat(),
    metrics=[accuracy],
    cbs=[BaseModelCallback],
    splitter=blurr_splitter,
)

learn.freeze()
{% endraw %} {% raw %}
learn.summary()
{% endraw %} {% raw %}
print(len(learn.opt.param_groups))
3
{% endraw %} {% raw %}
learn.lr_find(suggest_funcs=[minimum, steep, valley, slide])
SuggestedLRs(minimum=8.317637839354575e-05, steep=0.033113110810518265, valley=0.00015848931798245758, slide=0.001737800776027143)
{% endraw %} {% raw %}
set_seed()
learn.fit_one_cycle(1, lr_max=1e-3)
| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 0 | 0.325592 | 0.345017 | 0.862500 | 00:13 |
{% endraw %}

Showing results

And here we create a @typedispatched implementation of Learner.show_results.

{% raw %}
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2, trunc_at=500)
|  | text | target | prediction |
|---|---|---|---|
| 0 | Match 1: Tag Team Table Match Bubba Ray and Spike Dudley vs Eddie Guerrero and Chris Benoit Bubba Ray and Spike Dudley started things off with a Tag Team Table Match against Eddie Guerrero and Chris Benoit. According to the rules of the match, both opponents have to go through tables in order to get the win. Benoit and Guerrero heated up early on by taking turns hammering first Spike and then Bubba Ray. A German suplex by Benoit to Bubba took the wind out of the Dudley brother. Spike tried to h | pos | pos |
| 1 | THE SHOP AROUND THE CORNER is one of the sweetest and most feel-good romantic comedies ever made. There's just no getting around that, and it's hard to actually put one's feeling for this film into words. It's not one of those films that tries too hard, nor does it come up with the oddest possible scenarios to get the two protagonists together in the end. In fact, all its charm is innate, contained within the characters and the setting and the plot... which is highly believable to boot. It's ea | pos | pos |
{% endraw %} {% raw %}
learn.unfreeze()
{% endraw %} {% raw %}
set_seed()
learn.fit_one_cycle(2, lr_max=slice(1e-7, 1e-4))
| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 0 | 0.283300 | 0.308957 | 0.883333 | 00:20 |
| 1 | 0.199706 | 0.322771 | 0.862500 | 00:21 |
{% endraw %} {% raw %}
learn.recorder.plot_loss()
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2, trunc_at=500)
|  | text | target | prediction |
|---|---|---|---|
| 0 | Match 1: Tag Team Table Match Bubba Ray and Spike Dudley vs Eddie Guerrero and Chris Benoit Bubba Ray and Spike Dudley started things off with a Tag Team Table Match against Eddie Guerrero and Chris Benoit. According to the rules of the match, both opponents have to go through tables in order to get the win. Benoit and Guerrero heated up early on by taking turns hammering first Spike and then Bubba Ray. A German suplex by Benoit to Bubba took the wind out of the Dudley brother. Spike tried to h | pos | pos |
| 1 | Feroz Abbas Khan's Gandhi My Father, a film that sheds light on the fractured relationship between the Mahatma and his son Harilal Gandhi. For a story that's as dramatic as the one this film attempts to tell, it's a pity the director fails to tell it dramatically. Gandhi My Father is narrated to you like that boring history lesson that put you to sleep at school. Now the film aims to convey one very interesting point - the fact that Gandhi in his attempt to be a fair person, ended up being an u | pos | neg |
{% endraw %}

Prediction

We need to replace fastai's Learner.predict method with the one above, which can work with inputs represented by multiple tensors packed into a dictionary.

{% raw %}
{% endraw %} {% raw %}

Learner.blurr_predict[source]

Learner.blurr_predict(items, rm_type_tfms=None)

{% endraw %} {% raw %}
learn.blurr_predict("I really liked the movie")
[{'label': 'pos',
  'score': 0.9268715381622314,
  'class_index': 1,
  'class_labels': ['neg', 'pos'],
  'probs': [0.07312848418951035, 0.9268715381622314]}]
{% endraw %} {% raw %}
learn.blurr_predict("Acting was so bad it was almost funny.")
[{'label': 'neg',
  'score': 0.951835036277771,
  'class_index': 0,
  'class_labels': ['neg', 'pos'],
  'probs': [0.951835036277771, 0.04816494137048721]}]
{% endraw %} {% raw %}
learn.blurr_predict(["I really liked the movie", "I really hated the movie"])
[{'label': 'pos',
  'score': 0.9268715977668762,
  'class_index': 1,
  'class_labels': ['neg', 'pos'],
  'probs': [0.07312841713428497, 0.9268715977668762]},
 {'label': 'neg',
  'score': 0.7611569762229919,
  'class_index': 0,
  'class_labels': ['neg', 'pos'],
  'probs': [0.7611569762229919, 0.23884305357933044]}]
{% endraw %}

Text generation

Though not useful for sequence classification, we also add a blurr_generate method to Learner that uses Hugging Face's PreTrainedModel.generate for text generation tasks.

For the full list of arguments you can pass in see here. You can also check out their "How To Generate" notebook for more information about how it all works.

{% raw %}
{% endraw %} {% raw %}

Learner.blurr_generate[source]

Learner.blurr_generate(items, key='generated_texts', **kwargs)

Uses the built-in generate method to generate the text (see here for a list of arguments you can pass in)

{% endraw %}
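blurr_generate delegates to the model's generate method. As a rough sketch of the underlying Hugging Face call it wraps (using gpt2 purely for illustration; this is not part of the sequence classification pipeline above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# any generative model will do; gpt2 is used here only as an example
gen_tokenizer = AutoTokenizer.from_pretrained("gpt2")
gen_model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = gen_tokenizer("The movie was", return_tensors="pt")
output_ids = gen_model.generate(**inputs, max_length=20)

print(gen_tokenizer.decode(output_ids[0], skip_special_tokens=True))
```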

Inference

Using fast.ai Learner.export and load_learner

{% raw %}
export_fname = "seq_class_learn_export"
{% endraw %} {% raw %}
learn.export(fname=f"{export_fname}.pkl")
{% endraw %} {% raw %}
inf_learn = load_learner(fname=f"{export_fname}.pkl")
inf_learn.blurr_predict("This movie should not be seen by anyone!!!!")
[{'label': 'neg',
  'score': 0.9319486618041992,
  'class_index': 0,
  'class_labels': ['neg', 'pos'],
  'probs': [0.9319486618041992, 0.0680513009428978]}]
{% endraw %}

High-level API

{% raw %}
model_cls = AutoModelForSequenceClassification

pretrained_model_name = "distilroberta-base"  # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = NLP.get_hf_objects(pretrained_model_name, model_cls=model_cls)

dls = dblock.dataloaders(imdb_df, bs=4)
{% endraw %} {% raw %}

class Blearner[source]

Blearner(dls:DataLoaders, hf_model:PreTrainedModel, base_model_cb:BaseModelCallback=BaseModelCallback, loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95)) :: Learner

Group together a model, some dls and a loss_func to handle training

|  | Type | Default | Details |
|---|---|---|---|
| dls | DataLoaders |  | Your fastai DataLoaders |
| hf_model | PreTrainedModel |  | Your pretrained Hugging Face transformer |
| base_model_cb | BaseModelCallback | BaseModelCallback | Your BaseModelCallback |
| kwargs |  |  | No Content |
{% endraw %} {% raw %}
{% endraw %}

Instead of constructing our low-level Learner, we can use the Blearner class, which provides sensible defaults for training.

{% raw %}
learn = Blearner(dls, hf_model, metrics=[accuracy])
{% endraw %} {% raw %}
learn.fit_one_cycle(1, lr_max=1e-3)
| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 0 | 0.304205 | 0.334683 | 0.854167 | 00:13 |
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2, trunc_at=500)
|  | text | target | prediction |
|---|---|---|---|
| 0 | Match 1: Tag Team Table Match Bubba Ray and Spike Dudley vs Eddie Guerrero and Chris Benoit Bubba Ray and Spike Dudley started things off with a Tag Team Table Match against Eddie Guerrero and Chris Benoit. According to the rules of the match, both opponents have to go through tables in order to get the win. Benoit and Guerrero heated up early on by taking turns hammering first Spike and then Bubba Ray. A German suplex by Benoit to Bubba took the wind out of the Dudley brother. Spike tried to h | pos | pos |
| 1 | THE SHOP AROUND THE CORNER is one of the sweetest and most feel-good romantic comedies ever made. There's just no getting around that, and it's hard to actually put one's feeling for this film into words. It's not one of those films that tries too hard, nor does it come up with the oddest possible scenarios to get the two protagonists together in the end. In fact, all its charm is innate, contained within the characters and the setting and the plot... which is highly believable to boot. It's ea | pos | pos |
{% endraw %} {% raw %}
learn.blurr_predict("This was a really good movie")
[{'label': 'pos',
  'score': 0.9181244969367981,
  'class_index': 1,
  'class_labels': ['neg', 'pos'],
  'probs': [0.08187545090913773, 0.9181244969367981]}]
{% endraw %} {% raw %}
learn.export(fname=f"{export_fname}.pkl")
inf_learn = load_learner(fname=f"{export_fname}.pkl")
inf_learn.blurr_predict("This movie should not be seen by anyone!!!!")
[{'label': 'neg',
  'score': 0.9200975894927979,
  'class_index': 0,
  'class_labels': ['neg', 'pos'],
  'probs': [0.9200975894927979, 0.07990235835313797]}]
{% endraw %} {% raw %}

class BlearnerForSequenceClassification[source]

BlearnerForSequenceClassification(dls:DataLoaders, hf_model:PreTrainedModel, base_model_cb:BaseModelCallback=BaseModelCallback, loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95)) :: Blearner

Group together a model, some dls and a loss_func to handle training

{% endraw %} {% raw %}
{% endraw %}

We also introduce a classification task-specific Blearner that gets you your DataBlock, DataLoaders, and Blearner in one line of code!

Examples

Using Mid-level API building blocks

{% raw %}
learn = BlearnerForSequenceClassification.from_data(
    imdb_df, "distilroberta-base", text_attr="text", label_attr="label", dl_kwargs={"bs": 4}
)
{% endraw %} {% raw %}
learn.fit_one_cycle(1, lr_max=1e-3)
| epoch | train_loss | valid_loss | f1_score | accuracy | time |
|---|---|---|---|---|---|
| 0 | 0.277279 | 0.241745 | 0.932203 | 0.920000 | 00:14 |
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2, trunc_at=500)
|  | text | target | prediction |
|---|---|---|---|
| 0 | This is one of those films where it is easy to see how some people wouldn't like it. My wife has never seen it, and when I just rewatched it last night, I waited until after she went to bed. She might have been amused by a couple small snippets, but I know she would have had enough within ten minutes.<br /><br />Head has nothing like a conventional story. The film is firmly mired in the psychedelic era. It could be seen as filmic surrealism in a nutshell, or as something of a postmodern acid tr | 1 | 0 |
| 1 | The opening night for the 'South Asian International Film Festival' (SAIFF) in New York was an event a lot of us were waiting for.<br /><br />I would finally get to watch 'Hari Om' – I was tired of watching the "promo" on a loop and the lingering taste of the song Angel by Nitin Sawhney in the promo, left me begging to hear the rest of it. I was impressed by the visuals… and tremendously curious about how the rugged looking auto rickshaw driver would win the hearts of the stunning sophisticated | 1 | 1 |
{% endraw %} {% raw %}
learn.predict("This was a really good movie")
[{'label': '1',
  'score': 0.886829137802124,
  'class_index': 1,
  'class_labels': [0, 1],
  'probs': [0.11317088454961777, 0.886829137802124]}]
{% endraw %} {% raw %}
learn.export(fname=f"{export_fname}.pkl")
inf_learn = load_learner(fname=f"{export_fname}.pkl")
inf_learn.blurr_predict("This movie should not be seen by anyone!!!!")
[{'label': '0',
  'score': 0.920820951461792,
  'class_index': 0,
  'class_labels': [0, 1],
  'probs': [0.920820951461792, 0.07917908579111099]}]
{% endraw %}

Using Low-level API building blocks

Thanks to the TextDataLoader, there isn't really anything you have to do to use plain ol' PyTorch or fast.ai Datasets and DataLoaders with Blurr. Let's take a look at fine-tuning a model against GLUE's MRPC dataset ...

Build your Hugging Face objects
{% raw %}
model_cls = AutoModelForSequenceClassification

pretrained_model_name = "distilroberta-base"  # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = NLP.get_hf_objects(pretrained_model_name, model_cls=model_cls)
{% endraw %}
Preprocess your data
{% raw %}
from datasets import load_dataset
from blurr.text.data.core import preproc_hf_dataset

raw_datasets = load_dataset("glue", "mrpc")
Reusing dataset glue (/home/wgilliam/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
{% endraw %} {% raw %}
def tokenize_function(example):
    return hf_tokenizer(example["sentence1"], example["sentence2"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
Loading cached processed dataset at /home/wgilliam/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-f3774ba9358a732c.arrow
Loading cached processed dataset at /home/wgilliam/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-84263331ad583603.arrow
Loading cached processed dataset at /home/wgilliam/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-b7fe644c800de3c0.arrow
{% endraw %}
Build your DataLoaders
{% raw %}
label_names = raw_datasets["train"].features["label"].names

trn_dl = TextDataLoader(
    tokenized_datasets["train"],
    hf_arch=hf_arch,
    hf_config=hf_config,
    hf_tokenizer=hf_tokenizer,
    hf_model=hf_model,
    preproccesing_func=preproc_hf_dataset,
    batch_decode_kwargs={"labels": label_names},
    shuffle=True,
    batch_size=8,
)

val_dl = TextDataLoader(
    tokenized_datasets["validation"],
    hf_arch=hf_arch,
    hf_config=hf_config,
    hf_tokenizer=hf_tokenizer,
    hf_model=hf_model,
    preproccesing_func=preproc_hf_dataset,
    batch_decode_kwargs={"labels": label_names},
    batch_size=16,
)

dls = DataLoaders(trn_dl, val_dl)
{% endraw %}
Define your Blearner
{% raw %}
learn = BlearnerForSequenceClassification(dls, hf_model, loss_func=PreCalculatedCrossEntropyLoss())
{% endraw %}
Train
{% raw %}
learn.lr_find()
SuggestedLRs(valley=9.120108734350652e-05)
{% endraw %} {% raw %}
learn.fit_one_cycle(1, lr_max=1e-3)
| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.509355 | 0.475181 | 00:13 |
{% endraw %} {% raw %}
learn.unfreeze()
learn.fit_one_cycle(2, lr_max=slice(1e-8, 1e-6))
| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.509265 | 0.475472 | 00:26 |
| 1 | 0.484401 | 0.474938 | 00:26 |
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2, trunc_at=500)
|  | text | target | prediction |
|---|---|---|---|
| 0 | Spansion products are to be available from both AMD and Fujitsu, AMD said. Spansion Flash memory solutions are available worldwide from AMD and Fujitsu. | equivalent | equivalent |
| 1 | However, EPA officials would not confirm the 20 percent figure. Only in the past few weeks have officials settled on the 20 percent figure. | not_equivalent | equivalent |
{% endraw %}

Tests

The tests below ensure the core training code above works for all pretrained sequence classification models available in Hugging Face. These tests are excluded from the CI workflow because of how long they would take to run and the amount of data that would need to be downloaded.

Note: Feel free to modify the code below to test whatever pretrained classification models you are working with ... and if any of your pretrained sequence classification models fail, please submit a GitHub issue (or a PR if you'd like to fix it yourself).
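As a rough, hedged sketch of what such a sweep could look like (the model list and bookkeeping below are illustrative, not the notebook's actual test code):

```python
# try a handful of checkpoints and record whether a quick fine-tuning cycle succeeds
pretrained_model_names = ["distilbert-base-uncased", "bert-base-uncased", "roberta-base"]

results = []
for model_name in pretrained_model_names:
    result, error = "PASSED", None
    try:
        learn = BlearnerForSequenceClassification.from_data(
            imdb_df, model_name, text_attr="text", label_attr="label", dl_kwargs={"bs": 4}
        )
        learn.fit_one_cycle(1, lr_max=1e-3)
    except Exception as e:
        result, error = "FAILED", str(e)
    results.append({"model": model_name, "result": result, "error": error})

pd.DataFrame(results)
```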

{% raw %}
|  | arch | tokenizer | model | result | error |
|---|---|---|---|---|---|
| 0 | albert | AlbertTokenizerFast | AlbertForSequenceClassification | PASSED |  |
| 1 | bart | BartTokenizerFast | BartForSequenceClassification | PASSED |  |
| 2 | bert | BertTokenizerFast | BertForSequenceClassification | PASSED |  |
| 3 | big_bird | BigBirdTokenizerFast | BigBirdForSequenceClassification | PASSED |  |
| 4 | bigbird_pegasus | PegasusTokenizerFast | BigBirdPegasusForSequenceClassification | PASSED |  |
| 5 | ctrl | CTRLTokenizer | CTRLForSequenceClassification | PASSED |  |
| 6 | camembert | CamembertTokenizerFast | CamembertForSequenceClassification | PASSED |  |
| 7 | canine | CanineTokenizer | CanineForSequenceClassification | PASSED |  |
| 8 | convbert | ConvBertTokenizerFast | ConvBertForSequenceClassification | PASSED |  |
| 9 | deberta | DebertaTokenizerFast | DebertaForSequenceClassification | FAILED | mat1 and mat2 shapes cannot be multiplied (2x32 and 768x768) |
| 10 | deberta_v2 | DebertaV2Tokenizer | DebertaV2ForSequenceClassification | PASSED |  |
| 11 | distilbert | DistilBertTokenizerFast | DistilBertForSequenceClassification | PASSED |  |
| 12 | electra | ElectraTokenizerFast | ElectraForSequenceClassification | PASSED |  |
| 13 | fnet | FNetTokenizerFast | FNetForSequenceClassification | FAILED | forward() got an unexpected keyword argument 'output_attentions' |
| 14 | flaubert | FlaubertTokenizer | FlaubertForSequenceClassification | PASSED |  |
| 15 | funnel | FunnelTokenizerFast | FunnelForSequenceClassification | PASSED |  |
| 16 | gpt2 | GPT2TokenizerFast | GPT2ForSequenceClassification | PASSED |  |
| 17 | gptj | GPT2TokenizerFast | GPTJForSequenceClassification | PASSED |  |
| 18 | gpt_neo | GPT2TokenizerFast | GPTNeoForSequenceClassification | PASSED |  |
| 19 | ibert | RobertaTokenizer | IBertForSequenceClassification | PASSED |  |
| 20 | led | LEDTokenizerFast | LEDForSequenceClassification | PASSED |  |
| 21 | longformer | LongformerTokenizerFast | LongformerForSequenceClassification | PASSED |  |
| 22 | mbart | MBartTokenizerFast | MBartForSequenceClassification | PASSED |  |
| 23 | mpnet | MPNetTokenizerFast | MPNetForSequenceClassification | PASSED |  |
| 24 | mobilebert | MobileBertTokenizerFast | MobileBertForSequenceClassification | PASSED |  |
| 25 | openai | OpenAIGPTTokenizerFast | OpenAIGPTForSequenceClassification | PASSED |  |
| 26 | reformer | ReformerTokenizerFast | ReformerForSequenceClassification | FAILED | If training, make sure that config.axial_pos_shape factors: (512, 1024) multiply to sequence length. Got prod((512, 1024)) != sequence_length: 32. You might want to consider padding your sequence length to 524288 or changing config.axial_pos_shape. |
| 27 | rembert | RemBertTokenizerFast | RemBertForSequenceClassification | PASSED |  |
| 28 | roformer | RoFormerTokenizerFast | RoFormerForSequenceClassification | PASSED |  |
| 29 | roberta | RobertaTokenizerFast | RobertaForSequenceClassification | PASSED |  |
| 30 | squeezebert | SqueezeBertTokenizerFast | SqueezeBertForSequenceClassification | PASSED |  |
| 31 | transfo_xl | TransfoXLTokenizer | TransfoXLForSequenceClassification | PASSED |  |
| 32 | xlm | XLMTokenizer | XLMForSequenceClassification | PASSED |  |
| 33 | xlm_roberta | XLMRobertaTokenizerFast | XLMRobertaForSequenceClassification | PASSED |  |
| 34 | xlnet | XLNetTokenizerFast | XLNetForSequenceClassification | PASSED |  |
{% endraw %}