--- title: modeling.question_answering keywords: fastai sidebar: home_sidebar summary: "This module contains custom models, loss functions, custom splitters, etc. for question answering tasks." description: "This module contains custom models, loss functions, custom splitters, etc. for question answering tasks." nb_path: "nbs/02b_modeling-question-answering.ipynb" ---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Using GPU #1: GeForce GTX 1080 Ti
{% endraw %}

Question Answering

Given a document (context) and a question, the objective of these models is to predict the start and end tokens of the correct answer as it appears in the context.
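To make the objective concrete, here's a tiny, self-contained illustration (with a made-up token sequence and logits, not output from any model in this notebook) of how an answer span is recovered from start/end predictions:

{% raw %}
import torch

# hypothetical example: a QA head emits one start logit and one end logit per
# token; the predicted answer is the span between the two argmaxes
tokens = ['[CLS]', 'who', 'made', 'star', 'wars', '?', '[SEP]',
          'george', 'lucas', 'created', 'star', 'wars', '[SEP]']
start_logits = torch.tensor([0., 0, 0, 0, 0, 0, 0, 6, 1, 0, 0, 0, 0])  # peak at 'george'
end_logits   = torch.tensor([0., 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0])  # peak at 'lucas'

start, end = int(start_logits.argmax()), int(end_logits.argmax()) + 1  # end is exclusive
print(tokens[start:end])  # ['george', 'lucas']
{% endraw %}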

Again, we'll use a subset of the pre-processed SQuAD v2 dataset for our purposes below.

{% raw %}
# squad_df = pd.read_csv('./data/task-question-answering/squad_cleaned.csv'); len(squad_df)

# sample
squad_df = pd.read_csv('./squad_sample.csv'); len(squad_df)
1000
{% endraw %} {% raw %}
squad_df.head(2)
id title context question answers ds_type answer_text is_impossible
0 56be85543aeaaa14008c9063 Beyoncé Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five G... When did Beyonce start becoming popular? {'text': ['in the late 1990s'], 'answer_start': [269]} train in the late 1990s False
1 56be85543aeaaa14008c9065 Beyoncé Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five G... What areas did Beyonce compete in when she was growing up? {'text': ['singing and dancing'], 'answer_start': [207]} train singing and dancing False
{% endraw %} {% raw %}
pretrained_model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
hf_model_cls = BertForQuestionAnswering

hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name,
                                                                               model_cls=hf_model_cls)

# # here's a pre-trained roberta model for squad you can try too
# pretrained_model_name = "ahotrod/roberta_large_squad2"
# hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name,
#                                                                                task=HF_TASKS_AUTO.ForQuestionAnswering)

# # here's a pre-trained xlm model for squad you can try too
# pretrained_model_name = 'xlm-mlm-ende-1024'
# hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name,
#                                                                                task=HF_TASKS_AUTO.ForQuestionAnswering)
{% endraw %} {% raw %}
squad_df = squad_df.apply(partial(pre_process_squad, hf_arch=hf_arch, hf_tokenizer=hf_tokenizer), axis=1)
{% endraw %} {% raw %}
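{% endraw %}

pre_process_squad adds, among other things, the token-level answer boundaries (tok_answer_start/tok_answer_end) and the tokenized input length to each row. As a rough sketch of the underlying idea (a hypothetical helper, not blurr's actual implementation, which also has to account for the question and special tokens that precede the context in the final model input), a character-level answer_start can be mapped to token positions like so:

{% raw %}
# hedged sketch only: map a character-level answer span to token positions by
# tokenizing the context up to the answer's character offset
def char_to_token_span(context, answer_text, answer_start, hf_tokenizer):
    start_tok = len(hf_tokenizer.tokenize(context[:answer_start]))
    end_tok = start_tok + len(hf_tokenizer.tokenize(answer_text))
    return start_tok, end_tok
{% endraw %}

{% raw %}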
max_seq_len = 128
{% endraw %} {% raw %}
# keep only answerable examples whose tokenized inputs fit within max_seq_len
squad_df = squad_df[(squad_df.tokenized_input_len < max_seq_len) & (squad_df.is_impossible == False)]
{% endraw %} {% raw %}
# each target "category" is simply a token position in [0, max_seq_len)
vocab = list(range(max_seq_len))
{% endraw %} {% raw %}
# truncate only the context (the second text for right-padded models, the first otherwise)
trunc_strat = 'only_second' if (hf_tokenizer.padding_side == 'right') else 'only_first'

before_batch_tfm = HF_QABeforeBatchTransform(hf_arch, hf_config, hf_tokenizer, hf_model,
                                             max_length=max_seq_len, 
                                             truncation=trunc_strat, 
                                             tok_kwargs={ 'return_special_tokens_mask': True })

blocks = (
    HF_TextBlock(before_batch_tfm=before_batch_tfm, input_return_type=HF_QuestionAnswerInput), 
    CategoryBlock(vocab=vocab),
    CategoryBlock(vocab=vocab)
)

def get_x(x):
    return (x.question, x.context) if (hf_tokenizer.padding_side == 'right') else (x.context, x.question)

dblock = DataBlock(blocks=blocks, 
                   get_x=get_x,
                   get_y=[ColReader('tok_answer_start'), ColReader('tok_answer_end')],
                   splitter=RandomSplitter(),
                   n_inp=1)
{% endraw %} {% raw %}
dls = dblock.dataloaders(squad_df, bs=4)
{% endraw %} {% raw %}
len(dls.vocab), dls.vocab[0], dls.vocab[1]
(2,
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127])
{% endraw %} {% raw %}
dls.show_batch(dataloaders=dls, max_n=2)
text start/end answer
0 what french magazine cover did the media criticize? in 2006, the animal rights organization people for the ethical treatment of animals ( peta ), criticized beyonce for wearing and using fur in her clothing line house of dereon. in 2011, she appeared on the cover of french fashion magazine l'officiel, in blackface and tribal makeup that drew criticism from the media. a statement released from a spokesperson for the magazine said that beyonce's look was " far from the glamorous sasha fierce " and that it was " a return to her african roots ". (59, 64) l'officiel
1 what people did chopin meet while in paris? in paris, chopin encountered artists and other distinguished figures, and found many opportunities to exercise his talents and achieve celebrity. during his years in paris he was to become acquainted with, among many others, hector berlioz, franz liszt, ferdinand hiller, heinrich heine, eugene delacroix, and alfred de vigny. chopin was also acquainted with the poet adam mickiewicz, principal of the polish literary society, some of whose verses he set as songs. (50, 78) hector berlioz, franz liszt, ferdinand hiller, heinrich heine, eugene delacroix, and alfred de vigny
{% endraw %}

Training

Here we create a question/answering-specific subclass of HF_BaseModelCallback in order to grab both the start and end predictions. We also add a new loss function that can handle multiple targets.

{% raw %}

class HF_QstAndAnsModelCallback[source]

HF_QstAndAnsModelCallback(after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, after_backward=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None) :: HF_BaseModelCallback

The prediction is a combination of the start/end logits

{% endraw %} {% raw %}
{% endraw %}
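Conceptually the callback's job is small: a Hugging Face question answering model returns separate start and end logits, and they need to be repackaged as a tuple so that fastai can line each one up with its corresponding target. A minimal sketch of that idea (a hypothetical simplification, not blurr's actual implementation) might look like this:

{% raw %}
from fastai.callback.core import Callback

class QASketchCallback(Callback):
    # hedged sketch: assumes the raw prediction exposes `start_logits` and
    # `end_logits`, as Hugging Face question answering models do
    def after_pred(self):
        self.learn.pred = (self.pred.start_logits, self.pred.end_logits)
{% endraw %}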

And here we provide a custom loss function for our question answering task, expanding on techniques from a couple of community examples.

In fact, this new loss function can be used in many other multi-modal architectures, with any mix of loss functions. For example, it could be amended to include the is_impossible task, in addition to the start/end token tasks, in the SQuAD v2 dataset.

{% raw %}

class MultiTargetLoss[source]

MultiTargetLoss(loss_classes=[<class 'fastai.losses.CrossEntropyLossFlat'>, <class 'fastai.losses.CrossEntropyLossFlat'>], loss_classes_kwargs=[{}, {}], weights=[1, 1], reduction='mean') :: Module

Provides the ability to apply different loss functions to multi-modal targets/predictions

{% endraw %} {% raw %}
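{% endraw %}

Under the hood, the idea is simple: apply one loss function per (prediction, target) pair and sum the results, scaled by the given weights. Here is a minimal sketch mirroring the signature above (hedged: blurr's actual MultiTargetLoss also provides the activation/decodes hooks fastai uses when decoding predictions):

{% raw %}
from torch import nn
from fastai.losses import CrossEntropyLossFlat

class MultiTargetLossSketch(nn.Module):
    # hedged sketch only, not blurr's actual implementation
    def __init__(self, loss_classes=[CrossEntropyLossFlat, CrossEntropyLossFlat],
                 loss_classes_kwargs=[{}, {}], weights=[1, 1], reduction='mean'):
        super().__init__()
        self.loss_funcs = [cls(reduction=reduction, **kwargs)
                           for cls, kwargs in zip(loss_classes, loss_classes_kwargs)]
        self.weights = weights

    def forward(self, outputs, *targets):
        # one loss per (prediction, target) pair, scaled by its weight
        return sum(w * fn(o, t) for fn, w, o, t
                   in zip(self.loss_funcs, self.weights, outputs, targets))
{% endraw %}

{% raw %}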
{% endraw %} {% raw %}
model = HF_BaseModelWrapper(hf_model)

learn = Learner(dls, 
                model,
                opt_func=partial(Adam, decouple_wd=True),
                cbs=[HF_QstAndAnsModelCallback],
                splitter=hf_splitter)

learn.loss_func=MultiTargetLoss()
learn.create_opt()                # -> will create your layer groups based on your "splitter" function
learn.freeze()
{% endraw %}

Notice above that I had to set the loss function after creating the Learner object. I'm not sure why, but attaching MultiTargetLoss when the Learner is created prevents the learner from being exported.

{% raw %}
 
{% endraw %} {% raw %}
print(len(learn.opt.param_groups))
3
{% endraw %} {% raw %}
x, y_start, y_end = dls.one_batch()
preds = learn.model(x)
len(preds),preds[0].shape
(2, torch.Size([4, 127]))
{% endraw %} {% raw %}
learn.lr_find(suggestions=True)
SuggestedLRs(lr_min=0.0033113110810518267, lr_steep=0.0014454397605732083)
{% endraw %} {% raw %}
learn.fit_one_cycle(3, lr_max=1e-3)
epoch train_loss valid_loss time
0 4.269620 1.945376 00:04
1 2.341031 1.440929 00:04
2 1.587859 1.385940 00:04
{% endraw %}

Showing results

Below we'll add functionality to show the results of our model more intuitively.

{% raw %}
{% endraw %} {% raw %}
learn.show_results(learner=learn, skip_special_tokens=True, max_n=2, trunc_at=500)
text start/end answer pred start/end pred answer
0 during what month did frederic move to warsaw with his family? in october 1810, six months after fryderyk's birth, the family moved to warsaw, where his father acquired a post teaching french at the warsaw lyceum, then housed in the saxon palace. fryderyk lived with his family in the palace grounds. the father played the flute and violin ; the mother played the piano and gave lessons to boys in the boarding house that the chopins kept. chopin was of slight build, and even in early childhood was (15, 16) october (15, 17) october 1810
1 where did chopin create most of his works? in his native poland, in france, where he composed most of his works, and beyond, chopin's music, his status as one of music's earliest superstars, his association ( if only indirect ) with political insurrection, his love life and his early death have made him, in the public consciousness, a leading symbol of the romantic era. his works remain popular, and he has been the subject of numerous films and biographies of varying degrees of historical accura (17, 18) france (17, 18) france
{% endraw %}

... and let's see how Learner.blurr_predict works with question/answering tasks

{% raw %}
inf_df = pd.DataFrame.from_dict([{
    'question': 'What did George Lucas make?',
    'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'   
}], 
    orient='columns')

learn.blurr_predict(inf_df.iloc[0])
[(('11', '13'),
  (#2) [tensor(11),tensor(13)],
  (#2) [tensor([8.4690e-08, 2.3582e-08, 1.7349e-09, 4.6443e-09, 2.8531e-09, 1.9948e-09,
        2.6099e-10, 8.4697e-08, 4.3354e-04, 1.7619e-05, 7.9166e-04, 9.9864e-01,
        1.0210e-04, 2.2261e-07, 6.6072e-06, 2.5145e-07, 4.3285e-06, 3.6742e-06,
        1.2082e-08, 1.4640e-06, 4.6735e-07, 6.1825e-08, 2.1721e-07]),tensor([1.5560e-03, 5.3295e-05, 4.0664e-06, 1.4171e-06, 7.0940e-06, 3.4613e-06,
        2.0923e-05, 1.5561e-03, 1.9432e-05, 1.0406e-04, 5.1822e-05, 1.1023e-05,
        6.6443e-02, 4.8986e-01, 3.0896e-02, 3.8015e-01, 1.3186e-04, 4.0736e-05,
        1.5066e-04, 1.2086e-04, 5.7185e-03, 2.1867e-02, 1.2406e-03])])]
{% endraw %} {% raw %}
inf_df = pd.DataFrame.from_dict([
    {
        'question': 'What did George Lucas make?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'   
    }, {
        'question': 'What year did Star Wars come out?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }, {
        'question': 'What did George Lucas do?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }], 
    orient='columns')

learn.blurr_predict(inf_df)
[(('11', '13'),
  (#2) [tensor(11),tensor(13)],
  (#2) [tensor([8.4690e-08, 2.3582e-08, 1.7349e-09, 4.6443e-09, 2.8530e-09, 1.9948e-09,
        2.6099e-10, 8.4697e-08, 4.3354e-04, 1.7619e-05, 7.9166e-04, 9.9864e-01,
        1.0210e-04, 2.2261e-07, 6.6072e-06, 2.5145e-07, 4.3285e-06, 3.6742e-06,
        1.2082e-08, 1.4640e-06, 4.6734e-07, 6.1824e-08, 2.1721e-07, 1.6226e-10,
        2.1840e-10]),tensor([1.5560e-03, 5.3295e-05, 4.0664e-06, 1.4171e-06, 7.0940e-06, 3.4613e-06,
        2.0923e-05, 1.5561e-03, 1.9432e-05, 1.0406e-04, 5.1822e-05, 1.1023e-05,
        6.6443e-02, 4.8986e-01, 3.0895e-02, 3.8015e-01, 1.3186e-04, 4.0736e-05,
        1.5066e-04, 1.2086e-04, 5.7185e-03, 2.1867e-02, 1.2406e-03, 1.0649e-06,
        6.5562e-07])]),
 (('16', '17'),
  (#2) [tensor(16),tensor(17)],
  (#2) [tensor([5.2048e-07, 1.7044e-06, 2.5959e-08, 1.7254e-08, 1.3739e-08, 9.8209e-09,
        1.4993e-08, 1.7730e-08, 7.7954e-09, 5.2054e-07, 7.1842e-07, 9.3201e-07,
        1.5671e-06, 3.1975e-06, 1.2552e-06, 3.8337e-05, 9.9995e-01, 1.0162e-06,
        1.3380e-07, 1.1818e-07, 2.2754e-08, 1.2847e-07, 3.3104e-07, 4.5823e-07,
        5.1939e-07]),tensor([1.8849e-03, 3.5101e-04, 3.1385e-04, 1.0347e-04, 3.7103e-05, 7.4453e-05,
        4.4840e-05, 8.6190e-05, 2.1104e-04, 1.8849e-03, 3.5103e-04, 5.1403e-04,
        3.4193e-04, 1.4921e-04, 5.9495e-04, 5.8128e-04, 4.9680e-03, 9.7634e-01,
        5.4388e-03, 4.0422e-04, 4.4967e-04, 3.5914e-04, 1.2041e-03, 1.4260e-03,
        1.8832e-03])]),
 (('17', '21'),
  (#2) [tensor(17),tensor(21)],
  (#2) [tensor([3.0938e-06, 8.7476e-08, 2.1809e-08, 6.6708e-08, 2.3678e-08, 2.2183e-08,
        6.3020e-09, 3.0945e-06, 8.1029e-03, 6.4302e-04, 1.8530e-01, 3.5721e-04,
        2.4048e-05, 1.9852e-06, 3.5058e-05, 8.3339e-06, 1.1303e-01, 6.7239e-01,
        1.3703e-05, 2.0050e-02, 2.1990e-05, 7.5103e-06, 2.2600e-06, 2.7818e-09,
        4.1533e-09]),tensor([2.6103e-03, 9.3177e-06, 6.9879e-06, 2.4205e-06, 1.4388e-05, 6.2576e-06,
        2.0180e-05, 2.6105e-03, 1.8758e-05, 1.6505e-04, 1.7792e-04, 2.0651e-04,
        1.5630e-02, 2.3282e-02, 7.6168e-03, 7.0506e-02, 1.9699e-05, 3.3141e-05,
        7.9334e-04, 2.8837e-03, 2.7674e-01, 5.9428e-01, 2.3701e-03, 2.0888e-06,
        9.7243e-07])])]
{% endraw %} {% raw %}
inp_ids = hf_tokenizer.encode('What did George Lucas make?',
                              'George Lucas created Star Wars in 1977. He directed and produced it.')

hf_tokenizer.convert_ids_to_tokens(inp_ids, skip_special_tokens=False)[11:13]
['star', 'wars']
{% endraw %}

Note that there is currently a bug in fastai v2 (or in how I'm assembling everything) that prevents us from seeing the decoded predictions and probabilities for the "end" token.

{% raw %}
inf_df = pd.DataFrame.from_dict([{
    'question': 'When was Star Wars made?',
    'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'
}], 
    orient='columns')

test_dl = dls.test_dl(inf_df)
inp = test_dl.one_batch()[0]['input_ids']
probs, _, preds = learn.get_preds(dl=test_dl, with_input=False, with_decoded=True)
{% endraw %} {% raw %}
hf_tokenizer.convert_ids_to_tokens(inp.tolist()[0], 
                                   skip_special_tokens=False)[torch.argmax(probs[0]):torch.argmax(probs[1])]
['1977']
{% endraw %}

We can unfreeze and continue training as normal.

{% raw %}
learn.unfreeze()
{% endraw %} {% raw %}
learn.fit_one_cycle(3, lr_max=slice(1e-7, 1e-4))
epoch train_loss valid_loss time
0 0.793695 1.306132 00:08
1 0.633532 1.334760 00:08
2 0.493486 1.323230 00:08
{% endraw %} {% raw %}
learn.recorder.plot_loss()
{% endraw %} {% raw %}
learn.show_results(learner=learn, max_n=2, trunc_at=100)
text start/end answer pred start/end pred answer
0 during what month did frederic move to warsaw with his family? in october 1810, six months after fry (15, 16) october (15, 17) october 1810
1 when did beyonce sign a letter for one campaign? in 2015 beyonce signed an open letter which the one (13, 14) 2015 (13, 14) 2015
{% endraw %} {% raw %}
learn.blurr_predict(inf_df.iloc[0])
[(('14', '15'),
  (#2) [tensor(14),tensor(15)],
  (#2) [tensor([5.0287e-08, 1.9912e-08, 2.9835e-09, 2.0213e-09, 1.8854e-09, 5.7235e-09,
        8.2177e-10, 5.0290e-08, 8.6826e-07, 3.0716e-07, 2.9765e-06, 2.4703e-06,
        4.1488e-07, 3.9671e-04, 9.9960e-01, 3.3189e-07, 2.6552e-08, 1.2597e-08,
        1.5300e-09, 9.4568e-09, 2.4536e-08, 1.8006e-08, 4.9007e-08]),tensor([2.3329e-04, 8.3509e-06, 3.4587e-06, 1.3708e-06, 2.4248e-06, 1.3938e-06,
        6.1308e-06, 2.3329e-04, 1.7713e-05, 2.0598e-05, 2.8978e-05, 4.7946e-06,
        3.2635e-05, 8.1082e-05, 6.2376e-04, 9.9566e-01, 2.6639e-03, 1.5524e-05,
        1.3700e-05, 1.0717e-05, 5.1943e-05, 5.5223e-05, 2.2974e-04])])]
{% endraw %} {% raw %}
preds, pred_classes, probs = zip(*learn.blurr_predict(inf_df.iloc[0]))
preds
(('14', '15'),)
{% endraw %} {% raw %}
inp_ids = hf_tokenizer.encode('When was Star Wars made?',
                              'George Lucas created Star Wars in 1977. He directed and produced it.')

hf_tokenizer.convert_ids_to_tokens(inp_ids, skip_special_tokens=False)[int(preds[0][0]):int(preds[0][1])]
['1977']
{% endraw %}

Inference

Note that I had to replace the loss function because of the above-mentioned issue with exporting the model while the MultiTargetLoss loss function is attached. After loading our inference learner, we put it back and we're good to go!

{% raw %}
export_name = 'q_and_a_learn_export'
{% endraw %} {% raw %}
learn.loss_func = CrossEntropyLossFlat()
learn.export(fname=f'{export_name}.pkl')
{% endraw %} {% raw %}
inf_learn = load_learner(fname=f'{export_name}.pkl')
inf_learn.loss_func = MultiTargetLoss()

inf_df = pd.DataFrame.from_dict([
    {
        'question': 'What did George Lucas make?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'   
    }, {
        'question': 'What year did Star Wars come out?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }, {
        'question': 'What did George Lucas do?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }], 
    orient='columns')

inf_learn.blurr_predict(inf_df)
[(('11', '13'),
  (#2) [tensor(11),tensor(13)],
  (#2) [tensor([6.4462e-08, 1.9989e-08, 1.3325e-09, 2.7664e-09, 1.8671e-09, 1.7789e-09,
        1.8039e-10, 6.4467e-08, 1.0510e-04, 5.5178e-06, 3.4711e-04, 9.9945e-01,
        8.3229e-05, 6.0367e-08, 2.7122e-06, 6.8480e-08, 1.0021e-06, 1.1485e-06,
        4.9145e-09, 5.5146e-07, 1.1851e-07, 1.7658e-08, 5.5673e-08, 1.3025e-10,
        1.8010e-10]),tensor([4.5258e-04, 1.7297e-05, 1.2920e-06, 4.1197e-07, 1.3324e-06, 7.0893e-07,
        4.6975e-06, 4.5261e-04, 4.9904e-06, 1.8728e-05, 1.2564e-05, 2.8013e-06,
        2.1285e-02, 7.5174e-01, 1.0028e-02, 2.1036e-01, 3.8814e-05, 1.2367e-05,
        4.0521e-05, 2.6420e-05, 1.3587e-03, 3.7221e-03, 4.1871e-04, 2.7288e-07,
        1.8427e-07])]),
 (('16', '17'),
  (#2) [tensor(16),tensor(17)],
  (#2) [tensor([2.4833e-07, 6.7453e-07, 1.0828e-08, 8.1945e-09, 6.2059e-09, 4.5433e-09,
        6.5401e-09, 7.9235e-09, 3.6031e-09, 2.4835e-07, 1.8276e-07, 2.5371e-07,
        4.3557e-07, 9.3449e-07, 4.0939e-07, 1.4522e-05, 9.9998e-01, 2.2884e-07,
        3.5751e-08, 3.2133e-08, 8.3219e-09, 3.5408e-08, 8.8465e-08, 1.1490e-07,
        2.4797e-07]),tensor([4.9447e-04, 9.1783e-05, 7.3632e-05, 2.5121e-05, 9.2703e-06, 1.6544e-05,
        1.0471e-05, 1.7209e-05, 3.6782e-05, 4.9447e-04, 7.6062e-05, 1.0730e-04,
        8.0847e-05, 3.5005e-05, 1.2638e-04, 1.1738e-04, 1.2393e-03, 9.9440e-01,
        1.4980e-03, 8.0291e-05, 7.3319e-05, 6.2824e-05, 1.7782e-04, 1.6095e-04,
        4.9452e-04])]),
 (('17', '21'),
  (#2) [tensor(17),tensor(21)],
  (#2) [tensor([3.0041e-06, 9.5966e-08, 2.4197e-08, 6.4676e-08, 2.4129e-08, 2.6274e-08,
        6.8308e-09, 3.0048e-06, 2.9561e-03, 2.8578e-04, 1.4297e-01, 1.3375e-04,
        1.4583e-05, 4.9956e-07, 2.3044e-05, 3.7895e-06, 8.8909e-02, 7.2961e-01,
        8.9323e-06, 3.5066e-02, 1.2038e-05, 4.0091e-06, 2.8451e-06, 3.1645e-09,
        4.7594e-09]),tensor([1.1119e-03, 3.7625e-06, 3.0416e-06, 9.4822e-07, 3.7288e-06, 2.2327e-06,
        6.6669e-06, 1.1120e-03, 6.0009e-06, 4.0004e-05, 6.9832e-05, 1.0008e-04,
        5.6385e-03, 1.7722e-02, 2.1420e-03, 3.3728e-02, 7.0482e-06, 1.2536e-05,
        4.7033e-04, 6.8300e-04, 2.7717e-01, 6.5890e-01, 1.0704e-03, 8.3671e-07,
        4.3051e-07])])]
{% endraw %} {% raw %}
inp_ids = hf_tokenizer.encode('What did George Lucas make?',
                              'George Lucas created Star Wars in 1977. He directed and produced it.')

hf_tokenizer.convert_ids_to_tokens(inp_ids, skip_special_tokens=False)[11:13]
['star', 'wars']
{% endraw %}

... and ONNX works here too

{% raw %}
learn.blurr_to_onnx(export_name)
/home/wgilliam/anaconda3/envs/blurr/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py:192: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  position_ids = self.position_ids[:, :seq_length]
/home/wgilliam/anaconda3/envs/blurr/lib/python3.7/site-packages/transformers/modeling_utils.py:1757: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  input_tensor.shape[chunk_dim] == tensor_shape for input_tensor in input_tensors
/home/wgilliam/anaconda3/envs/blurr/lib/python3.7/site-packages/torch/onnx/utils.py:244: UserWarning: We detected that you are modifying a dictionnary that is an input to your model. Note that dictionaries are allowed as inputs in ONNX but they should be handled with care. Usages of dictionaries is not recommended, and should not be used except for configuration use. Also note that the order and values of the keys must remain the same. 
  warnings.warn(warning)
{% endraw %} {% raw %}
onnx_inf = blurrONNX(export_name)
{% endraw %} {% raw %}
onnx_inf.predict(inf_df)
[(('11', '13'),
  (#2) [tensor(11),tensor(13)],
  (#2) [tensor([6.4462e-08, 1.9989e-08, 1.3325e-09, 2.7664e-09, 1.8671e-09, 1.7790e-09,
        1.8039e-10, 6.4467e-08, 1.0510e-04, 5.5178e-06, 3.4711e-04, 9.9945e-01,
        8.3229e-05, 6.0367e-08, 2.7122e-06, 6.8480e-08, 1.0021e-06, 1.1485e-06,
        4.9146e-09, 5.5146e-07, 1.1851e-07, 1.7658e-08, 5.5674e-08, 1.3025e-10,
        1.8010e-10]),tensor([4.5258e-04, 1.7297e-05, 1.2920e-06, 4.1197e-07, 1.3324e-06, 7.0893e-07,
        4.6975e-06, 4.5261e-04, 4.9905e-06, 1.8728e-05, 1.2564e-05, 2.8013e-06,
        2.1285e-02, 7.5174e-01, 1.0028e-02, 2.1036e-01, 3.8814e-05, 1.2367e-05,
        4.0521e-05, 2.6421e-05, 1.3587e-03, 3.7222e-03, 4.1871e-04, 2.7288e-07,
        1.8427e-07])]),
 (('16', '17'),
  (#2) [tensor(16),tensor(17)],
  (#2) [tensor([2.4833e-07, 6.7453e-07, 1.0828e-08, 8.1945e-09, 6.2059e-09, 4.5433e-09,
        6.5401e-09, 7.9235e-09, 3.6031e-09, 2.4835e-07, 1.8276e-07, 2.5371e-07,
        4.3557e-07, 9.3449e-07, 4.0939e-07, 1.4522e-05, 9.9998e-01, 2.2884e-07,
        3.5751e-08, 3.2133e-08, 8.3219e-09, 3.5408e-08, 8.8465e-08, 1.1490e-07,
        2.4797e-07]),tensor([4.9447e-04, 9.1783e-05, 7.3632e-05, 2.5121e-05, 9.2702e-06, 1.6544e-05,
        1.0471e-05, 1.7209e-05, 3.6782e-05, 4.9447e-04, 7.6062e-05, 1.0730e-04,
        8.0846e-05, 3.5005e-05, 1.2638e-04, 1.1738e-04, 1.2393e-03, 9.9440e-01,
        1.4980e-03, 8.0291e-05, 7.3319e-05, 6.2823e-05, 1.7782e-04, 1.6094e-04,
        4.9452e-04])]),
 (('17', '21'),
  (#2) [tensor(17),tensor(21)],
  (#2) [tensor([3.0041e-06, 9.5966e-08, 2.4196e-08, 6.4676e-08, 2.4129e-08, 2.6274e-08,
        6.8308e-09, 3.0048e-06, 2.9561e-03, 2.8578e-04, 1.4297e-01, 1.3375e-04,
        1.4583e-05, 4.9956e-07, 2.3044e-05, 3.7895e-06, 8.8909e-02, 7.2961e-01,
        8.9323e-06, 3.5066e-02, 1.2038e-05, 4.0091e-06, 2.8450e-06, 3.1645e-09,
        4.7593e-09]),tensor([1.1119e-03, 3.7625e-06, 3.0416e-06, 9.4822e-07, 3.7288e-06, 2.2327e-06,
        6.6669e-06, 1.1120e-03, 6.0009e-06, 4.0004e-05, 6.9832e-05, 1.0008e-04,
        5.6385e-03, 1.7722e-02, 2.1420e-03, 3.3728e-02, 7.0483e-06, 1.2536e-05,
        4.7033e-04, 6.8301e-04, 2.7717e-01, 6.5890e-01, 1.0704e-03, 8.3671e-07,
        4.3051e-07])])]
{% endraw %}

Cleanup