---
title: data.seq2seq.core
keywords: fastai
sidebar: home_sidebar
summary: "This module contains the core seq2seq (e.g., language modeling, summarization, translation) bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data in a way modelable by huggingface transformer implementations."
description: "This module contains the core seq2seq (e.g., language modeling, summarization, translation) bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data in a way modelable by huggingface transformer implementations."
nb_path: "nbs/01za_data-seq2seq-core.ipynb"
---
```python
import torch

# pin the notebook to a specific GPU before building any models
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
```
```python
pretrained_model_name = "facebook/bart-large-cnn"

hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(
    pretrained_model_name,
    model_cls=BartForConditionalGeneration
)

hf_arch, type(hf_config), type(hf_tokenizer), type(hf_model)
```
Seq2Seq tasks are essentially conditional generation tasks, and this applies to derived tasks such as summarization and translation. Given this, we can use the same `HF_Seq2Seq` transforms, `HF_Seq2SeqInput`, and `HF_Seq2SeqBlock` for all of these tasks.
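As a rough sketch of how these pieces fit together in a standard fastai `DataBlock` (the exact `HF_Seq2SeqBlock` arguments may differ across blurr versions, and the `df` DataFrame with `article`/`highlights` columns is an assumption for illustration, not part of this module):

```python
from fastai.text.all import DataBlock, ColReader, RandomSplitter, noop

# a minimal sketch: wire the seq2seq block into a fastai DataBlock.
# `df` is assumed to be a pandas DataFrame with 'article' and 'highlights' columns.
blocks = (HF_Seq2SeqBlock(hf_arch, hf_config, hf_tokenizer, hf_model), noop)

dblock = DataBlock(
    blocks=blocks,
    get_x=ColReader('article'),     # source sequence
    get_y=ColReader('highlights'),  # target sequence
    splitter=RandomSplitter()
)

dls = dblock.dataloaders(df, bs=4)
```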
We create a subclass of `HF_BeforeBatchTransform` for summarization tasks that adds `decoder_input_ids` and `labels` to our inputs during training, which in turn allows the huggingface model to calculate the loss for us. See here and here for more information on these additional inputs used in summarization, translation, and conversational training tasks. How they should look for a particular architecture can be found in that model's `forward` docs (see here for BART, for example).

Note also that `labels` is simply `target_ids` shifted to the right by one, since the task is to predict the next token based on the current (and all previous) `decoder_input_ids`.
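To make that shift concrete, here is a plain PyTorch illustration of the relationship (the token id values and the `decoder_start_token_id` are made up for the example; the real transform derives them from the tokenizer and config):

```python
import torch

# hypothetical target sequence token ids (values are illustrative only)
target_ids = torch.tensor([0, 713, 16, 10, 4819, 2])

# `labels` are the ids the model should predict at each decoder timestep
labels = target_ids

# `decoder_input_ids` are those same ids shifted right by one position,
# with the decoder start token prepended (assumed id 2 here, as in BART)
decoder_start_token_id = 2
decoder_input_ids = torch.cat([torch.tensor([decoder_start_token_id]), labels[:-1]])

# at step i, the decoder sees decoder_input_ids[:i+1] and must predict labels[i]
```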
And lastly, we also update our targets to just be the `input_ids` of our target sequence so that fastai's `Learner.show_results` works (again, almost all the fastai bits require returning a single tensor to work).
```python
default_text_gen_kwargs(hf_config, hf_model)
```
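`default_text_gen_kwargs` returns a dict of text-generation arguments derived from the model's config. A hedged usage sketch follows; the override keys shown are standard huggingface `generate()` kwargs, and the values are chosen purely for illustration:

```python
# a sketch: grab the generation defaults blurr derives from this model/config ...
text_gen_kwargs = default_text_gen_kwargs(hf_config, hf_model)

# ... and override a couple of them for your own experiments (illustrative values)
text_gen_kwargs = {**text_gen_kwargs, 'num_beams': 4, 'max_length': 142}
```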
We include a new AFTER batch `Transform` and `TransformBlock` specific to text-2-text tasks.
... and a `DataLoaders.show_batch` for seq2seq tasks.
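A usage sketch, assuming `dls` was built from the `HF_Seq2SeqBlock` pipeline sketched earlier; passing `dataloaders=dls` explicitly follows the pattern used elsewhere in the blurr docs so the typed-dispatch `show_batch` can decode the inputs:

```python
# a sketch: render a couple of decoded (source, target) pairs from a batch
dls.show_batch(dataloaders=dls, max_n=2)
```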