---
title: data.text2text.core
keywords: fastai
sidebar: home_sidebar
summary: "This module contains the core text2text (e.g., language modeling, summarization, translation) bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data in a way modelable by huggingface transformer implementations."
description: "This module contains the core text2text (e.g., language modeling, summarization, translation) bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data in a way modelable by huggingface transformer implementations."
nb_path: "nbs/01za_data-text2text-core.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}
import torch
# pin the examples in this notebook to a specific GPU
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Using GPU #1: GeForce GTX 1080 Ti
{% endraw %}

Base tokenization, batch transform, and DataBlock methods

{% raw %}

class HF_Text2TextAfterBatchTransform[source]

HF_Text2TextAfterBatchTransform(hf_tokenizer, input_return_type=HF_BaseInput) :: HF_AfterBatchTransform

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches
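
For illustration, the transform can be built directly from a pretrained huggingface tokenizer. This is a minimal sketch, not output from this notebook: the checkpoint name is arbitrary, and it assumes the module's exports are available (e.g., via `from blurr.data.all import *`).

```python
from transformers import AutoTokenizer
from blurr.data.all import *  # assumed import path for HF_Text2TextAfterBatchTransform / HF_BaseInput

# any seq2seq-capable checkpoint works; "facebook/bart-base" is only an example
hf_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

# per the signature above, batch inputs are returned as HF_BaseInput unless another
# return type is passed in via input_return_type
batch_tfm = HF_Text2TextAfterBatchTransform(hf_tokenizer, input_return_type=HF_BaseInput)
```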

{% endraw %} {% raw %}

class HF_Text2TextBlock[source]

HF_Text2TextBlock(hf_arch=None, hf_tokenizer=None, before_batch_tfms=None, after_batch_tfms=None, max_length=None, padding=True, truncation=True, is_split_into_words=False, n_tok_inps=1, tok_kwargs={}, input_return_type=HF_BaseInput, dl_type=SortedDL, before_batch_kwargs={}, after_batch_kwargs={}, **kwargs) :: HF_TextBlock

A basic wrapper that links default transforms for the data block API
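
A hedged construction sketch follows, again assuming `from blurr.data.all import *` and an illustrative checkpoint and architecture label; the keyword values simply echo the defaults documented in the signature above.

```python
from transformers import AutoTokenizer
from blurr.data.all import *  # assumed import path for HF_Text2TextBlock

hf_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")  # illustrative checkpoint

# padding and truncation default to True; max_length and tok_kwargs are optional and
# control how the huggingface tokenizer prepares each batch
text2text_block = HF_Text2TextBlock(hf_arch="bart", hf_tokenizer=hf_tokenizer,
                                    max_length=128, padding=True, truncation=True)
```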

{% endraw %} {% raw %}
{% endraw %}

We include a new batch Transform and TransformBlock specific to text2text tasks.
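
The pieces come together through fastai's DataBlock API. The sketch below is illustrative only: the DataFrame, column names, checkpoint, and the (HF_Text2TextBlock, noop) block pairing are assumptions modeled on blurr's seq2seq examples rather than outputs of this notebook.

```python
import pandas as pd
from fastai.text.all import DataBlock, ColReader, RandomSplitter, noop
from transformers import AutoTokenizer
from blurr.data.all import *  # assumed import path for HF_Text2TextBlock

# placeholder data: one column of source documents, one of target summaries
df = pd.DataFrame({
    "article": ["The quick brown fox jumps over the lazy dog. " * 10 for _ in range(8)],
    "summary": ["A fox jumps over a dog." for _ in range(8)],
})

hf_arch = "bart"  # assumed architecture label matching the example checkpoint
hf_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

# a single HF_Text2TextBlock handles both inputs and targets, so the second block is a noop;
# dl_type defaults to SortedDL, which batches similarly sized samples to reduce padding
dblock = DataBlock(
    blocks=(HF_Text2TextBlock(hf_arch=hf_arch, hf_tokenizer=hf_tokenizer), noop),
    get_x=ColReader("article"),
    get_y=ColReader("summary"),
    splitter=RandomSplitter(),
)

dls = dblock.dataloaders(df, bs=2)
dls.show_batch(max_n=2)  # decoded inputs/targets displayed via fastai's show machinery
```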

Cleanup