---
title: blurr
keywords: fastai
sidebar: home_sidebar
summary: "A library designed for fastai developers who want to train and deploy Hugging Face transformers"
description: "A library designed for fastai developers who want to train and deploy Hugging Face transformers"
nb_path: "nbs/index.ipynb"
---
Named after the fastest transformer (well, at least of the Autobots), BLURR provides a comprehensive and extensible framework for training and deploying 🤗 Hugging Face transformer models with fastai >= 2.0.
Using features like fastai's new @typedispatch and @patch decorators, along with a simple class hierarchy, BLURR lets fastai developers train and deploy transformers on a variety of tasks. It includes high-, mid-, and low-level APIs, so developers can use much of it out of the box or customize it as needed.
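To give a feel for the two ideas behind those decorators, here is a minimal standard-library sketch. It is an analogy only, not fastai's or blurr's implementation: the `patch` helper, the `Batch` class, and the `describe` function are hypothetical names invented for illustration, and `functools.singledispatch` stands in for type dispatch (it dispatches on the first argument only).

```python
from functools import singledispatch


def patch(cls):
    """Hypothetical stand-in for a @patch-style decorator: attach a function to `cls` as a method."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


class Batch:
    """Toy container used for illustration; not a fastai or blurr class."""
    def __init__(self, items):
        self.items = list(items)


@patch(Batch)
def size(self):
    # Added to Batch after the class was defined.
    return len(self.items)


@singledispatch
def describe(x):
    # Fallback used when no more specific type is registered.
    return f"object: {x!r}"


@describe.register
def _(x: str):
    return f"text of {len(x)} characters"


@describe.register
def _(x: Batch):
    return f"batch of {x.size()} items"
```

The same pattern, adding methods to existing classes and choosing an implementation based on argument types, is what lets a library extend behavior such as batch display per task without subclassing everything.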
Supported Text/NLP Tasks:
Supported Vision Tasks:
Supported Audio Tasks:
You can now install blurr with pip:
pip install ohmeow-blurr
Or, even better as this library is under very active development, create an editable install like this:
git clone https://github.com/ohmeow/blurr.git
cd blurr
pip install -e ".[dev]"
Please check the documentation for more thorough examples of how to use this package.
The following two packages need to be installed for blurr to work: fastai and Hugging Face transformers.
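A quick way to confirm the environment before running the examples is to query installed package versions with the standard library. This is a sketch that assumes the two required packages are published under the distribution names `fastai` and `transformers`; the `installed_versions` helper is a hypothetical name, not part of blurr.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # "ohmeow-blurr" is the name used in the pip install command;
    # "fastai" and "transformers" are assumed distribution names.
    report = installed_versions(["fastai", "transformers", "ohmeow-blurr"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```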
import torch
from transformers import *
from fastai.text.all import *
from blurr.text.data.all import *
from blurr.text.modeling.all import *
path = untar_data(URLs.IMDB_SAMPLE)
model_path = Path("models")
imdb_df = pd.read_csv(path / "texts.csv")
n_labels = len(imdb_df["label"].unique())
model_cls = AutoModelForSequenceClassification
pretrained_model_name = "bert-base-uncased"
config = AutoConfig.from_pretrained(pretrained_model_name)
config.num_labels = n_labels
hf_arch, hf_config, hf_tokenizer, hf_model = NLP.get_hf_objects(pretrained_model_name, model_cls=model_cls, config=config)
blocks = (TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model), CategoryBlock)
dblock = DataBlock(blocks=blocks, get_x=ColReader("text"), get_y=ColReader("label"), splitter=ColSplitter())
dls = dblock.dataloaders(imdb_df, bs=4)
dls.show_batch(dataloaders=dls, max_n=2)
model = BaseModelWrapper(hf_model)
learn = Learner(
    dls,
    model,
    opt_func=partial(Adam, decouple_wd=True),
    loss_func=CrossEntropyLossFlat(),
    metrics=[accuracy],
    cbs=[BaseModelCallback],
    splitter=blurr_splitter,
)
learn.freeze()
learn.fit_one_cycle(3, lr_max=1e-3)
learn.show_results(learner=learn, max_n=2)
Using the high-level API, we can reduce DataBlock, DataLoaders, and Learner creation to a single line of code.
Included in the high-level API is a general Blearner class (pronounced "Blurrner") that you can use with hand-crafted DataLoaders, as well as task-specific Blearners like BlearnerForSequenceClassification that will handle everything given your raw data sourced from a pandas DataFrame, CSV file, or list of dictionaries (for example, a Hugging Face datasets dataset).
learn = BlearnerForSequenceClassification.from_data(imdb_df, pretrained_model_name, dl_kwargs={"bs": 4})
learn.fit_one_cycle(1, lr_max=1e-3)
learn.show_results(learner=learn, max_n=2)
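For readers unsure what "raw data" looks like in the list-of-dictionaries and CSV forms, the following standard-library sketch shows the same toy records in both shapes. The records and the helpers `count_labels`, `to_csv_text`, and `from_csv_text` are made up for illustration; the `text` and `label` keys simply mirror the column names used with ColReader earlier. Which keyword arguments the high-level API needs for each source is not shown here, so check the documentation before passing such data in.

```python
import csv
import io

# Toy records standing in for raw data (illustrative only).
records = [
    {"text": "A moving, beautifully shot film.", "label": "positive"},
    {"text": "Two hours I will never get back.", "label": "negative"},
    {"text": "Great cast, sharp script.", "label": "positive"},
]


def count_labels(rows, label_key="label"):
    """Number of distinct labels, analogous to len(imdb_df["label"].unique())."""
    return len({row[label_key] for row in rows})


def to_csv_text(rows, fieldnames=("text", "label")):
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]
```

Round-tripping through CSV yields the same records, which is why either form can serve as the starting point for the same classification task.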
A word of gratitude to the following individuals, repos, and articles that inspired much of this work: