---
title: text.data.language_modeling
keywords: fastai
sidebar: home_sidebar
summary: "This module contains the bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data for causal and masked language modeling tasks. This includes things like training BERT from scratch or fine-tuning a particular pre-trained LM on your own corpus."
description: "This module contains the bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data for causal and masked language modeling tasks. This includes things like training BERT from scratch or fine-tuning a particular pre-trained LM on your own corpus."
nb_path: "nbs/12_text-data-language-modeling.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
 
{% endraw %} {% raw %}
{% endraw %} {% raw %}
What we're running with at the time this documentation was generated:
torch: 1.10.1+cu111
fastai: 2.5.6
transformers: 4.16.2
{% endraw %}

Setup

For this example, we'll use the WIKITEXT_TINY dataset available from fastai. In addition to the Datasets library from Hugging Face, fastai provides a number of smaller datasets that are really useful for experimentation and for the early development of your training/validation/inference code.

{% raw %}
wiki_path = untar_data(URLs.WIKITEXT_TINY)
wiki_path.ls()
(#2) [Path('/home/wgilliam/.fastai/data/wikitext-2/train.csv'),Path('/home/wgilliam/.fastai/data/wikitext-2/test.csv')]
{% endraw %} {% raw %}
train_df = pd.read_csv(wiki_path / "train.csv", header=None)
valid_df = pd.read_csv(wiki_path / "test.csv", header=None)

print(len(train_df), len(valid_df))
train_df.head()
615 47
0
0 \n = 2013 – 14 York City F.C. season = \n \n The 2013 – 14 season was the <unk> season of competitive association football and 77th season in the Football League played by York City Football Club , a professional football club based in York , North Yorkshire , England . Their 17th @-@ place finish in 2012 – 13 meant it was their second consecutive season in League Two . The season ran from 1 July 2013 to 30 June 2014 . \n Nigel Worthington , starting his first full season as York manager , made eight permanent summer signings . By the turn of the year York were only above the relegation z...
1 \n = Big Boy ( song ) = \n \n " Big Boy " <unk> " I 'm A Big Boy Now " was the first single ever recorded by the Jackson 5 , which was released by Steeltown Records in January 1968 . The group played instruments on many of their Steeltown compositions , including " Big Boy " . The song was neither a critical nor commercial success , but the Jackson family were delighted with the outcome nonetheless . \n The Jackson 5 would release a second single with Steeltown Records before moving to Motown Records . The group 's recordings at Steeltown Records were thought to be lost , but they were re...
2 \n = The Remix ( Lady Gaga album ) = \n \n The Remix is a remix album by American recording artist Lady Gaga . Released in Japan on March 3 , 2010 , it contains remixes of the songs from her first studio album , The Fame ( 2008 ) , and her third extended play , The Fame Monster ( 2009 ) . A revised version of the track list was prepared for release in additional markets , beginning with Mexico on May 3 , 2010 . A number of recording artists have produced the songs , including Pet Shop Boys , Passion Pit and The Sound of Arrows . The remixed versions feature both uptempo and <unk> composit...
3 \n = New Year 's Eve ( Up All Night ) = \n \n " New Year 's Eve " is the twelfth episode of the first season of the American comedy television series Up All Night . The episode originally aired on NBC in the United States on January 12 , 2012 . It was written by Erica <unk> and was directed by Beth McCarthy @-@ Miller . The episode also featured a guest appearance from Jason Lee as Chris and Reagan 's neighbor and Ava 's boyfriend , Kevin . \n During Reagan ( Christina Applegate ) and Chris 's ( Will <unk> ) first New Year 's Eve game night , Reagan 's competitiveness comes out causing Ch...
4 \n = Geopyxis carbonaria = \n \n Geopyxis carbonaria is a species of fungus in the genus Geopyxis , family <unk> . First described to science in 1805 , and given its current name in 1889 , the species is commonly known as the charcoal loving elf @-@ cup , dwarf <unk> cup , <unk> <unk> cup , or pixie cup . The small , <unk> @-@ shaped fruitbodies of the fungus are reddish @-@ brown with a whitish fringe and measure up to 2 cm ( 0 @.@ 8 in ) across . They have a short , tapered stalk . Fruitbodies are commonly found on soil where brush has recently been burned , sometimes in great numbers ....
{% endraw %} {% raw %}
train_df["is_valid"] = False
valid_df["is_valid"] = True

df = pd.concat([train_df, valid_df])
df.head()
0 is_valid
0 \n = 2013 – 14 York City F.C. season = \n \n The 2013 – 14 season was the <unk> season of competitive association football and 77th season in the Football League played by York City Football Club , a professional football club based in York , North Yorkshire , England . Their 17th @-@ place finish in 2012 – 13 meant it was their second consecutive season in League Two . The season ran from 1 July 2013 to 30 June 2014 . \n Nigel Worthington , starting his first full season as York manager , made eight permanent summer signings . By the turn of the year York were only above the relegation z... False
1 \n = Big Boy ( song ) = \n \n " Big Boy " <unk> " I 'm A Big Boy Now " was the first single ever recorded by the Jackson 5 , which was released by Steeltown Records in January 1968 . The group played instruments on many of their Steeltown compositions , including " Big Boy " . The song was neither a critical nor commercial success , but the Jackson family were delighted with the outcome nonetheless . \n The Jackson 5 would release a second single with Steeltown Records before moving to Motown Records . The group 's recordings at Steeltown Records were thought to be lost , but they were re... False
2 \n = The Remix ( Lady Gaga album ) = \n \n The Remix is a remix album by American recording artist Lady Gaga . Released in Japan on March 3 , 2010 , it contains remixes of the songs from her first studio album , The Fame ( 2008 ) , and her third extended play , The Fame Monster ( 2009 ) . A revised version of the track list was prepared for release in additional markets , beginning with Mexico on May 3 , 2010 . A number of recording artists have produced the songs , including Pet Shop Boys , Passion Pit and The Sound of Arrows . The remixed versions feature both uptempo and <unk> composit... False
3 \n = New Year 's Eve ( Up All Night ) = \n \n " New Year 's Eve " is the twelfth episode of the first season of the American comedy television series Up All Night . The episode originally aired on NBC in the United States on January 12 , 2012 . It was written by Erica <unk> and was directed by Beth McCarthy @-@ Miller . The episode also featured a guest appearance from Jason Lee as Chris and Reagan 's neighbor and Ava 's boyfriend , Kevin . \n During Reagan ( Christina Applegate ) and Chris 's ( Will <unk> ) first New Year 's Eve game night , Reagan 's competitiveness comes out causing Ch... False
4 \n = Geopyxis carbonaria = \n \n Geopyxis carbonaria is a species of fungus in the genus Geopyxis , family <unk> . First described to science in 1805 , and given its current name in 1889 , the species is commonly known as the charcoal loving elf @-@ cup , dwarf <unk> cup , <unk> <unk> cup , or pixie cup . The small , <unk> @-@ shaped fruitbodies of the fungus are reddish @-@ brown with a whitish fringe and measure up to 2 cm ( 0 @.@ 8 in ) across . They have a short , tapered stalk . Fruitbodies are commonly found on soil where brush has recently been burned , sometimes in great numbers .... False
{% endraw %} {% raw %}
model_cls = AutoModelForCausalLM

pretrained_model_name = "gpt2"
hf_arch, hf_config, hf_tokenizer, hf_model = NLP.get_hf_objects(pretrained_model_name, model_cls=model_cls)

# some tokenizers like gpt and gpt2 do not have a pad token, so we add it here mainly for the purpose
# of setting the "labels" key appropriately (see below)
if hf_tokenizer.pad_token is None:
    hf_tokenizer.pad_token = "[PAD]"

hf_tokenizer.pad_token, hf_tokenizer.pad_token_id
Using pad_token, but it is not set yet.
('[PAD]', 50256)
{% endraw %} {% raw %}
# num_added_toks = hf_tokenizer.add_special_tokens(special_tokens_dict)
# hf_model.resize_token_embeddings(len(hf_tokenizer))
{% endraw %}

Preprocessing

Starting with version 2.0, BLURR provides a language preprocessing class that can be used to preprocess DataFrames or Hugging Face Datasets for both causal and masked language modeling tasks.

{% raw %}

class LMPreprocessor[source]

LMPreprocessor(hf_tokenizer:PreTrainedTokenizerBase, batch_size:int=1000, chunk_size:Optional[int]=None, sep_token:Optional[str]=None, text_attr:str='text', is_valid_attr:Optional[str]='is_valid', tok_kwargs:dict={}) :: Preprocessor

| | Type | Default | Details |
|---|---|---|---|
| hf_tokenizer | PreTrainedTokenizerBase | | A Hugging Face tokenizer |
| batch_size | int | 1000 | The number of examples to process at a time |
| chunk_size | typing.Optional[int] | None | How big each chunk of text should be (default: hf_tokenizer.model_max_length) |
| sep_token | typing.Optional[str] | None | How to indicate the beginning of a new text example (default: hf_tokenizer.eos_token, else its sep_token) |
| text_attr | str | text | The attribute holding the text |
| is_valid_attr | typing.Optional[str] | is_valid | The attribute that will be created if you are processing separate training and validation datasets into a single dataset; it indicates which dataset each example belongs to |
| tok_kwargs | dict | None | Tokenization kwargs that will be applied when calling the tokenizer |
{% endraw %} {% raw %}
{% endraw %}

Using a DataFrame

{% raw %}
preprocessor = LMPreprocessor(hf_tokenizer, chunk_size=128, text_attr=0)
proc_df = preprocessor.process_df(train_df, valid_df)

print(len(proc_df))
proc_df.head(2)
21330
proc_0 is_valid
0 \n = 2013 – 14 York City F.C. season = \n \n The 2013 – 14 season was the <unk> season of competitive association football and 77th season in the Football League played by York City Football Club , a professional football club based in York , North Yorkshire , England . Their 17th @-@ place finish in 2012 – 13 meant it was their second consecutive season in League Two . The season ran from 1 July 2013 to 30 June 2014 . \n Nigel Worthington , starting his first full season as York manager , made eight permanent summer signings . By the turn of the year York were only False
1 above the relegation zone on goal difference , before a 17 @-@ match unbeaten run saw the team finish in seventh @-@ place in the 24 @-@ team 2013 – 14 Football League Two . This meant York qualified for the play @-@ offs , and they were eliminated in the semi @-@ final by Fleetwood Town . York were knocked out of the 2013 – 14 FA Cup , Football League Cup and Football League Trophy in their opening round matches . \n 35 players made at least one appearance in nationally organised first @-@ team competition , and there were 12 different <unk> . Defender Ben Davies missed False
{% endraw %}
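The row count jumps from 662 input rows (615 + 47) to 21,330 processed rows because the preprocessor concatenates the corpus and slices it into chunk_size-token chunks (note how row 1 above picks up mid-sentence where row 0 leaves off). As an optional sanity check, you can re-tokenize a processed row; round-tripping through decode means it won't always come out at exactly 128 tokens, but it should be close:

{% raw %}
# re-tokenize one processed chunk to confirm it is roughly `chunk_size` tokens long
n_toks = len(hf_tokenizer(proc_df.iloc[0]["proc_0"])["input_ids"])
print(n_toks)
{% endraw %}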

Using a Hugging Face Dataset

{% raw %}
 
{% endraw %}
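The cell for this example is blank in the generated docs. As a rough sketch of the idea (the process_hf_dataset method name and signature are assumed here by analogy with process_df; check the BLURR source before relying on it), you could build Hugging Face Datasets from the same DataFrames and hand them to the preprocessor:

{% raw %}
from datasets import Dataset

# Hugging Face Datasets require string column names, so rename the `0` column to "text"
train_ds = Dataset.from_pandas(train_df.rename(columns={0: "text"}))
valid_ds = Dataset.from_pandas(valid_df.rename(columns={0: "text"}))

ds_preprocessor = LMPreprocessor(hf_tokenizer, chunk_size=128, text_attr="text")

# ASSUMPTION: `process_hf_dataset` mirrors `process_df`; verify against the BLURR source
proc_ds = ds_preprocessor.process_hf_dataset(train_ds, valid_ds)
{% endraw %}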

LM Strategies

{% raw %}

LMType[source]

Enum = [CAUSAL, MASKED]

An enumeration.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class BaseLMStrategy[source]

BaseLMStrategy(hf_tokenizer, ignore_token_id=-100) :: ABC

ABC for various language modeling strategies (e.g., causal, BertMLM, WholeWordMLM, etc...)

{% endraw %} {% raw %}
{% endraw %}

Here we include the BaseLMStrategy abstract class and several strategies for building your inputs and targets for causal and masked language modeling tasks. With CLMs, the objective is simply to predict the next token, whereas with MLMs a variety of masking strategies may be used (e.g., mask random tokens, mask random words, mask spans, etc.). A BertMLMStrategy is introduced below that follows the "mask random tokens" strategy used in the BERT paper, but users can create their own BaseLMStrategy subclass to support any masking strategy they desire.
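To make the difference between the two objectives concrete, here is a minimal plain-PyTorch sketch of how the "labels" are typically built for each. This is not the BLURR strategy code, just the general labeling scheme; the token IDs are illustrative (103 is bert-base-uncased's [MASK] id):

{% raw %}
import torch

input_ids = torch.tensor([101, 2023, 2003, 1037, 7953, 102])  # an already-tokenized sequence

# Causal LM: the labels are just a copy of input_ids; Hugging Face causal models shift them
# internally so that position i is scored against the token at position i+1
causal_labels = input_ids.clone()

# Masked LM: replace some positions in the inputs with [MASK] and set the labels to -100
# everywhere else, so only the masked positions contribute to the loss
mask_token_id, masked_positions = 103, torch.tensor([2, 4])
mlm_input_ids = input_ids.clone()
mlm_input_ids[masked_positions] = mask_token_id
mlm_labels = torch.full_like(input_ids, -100)
mlm_labels[masked_positions] = input_ids[masked_positions]
{% endraw %}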

{% raw %}

class CausalLMStrategy[source]

CausalLMStrategy(hf_tokenizer, ignore_token_id=-100) :: BaseLMStrategy

For next-token prediction language modeling tasks, we want to use the CausalLMStrategy, which makes the necessary changes to your inputs/targets for causal LMs

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class BertMLMStrategy[source]

BertMLMStrategy(hf_tokenizer, ignore_token_id=-100) :: BaseLMStrategy

A masked language modeling strategy using the default BERT masking definition.

{% endraw %} {% raw %}
{% endraw %}

BertMLMStrategy follows the random token masking strategy used in the BERT paper.
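As a refresher on that recipe (sketched here in plain PyTorch, not taken from the BertMLMStrategy source): roughly 15% of tokens are selected; of those, 80% are replaced with [MASK], 10% are replaced with a random token, and 10% are left unchanged, while every unselected position gets a label of -100 so it is ignored by the loss.

{% raw %}
import torch

def bert_style_mask(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    "A sketch of the BERT paper's recipe -- real implementations also skip special tokens like [CLS]/[SEP]"
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < mlm_prob       # ~15% of positions get corrupted
    labels[~selected] = -100                                 # only selected positions are scored

    roll = torch.rand(input_ids.shape)
    masked = selected & (roll < 0.8)                         # 80% of selected -> [MASK]
    randomized = selected & (roll >= 0.8) & (roll < 0.9)     # 10% of selected -> random token
    corrupted = input_ids.clone()                            # remaining 10% are left as-is
    corrupted[masked] = mask_token_id
    corrupted[randomized] = torch.randint(vocab_size, input_ids.shape)[randomized]
    return corrupted, labels
{% endraw %}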

Mid-level API

{% raw %}

class CausalLMTextInput[source]

CausalLMTextInput(x, **kwargs) :: TextInput

The base representation of your inputs; used by the various fastai show methods

{% endraw %} {% raw %}

class MLMTextInput[source]

MLMTextInput(x, **kwargs) :: TextInput

The base representation of your inputs; used by the various fastai show methods

{% endraw %} {% raw %}
{% endraw %}

Again, we define custom classes for the @typedispatched methods to use so that we can override how both causal and masked language modeling inputs/targets are assembled, as well as how the data is shown via methods like show_batch and show_results.
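If you need to customize how batches are rendered for your own input type, the fastai pattern is to register an overload of show_batch with @typedispatch keyed on that type. The sketch below only illustrates the mechanism; the MyLMTextInput class and the body are hypothetical placeholders, and BLURR's actual overrides (which is why show_batch is called with a dataloaders argument later in this doc) may have a different parameter list.

{% raw %}
from fastcore.dispatch import typedispatch

# a hypothetical input type, purely for illustration; BLURR already ships overrides
# for CausalLMTextInput and MLMTextInput
class MyLMTextInput(CausalLMTextInput): pass

@typedispatch
def show_batch(x: MyLMTextInput, y, samples, dataloaders=None, ctxs=None, max_n=6, trunc_at=None, **kwargs):
    # `samples` holds the decoded (input, target) pairs; print a truncated view of each
    for s in samples[:max_n]:
        print(str(s[0])[:trunc_at], "=>", str(s[1])[:trunc_at])
    return ctxs
{% endraw %}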

{% raw %}

class LMBatchTokenizeTransform[source]

LMBatchTokenizeTransform(hf_arch:str, hf_config:PretrainedConfig, hf_tokenizer:PreTrainedTokenizerBase, hf_model:PreTrainedModel, include_labels:bool=True, ignore_token_id:int=-100, lm_strategy_cls:BaseLMStrategy=CausalLMStrategy, max_length:int=None, padding:Union[bool, str]=True, truncation:Union[bool, str]=True, is_split_into_words:bool=False, tok_kwargs={}, text_gen_kwargs={}, **kwargs) :: BatchTokenizeTransform

Handles everything you need to assemble a mini-batch of inputs and targets, as well as decode the dictionary produced as a byproduct of the tokenization process in the encodes method.

| | Type | Default | Details |
|---|---|---|---|
| hf_arch | str | | The abbreviation/name of your Hugging Face transformer architecture (e.g., bert, bart, etc.) |
| hf_config | PretrainedConfig | | A specific configuration instance you want to use |
| hf_tokenizer | PreTrainedTokenizerBase | | A Hugging Face tokenizer |
| hf_model | PreTrainedModel | | A Hugging Face model |
| include_labels | bool | True | Controls whether the "labels" are included in your inputs. If they are, the loss will be calculated in the model's forward function and you can simply use PreCalculatedLoss as your Learner's loss function |
| ignore_token_id | int | -100 | The token ID that should be ignored when calculating the loss |
| lm_strategy_cls | BaseLMStrategy | CausalLMStrategy | The language modeling strategy (or objective) |
| max_length | int | None | Controls the length of the padding/truncation. It can be an integer or None, in which case it will default to the maximum length the model can accept. If the model has no specific maximum input length, truncation/padding to max_length is deactivated. See Everything you always wanted to know about padding and truncation |
| padding | typing.Union[bool, str] | True | Controls the padding applied to your hf_tokenizer during tokenization. If None, will default to False or 'do_not_pad'. See Everything you always wanted to know about padding and truncation |
| truncation | typing.Union[bool, str] | True | Controls the truncation applied to your hf_tokenizer during tokenization. If None, will default to False or 'do_not_truncate'. See Everything you always wanted to know about padding and truncation |
| is_split_into_words | bool | False | The is_split_into_words argument applied to your hf_tokenizer during tokenization. Set this to True if your inputs are pre-tokenized (not numericalized) |
| tok_kwargs | dict | None | Any other keyword arguments you want included when using your hf_tokenizer to tokenize your inputs |
| text_gen_kwargs | dict | None | Any keyword arguments you want included when generating text. See How to generate text |
| kwargs | | | No Content |
{% endraw %} {% raw %}
{% endraw %}

Our LMBatchTokenizeTransform allows us to update the inputs' labels and our targets appropriately for any language modeling task.

The labels argument allows you to forgo calculating the loss yourself by letting Hugging Face return it for you, should you choose to do that. Padding tokens are set to -100 by default (i.e., CrossEntropyLossFlat().ignore_index), which prevents the cross-entropy loss from considering predictions for tokens it should ignore, namely the padding tokens. For more information on the meaning of this argument, see the Hugging Face glossary entry for "Labels".
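To see why -100 is a convenient sentinel, here is a small, standalone PyTorch example (not BLURR-specific) showing that positions labeled -100 are simply excluded from the cross-entropy calculation, since -100 is the default ignore_index:

{% raw %}
import torch
import torch.nn.functional as F

vocab_size = 10
logits = torch.randn(1, 4, vocab_size)       # (batch, seq_len, vocab)
labels = torch.tensor([[3, 7, -100, -100]])  # the last two positions are padding

# the -100 positions contribute nothing to the loss (and don't affect the averaging)
loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100)
print(loss)
{% endraw %}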

Examples

Using the mid-level API

Causal LM

Step 1: Get your Hugging Face objects.
{% raw %}
model_cls = AutoModelForCausalLM

pretrained_model_name = "gpt2"
hf_arch, hf_config, hf_tokenizer, hf_model = NLP.get_hf_objects(pretrained_model_name, model_cls=model_cls)

# some tokenizers like gpt and gpt2 do not have a pad token, so we add it here mainly for the purpose
# of setting the "labels" key appropriately (see below)
if hf_tokenizer.pad_token is None:
    hf_tokenizer.pad_token = "[PAD]"
Using pad_token, but it is not set yet.
{% endraw %}
Step 2: Preprocess data
{% raw %}
preprocessor = LMPreprocessor(hf_tokenizer, chunk_size=128, text_attr=0)
proc_df = preprocessor.process_df(train_df, valid_df)

print(len(proc_df))
proc_df.head(2)
21330
proc_0 is_valid
0 \n = 2013 – 14 York City F.C. season = \n \n The 2013 – 14 season was the <unk> season of competitive association football and 77th season in the Football League played by York City Football Club , a professional football club based in York , North Yorkshire , England . Their 17th @-@ place finish in 2012 – 13 meant it was their second consecutive season in League Two . The season ran from 1 July 2013 to 30 June 2014 . \n Nigel Worthington , starting his first full season as York manager , made eight permanent summer signings . By the turn of the year York were only False
1 above the relegation zone on goal difference , before a 17 @-@ match unbeaten run saw the team finish in seventh @-@ place in the 24 @-@ team 2013 – 14 Football League Two . This meant York qualified for the play @-@ offs , and they were eliminated in the semi @-@ final by Fleetwood Town . York were knocked out of the 2013 – 14 FA Cup , Football League Cup and Football League Trophy in their opening round matches . \n 35 players made at least one appearance in nationally organised first @-@ team competition , and there were 12 different <unk> . Defender Ben Davies missed False
{% endraw %}
Step 3: Create your DataBlock
{% raw %}
batch_tok_tfm = LMBatchTokenizeTransform(hf_arch, hf_config, hf_tokenizer, hf_model, lm_strategy_cls=CausalLMStrategy)

blocks = (TextBlock(batch_tokenize_tfm=batch_tok_tfm, input_return_type=CausalLMTextInput), noop)

dblock = DataBlock(blocks=blocks, get_x=ColReader("proc_0"), splitter=ColSplitter(col="is_valid"))
{% endraw %}
Step 4: Build your DataLoaders
{% raw %}
dls = dblock.dataloaders(proc_df, bs=4)
{% endraw %} {% raw %}
b = dls.one_batch()
{% endraw %} {% raw %}
b[0]["input_ids"].shape, b[0]["labels"].shape, b[1].shape
(torch.Size([4, 129]), torch.Size([4, 129]), torch.Size([4, 129]))
{% endraw %} {% raw %}
explode_types(b)
{tuple: [dict, torch.Tensor]}
{% endraw %} {% raw %}
{% endraw %} {% raw %}
dls.show_batch(dataloaders=dls, max_n=2, trunc_at=500)
text target
0 ₹ 40 million ( US $ 590 @,@ 000 ) was spent solely on VFX for Magadheera. \n \n = = = <unk> = = = \n \n During the film's shoot at Ramoji Film City in late November 2008, a 500 square feet ( 46 m2 ) film can, containing two or three scenes, was discovered missing from Rainbow lab. The filmmakers filed a case at <unk> police station. Security personnel and film unit members searched, but failed to recover the reels. Rajamouli's unit said it was not important if the scenes from �� 40 million ( US $ 590 @,@ 000 ) was spent solely on VFX for Magadheera. \n \n = = = <unk> = = = \n \n During the film's shoot at Ramoji Film City in late November 2008, a 500 square feet ( 46 m2 ) film can, containing two or three scenes, was discovered missing from Rainbow lab. The filmmakers filed a case at <unk> police station. Security personnel and film unit members searched, but failed to recover the reels. Rajamouli's unit said it was not important if the scenes from
1 berg got around the outside of Hamilton in turn one, while Räikkönen lost positions due to a slow getaway. Sebastian Vettel got past Verstappen, but was immediately <unk> on the approach to turn four. At the front of the race, coming out of turn three Hamilton and Rosberg collided ending the race of both Mercedes drivers. The collision resulted in a safety car period, with the order standing : Ricciardo, Verstappen, Sainz, Vettel and Räikkönen. The safety car came in at the end of lap four. Vet erg got around the outside of Hamilton in turn one, while Räikkönen lost positions due to a slow getaway. Sebastian Vettel got past Verstappen, but was immediately <unk> on the approach to turn four. At the front of the race, coming out of turn three Hamilton and Rosberg collided ending the race of both Mercedes drivers. The collision resulted in a safety car period, with the order standing : Ricciardo, Verstappen, Sainz, Vettel and Räikkönen. The safety car came in at the end of lap four. Vette
{% endraw %}

Masked LM

Step 1: Get your Hugging Face objects.
{% raw %}
model_cls = AutoModelForMaskedLM

pretrained_model_name = "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = NLP.get_hf_objects(pretrained_model_name, model_cls=model_cls)

# some tokenizers like gpt and gpt2 do not have a pad token, so we add it here mainly for the purpose
# of setting the "labels" key appropriately (see below)
if hf_tokenizer.pad_token is None:
    hf_tokenizer.pad_token = "[PAD]"
{% endraw %}
Step 2: Preprocess data
{% raw %}
preprocessor = LMPreprocessor(hf_tokenizer, chunk_size=128, text_attr=0)
proc_df = preprocessor.process_df(train_df, valid_df)

print(len(proc_df))
proc_df.head(2)
Using eos_token, but it is not set yet.
21227
proc_0 is_valid
0 \n = 2013 – 14 York City F.C. season = \n \n The 2013 – 14 season was the <unk> season of competitive association football and 77th season in the Football League played by York City Football Club , a professional football club based in York , North Yorkshire , England . Their 17th @-@ place finish in 2012 – 13 meant it was their second consecutive season in League Two . The season ran from 1 July 2013 to 30 June 2014 . \n Nigel Worthington , starting his first full season as York manager , made eight permanent summer signings . By the turn of the year York were only above the relegation z... False
1 goal difference , before a 17 @-@ match unbeaten run saw the team finish in seventh @-@ place in the 24 @-@ team 2013 – 14 Football League Two . This meant York qualified for the play @-@ offs , and they were eliminated in the semi @-@ final by Fleetwood Town . York were knocked out of the 2013 – 14 FA Cup , Football League Cup and Football League Trophy in their opening round matches . \n 35 players made at least one appearance in nationally organised first @-@ team competition , and there were 12 different <unk> . Defender Ben Davies missed only five of the fifty @ False
{% endraw %}
Step 3: Create your DataBlock
{% raw %}
batch_tok_tfm = LMBatchTokenizeTransform(hf_arch, hf_config, hf_tokenizer, hf_model, lm_strategy_cls=BertMLMStrategy)

blocks = (TextBlock(batch_tokenize_tfm=batch_tok_tfm, input_return_type=MLMTextInput), noop)

dblock = DataBlock(blocks=blocks, get_x=ColReader("proc_0"), splitter=ColSplitter(col="is_valid"))
{% endraw %}
Step 4: Build your DataLoaders
{% raw %}
dls = dblock.dataloaders(proc_df, bs=4)
{% endraw %} {% raw %}
b = dls.one_batch()
b[0]["input_ids"].shape, b[0]["labels"].shape, b[1].shape
(torch.Size([4, 128]), torch.Size([4, 128]), torch.Size([4, 128]))
{% endraw %} {% raw %}
b[0]["input_ids"][0][:20], b[0]["labels"][0][:20], b[1][0][:20]
(tensor([ 101, 2003, 2098,  103, 2781, 2101, 2083, 1026, 4895, 2243, 1028, 1026,
         4895,  103, 1028, 1998, 1996, 2674, 2736,  103], device='cuda:1'),
 tensor([-100, -100, -100, 2340, -100, -100, -100, -100, -100, -100, -100, -100,
         -100, 2243, -100, -100, -100, -100, -100, 1037], device='cuda:1'),
 tensor([-100, -100, -100, 2340, -100, -100, -100, -100, -100, -100, -100, -100,
         -100, 2243, -100, -100, -100, -100, -100, 1037], device='cuda:1'))
{% endraw %} {% raw %}
explode_types(b)
{tuple: [dict, torch.Tensor]}
{% endraw %} {% raw %}
{% endraw %} {% raw %}
dls.show_batch(dataloaders=dls, max_n=2, trunc_at=250)
text target
0 is ##ed 11 minutes later through < un ##k > < un ##k > and [MASK] match finished a 1 [–] 1 draw . york were knocked [out] of the fa cup after losing 3 [MASK] 2 [MASK] home to bristol rovers [MASK] a first [MASK] replay ; the [MASK] were 3 – 0 up by 50 @ - @ minutes [MASK] fletcher pulled two back [MASK] york with a penalty and [MASK] long @ - [@] range strike . [miniature] keith lowe , of cheltenham [MASK] and goalkeeper nick [MASK] [MASK] of charlton athletic [MASK] were signed on loan until january 2014 . they both played in york ' s first league defeat in four weeks [MASK] 2 – 1 away , [MASK] southend united is ##ed 11 minutes later through < un ##k > < un ##k > and [the] match finished a 1 [–] 1 draw . york were knocked [out] of the fa cup after losing 3 [–] 2 [at] home to bristol rovers [in] a first [round] replay ; the [visitors] were 3 – 0 up by 50 @ - @ minutes [before] fletcher pulled two back [for] york with a penalty and [a] long @ - [@] range strike . [defender] keith lowe , of cheltenham [,] and goalkeeper nick [pope] [,] of charlton athletic [,] were signed on loan until january 2014 . they both played in york ' s first league defeat in four weeks [,] 2 – 1 away , [to] southend united
1 ll [##us] , g ##yr ##op [MASK] , and ph ##le ##bo [MASK] ) , [MASK] six are [MASK] ##ter [MASK] ( as ##tra ##eus , cal ##ost ##oma , dip ##lo ##cy ##stis , pi ##sol [MASK] ##us , [MASK] sc [MASK] ##oder ##ma ) . since the sub ##ord ##er ' s original [MASK] , there [##ม] been several phylogenetic studies investigating the sc ##ler ##oder ##mat ##ine [MASK] . some studies have revealed [MASK] existence of numerous cryptic species and have contributed to taxonomic [MASK] [MASK] the group . [MASK] " core " [sc] ##ler ##oder ##mat ##ine ##ae include the genera as ##tra ##eus , cal ##ost ##oma , sc ##ler ##oder [##ma] , pi ##sol ##ith [MASK] , dip ##lo ##cy ll [##us] , g ##yr ##op [##orus] , and ph ##le ##bo [##pus] ) , [and] six are [gas] ##ter [##oid] ( as ##tra ##eus , cal ##ost ##oma , dip ##lo ##cy ##stis , pi ##sol [##ith] ##us , [and] sc [##ler] ##oder ##ma ) . since the sub ##ord ##er ' s original [description] , there [have] been several phylogenetic studies investigating the sc ##ler ##oder ##mat ##ine [##ae] . some studies have revealed [the] existence of numerous cryptic species and have contributed to taxonomic [expansion] [of] the group . [the] " core " [sc] ##ler ##oder ##mat ##ine ##ae include the genera as ##tra ##eus , cal ##ost ##oma , sc ##ler ##oder [##ma] , pi ##sol ##ith [##us] , dip ##lo ##cy
{% endraw %}