---
title: data.seq2seq.summarization
keywords: fastai
sidebar: home_sidebar
summary: "This module contains the bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data for summarization tasks using architectures like BART and T5."
description: "This module contains the bits required to use the fastai DataBlock API and/or mid-level data processing pipelines to organize your data for summarization tasks using architectures like BART and T5."
nb_path: "nbs/01zc_data-seq2seq-summarization.ipynb"
---
{% raw %}
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Using GPU #1: GeForce GTX 1080 Ti
{% endraw %}
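
The cell above hard-codes GPU index 1, which fails on machines with fewer than two GPUs. As a minimal sketch (not part of the original notebook), the selection could be guarded as below. The helper name `choose_gpu_index` is hypothetical, and the fallback to GPU 0 is an assumption about what you would want.

```python
from typing import Optional

def choose_gpu_index(preferred: int, n_available: int) -> Optional[int]:
    """Return a usable GPU index, or None when no GPU is available (hypothetical helper)."""
    if n_available <= 0:
        return None
    # Fall back to the first GPU when the preferred index is out of range.
    return preferred if 0 <= preferred < n_available else 0

try:
    import torch
    n_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
except ImportError:  # torch is not installed, so treat this as "no GPU"
    torch, n_gpus = None, 0

gpu_idx = choose_gpu_index(preferred=1, n_available=n_gpus)
if gpu_idx is not None:
    torch.cuda.set_device(gpu_idx)
    print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
else:
    print('No GPU found; running on CPU')
```

On a two-GPU machine this selects the same device as the original cell. On a single-GPU machine it falls back to device 0 rather than raising an error.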

Summarization tokenization, batch transform, and DataBlock methods

Summarization tasks attempt to generate a human-understandable and sensible representation of a larger body of text (e.g., capture the meaning of a larger document in 1-3 sentences).

{% raw %}
path = Path('./')
cnndm_df = pd.read_csv(path/'cnndm_sample.csv'); len(cnndm_df)
1000
{% endraw %} {% raw %}
cnndm_df.head(2)

| | article | highlights | ds_type |
|---|---|---|---|
| 0 | (CNN) -- Globalization washes like a flood over the world's cultures and economies. Floods can be destructive; however, they can also bring blessings, as the annual floods of the Nile did for ancient Egypt. The world's great universities can be crucial instruments in shaping, in a positive way, humankind's reaction to globalization and the development of humankind itself. Traditionally, universities have been defined and limited by location, creating an academic community and drawing students and scholars to that place. Eventually, some universities began to encourage students to study el... | John Sexton: Traditionally, universities have been defined and limited by location .\nGlobal campuses form a network of thought, innovation, he writes .\nFaculty can teach, Sexton says, students can team up in many cities at once .\nSexton: Research, scholarship can be shared and cultural ties made in "century of knowledge" | train |
| 1 | (CNN) -- Armenian President Robert Kocharian declared a state of emergency Saturday night after a day of clashes between police and protesters, a spokeswoman for the Armenian Foreign Ministry said. Opposition supporters wave an Armenian flag during a protest rally in Yerevan, Armenia, on Saturday. The protesters claim last month's presidential election was rigged. The state of emergency will "hopefully bring some order" to the capital, Yerevan, said Salpi Ghazarian, assistant to the Armenian foreign minister, who spoke to CNN early Sunday. The state of emergency could last until March 20, ... | NEW: Protest moves after crackdown at Freedom Square .\nOrder sought after protests over last month's election turn violent .\nDemonstrators say the election was fraudulent .\nState of emergency could last until March 20, official says . | train |
{% endraw %} {% raw %}
pretrained_model_name = "facebook/bart-large-cnn"
model_cls = AutoModelForSeq2SeqLM

hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, model_cls=model_cls)
hf_arch, type(hf_tokenizer), type(hf_config), type(hf_model)
('bart',
 transformers.models.bart.tokenization_bart_fast.BartTokenizerFast,
 transformers.models.bart.configuration_bart.BartConfig,
 transformers.models.bart.modeling_bart.BartForConditionalGeneration)
{% endraw %} {% raw %}
blocks = (HF_Seq2SeqBlock(hf_arch, hf_config, hf_tokenizer, hf_model), noop)

dblock = DataBlock(blocks=blocks, 
                   get_x=ColReader('article'), 
                   get_y=ColReader('highlights'), 
                   splitter=RandomSplitter())
{% endraw %}

Two lines! Notice we pass in noop for our targets (i.e., our summaries) because the batch transform will take care of both our inputs and targets.
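
To make the idea concrete, here is a toy sketch of why the target block can be a no-op when a batch-level transform encodes both sides. It is not the library's actual implementation. `ToyTokenizer`, `pad`, `toy_seq2seq_batch_transform`, the padding id of 0, and the length limits are all hypothetical stand-ins chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding id used only in this sketch


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for a Hugging Face tokenizer."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {"<pad>": PAD_ID}

    def encode(self, text: str) -> List[int]:
        # Assign the next free id to each unseen token.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.lower().split()]


def noop(x):
    """Identity function: the per-item target pipeline leaves the summary as raw text."""
    return x


def pad(seqs: Sequence[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]


def toy_seq2seq_batch_transform(
    samples: Sequence[Tuple[str, str]],
    tokenizer: ToyTokenizer,
    max_length: int = 8,
    max_target_length: int = 4,
) -> Dict[str, List[List[int]]]:
    """Encode raw (article, summary) pairs together, once per batch."""
    texts, summaries = zip(*samples)
    input_ids = pad([tokenizer.encode(t)[:max_length] for t in texts])
    labels = pad([tokenizer.encode(s)[:max_target_length] for s in summaries])
    return {"input_ids": input_ids, "labels": labels}


# The targets pass through noop, so they reach the batch transform as plain strings.
samples = [
    ("the state of emergency could last until march 20", noop("state of emergency declared")),
    ("protesters claim the election was rigged", noop("election disputed")),
]
batch = toy_seq2seq_batch_transform(samples, ToyTokenizer())
print(batch["input_ids"])
print(batch["labels"])
```

Because the raw strings survive until batching, the inputs and targets can each be truncated and padded to their own lengths in one place.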

{% raw %}
dblock.summary(cnndm_df)
Setting-up type transforms pipelines
Collecting items from                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      article  \
0    (CNN)  -- Globalization washes like a flood over the world's cultures and economies. Floods can be destructive; however, they can also bring blessings, as the annual floods of the Nile did for ancient Egypt. The world's great universities can be crucial instruments in shaping, in a positive way, humankind's reaction to globalization and the development of humankind itself. Traditionally, universities have been defined and limited by location, creating an academic community and drawing students and scholars to that place. Eventually, some universities began to encourage students to study el...   
1    (CNN) -- Armenian President Robert Kocharian declared a state of emergency Saturday night after a day of clashes between police and protesters, a spokeswoman for the Armenian Foreign Ministry said. Opposition supporters wave an Armenian flag during a protest rally in Yerevan, Armenia, on Saturday. The protesters claim last month's presidential election was rigged. The state of emergency will "hopefully bring some order" to the capital, Yerevan, said Salpi Ghazarian, assistant to the Armenian foreign minister, who spoke to CNN early Sunday. The state of emergency could last until March 20, ...   
2    (Mental Floss) -- President Barack Obama turns 48 on Tuesday. While the first family encourages you to send contributions to your favorite charity in lieu of the White House, if you insist on doing some last-minute birthday shopping for 44, you might consider a pair of jeans or a case of Bud Light. For some historical precedent, here's a look back at some of the more interesting presidential gifts. Future president Barack Obama and his family blow out the candles on his birthday cake in 2004. George W. Bush: Raw lamb . President Bush and his family received about 1,000 gifts per month duri...   
3    (CNN) -- Renee Pernice, a 35-year-old mother of two young children, vanished from her home in Kansas City, Missouri, shortly after New Year's this year. She hasn't been heard from since. Renee Pernice is pictured here with her two sons and husband, Shon. Police believe foul play is involved, yet they have not found her body. No one has been arrested in the case. Police have not named her husband, Shon Pernice, as a person of interest or a suspect in the case. However, "he's the last known person to see her alive," said Doug Niemeier, a sergeant with the Kansas City Police Department. Six m...   
4    Michelle Crumrine was out of town when a tornado tore through her neighborhood. She returned to Washington, Illinois, to find pieces of her life strewn about where her house once stood. "A lot of people have a pile of rubble still," she said, "and I don't have anything. ... It's gone. I don't know where it went." Nearby, rescuers with flashlights trudged through the neighborhood in the dark of night, searching for signs of life in the wreckage. As a severe weather system slammed the Midwest on Sunday, spawning dozens of tornadoes, flash floods and hail, this town of 10,000 people was among...   
..                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ...   
995  New Delhi (CNN) -- In New Delhi's upscale diplomatic district, Ram Dhan lives in a parallel world. For years, his home has been a rickety shanty that he shares with his ailing wife, a young son, a daughter-in-law and two grandchildren. Now 62, Dhan has lived through India's journey as an independent nation. He finds little reason to rejoice as the country celebrates Monday, the 64th anniversary of freedom from British rule. "The poor have hardly benefited," he says. Sitting on a cot in his shack huddled in a squalid slum in one of the richest neighborhoods of the Indian capital, he bitterl...   
996  (CNN) -- We've seen deaths, weddings, dramatic costume changes, surprise hookups and more deaths. And that's just in the past five years or so. The world of superhero comics has seen a lot of changes recently, with the demise -- and in some cases, resurrection - of Robin, Captain America, Peter Parker, Professor Charles Xavier and the Human Torch. Clark Kent walked out on his job and dated Wonder Woman. There have been revelations that multiple characters were gay, along with a same-sex wedding or two. (There also was a complete reboot in 2011 for DC Comics, which like CNN is owned by Time...   
997  (CNN) -- Sarin gas has been used several times in the Syrian civil war, including at least once by the Assad regime, France's foreign minister said Tuesday, citing results from test samples in France's possession. Laurent Fabius announced that conclusion after meeting with the head of a United Nations mission set up to establish the facts about the alleged use of chemical weapons in Syria. "I gave him the results of tests carried out by our lab appointed by the Organization for the Prohibition of Chemical Weapons to identify chemical warfare," Fabius said in a statement, referring to the S...   
998  (CNN) -- When the 2,455th star on the Hollywood Walk of Fame is laid on Monday for actor Steve Guttenberg, Ana Martinez will once again be working behind the scenes as curator-in-chief to the iconic attraction. The Walk of Fame is one of showbiz's most visible landmarks,and Martinez has been its inconspicuous producer for almost half the attraction's 51 years, making sure the constellation of stars is perfectly aligned on the Hollywood sidewalks. For 24 years, she has been the person deciding where celebrities will receive their coveted symbol of fame in the heart of Tinseltown. When a sta...   
999  President Barack Obama's announcement that he now supports same-sex marriage reflects a dramatic shift taking place across the country. Last year, for the first time, polls found a majority of Americans share that stance. Surveys show the country's position has undergone a rapid change over the past 15 years -- one not seen on other issues. On climate change, abortion and the death penalty, "we're not seeing Americans necessarily becoming more liberal," said Frank Newport, editor in chief of Gallup. "This one stands a little alone." Views of same-sex marriage around the world . In 1996, Ga...   

                                                                                                                                                                                                                                                                                                                                highlights  \
0    John Sexton: Traditionally, universities have been defined and limited by location .\nGlobal campuses form a network of thought, innovation, he writes .\nFaculty can teach, Sexton says, students can team up in many cities at once .\nSexton: Research, scholarship can be shared and cultural ties made in "century of knowledge"   
1                                                                                            NEW: Protest moves after crackdown at Freedom Square .\nOrder sought after protests over last month's election turn violent .\nDemonstrators say the election was fraudulent .\nState of emergency could last until March 20, official says .   
2                                                                                                                             The president of Argentina gave George W. Bush 300 pounds of lamb meat .\nJFK received a carved peach pit in his likeness .\nA bowling alley was installed in the White House as a birthday gift to Truman .   
3                    Renee Pernice disappeared from her home shortly after New Year's this year .\nPolice believe foul play is involved, although they have not found a body .\nPolice say her husband accessed a HazMat building shortly after she disappeared .\nAttorneys for husband, Shon Pernice, declined comment for this report .   
4                                                                           A third person died in Massac County, Illinois, bringing the state total death toll to 6 .\nHigh winds knock out power to 390,000 in Michigan .\n67 tornadoes were reported across the region .\n"I don't have anything. ... It's gone," a storm victim says .   
..                                                                                                                                                                                                                                                                                                                                     ...   
995                                                   India is marking its 64th anniversary of freedom from British rule .\n"The poor have hardly benefited," says one man .\nPrime Minister Manmohan Singh admits much needs to be done for the "common man"\nPolicymakers have agreed over the years that corruption is a major factor .   
996                                                                                                  From the deaths of Robin and Peter Parker to major reboots, comics see a lot of change .\nComics creators say that the most important element is a well-told story .\nThree comic book experts share their views of this phenomenon .   
997     NEW: Fabius says the Assad regime is culpable in at least one instance .\nFrance is certain sarin gas was used in Syria "several times," Fabius says .\nThe announcement comes after a meeting with the head of a fact-finding mission .\nHuman Rights Council report: "Reasonable grounds" to believe chemical agents were used .   
998                                                                                                                            Ana Martinez is the curator-in-chief for the Hollywood Walk of Fame .\nSo far Martinez has produced 586 star ceremonies .\nOn Monday she will be at the helm for the unveiling of Steve Guttenberg's star .   
999                                                                                                                  Gallup and CNN polls in 2011 found the majority of Americans supported same-sex marriage .\nUsually such changes take place over longer blocks of time, a pollster says .\nYoung adults are helping drive the shift .   

    ds_type  
0     train  
1     train  
2     train  
3     train  
4     train  
..      ...  
995   train  
996   train  
997   train  
998   train  
999   train  

[1000 rows x 3 columns]
Found 1000 items
2 datasets of sizes 800,200
Setting up Pipeline: ColReader -- {'cols': 'article', 'pref': '', 'suff': '', 'label_delim': None}
Setting up Pipeline: ColReader -- {'cols': 'highlights', 'pref': '', 'suff': '', 'label_delim': None}

Building one sample
  Pipeline: ColReader -- {'cols': 'article', 'pref': '', 'suff': '', 'label_delim': None}
    starting from
      article       (CNN Student News) -- April 5, 2013 . Media Literacy Question of the Day . Before you were to travel to another country, where might you go to get credible information on what you could and could not do there? * . * . Know Your News -- The following questions relate to events that were covered this week on CNN Student News. Write your answers in the space provided. Click here for a PDF version of this Newsquiz. 1. What African island nation is being overrun by swarms of locusts? * . * . 2. What nation is led by Kim Jong Un? * . * . 3. What holy day, which took place on Sunday, is one of th...
highlights                                                                                                                                                                                                                                                                                                                                                                                                                    Use the weekly Newsquiz to test your knowledge of stories you saw on CNN Student News .\nWrite your answers in the space provided .\nToday's Newsquiz includes the Media Literacy Question of the Day .
ds_type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         train
Name: 118, dtype: object
    applying ColReader -- {'cols': 'article', 'pref': '', 'suff': '', 'label_delim': None} gives
      (CNN Student News) -- April 5, 2013 . Media Literacy Question of the Day . Before you were to travel to another country, where might you go to get credible information on what you could and could not do there? * . * . Know Your News -- The following questions relate to events that were covered this week on CNN Student News. Write your answers in the space provided. Click here for a PDF version of this Newsquiz. 1. What African island nation is being overrun by swarms of locusts? * . * . 2. What nation is led by Kim Jong Un? * . * . 3. What holy day, which took place on Sunday, is one of the most important days to Christians? * . * . 4. What is the general term for a group of developmental disorders that affect social interaction, behavior and language? * . * . 5. Which professional sport launched its season this week with players wearing patches in honor of school shooting victims in Newtown, Connecticut? * . * . 6. What gun rights organization supports a school safety plan that includes arming adults inside schools? * . * . 7. What U.S. state passed a strict law that bans more than 100 additional kinds of guns? * . * . 8. In what U.S. state did an oil pipeline break, forcing some people out of their homes? * . * . 9. President Obama announced a $100 million proposal to research what human organ? * . * . 10. To what U.S. island territory in the Pacific Ocean has America sent defense missiles to guard against a possible North Korean attack? * . * .
  Pipeline: ColReader -- {'cols': 'highlights', 'pref': '', 'suff': '', 'label_delim': None}
    starting from
      article       (CNN Student News) -- April 5, 2013 . Media Literacy Question of the Day . Before you were to travel to another country, where might you go to get credible information on what you could and could not do there? * . * . Know Your News -- The following questions relate to events that were covered this week on CNN Student News. Write your answers in the space provided. Click here for a PDF version of this Newsquiz. 1. What African island nation is being overrun by swarms of locusts? * . * . 2. What nation is led by Kim Jong Un? * . * . 3. What holy day, which took place on Sunday, is one of th...
highlights                                                                                                                                                                                                                                                                                                                                                                                                                    Use the weekly Newsquiz to test your knowledge of stories you saw on CNN Student News .\nWrite your answers in the space provided .\nToday's Newsquiz includes the Media Literacy Question of the Day .
ds_type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         train
Name: 118, dtype: object
    applying ColReader -- {'cols': 'highlights', 'pref': '', 'suff': '', 'label_delim': None} gives
      Use the weekly Newsquiz to test your knowledge of stories you saw on CNN Student News .
Write your answers in the space provided .
Today's Newsquiz includes the Media Literacy Question of the Day .

Final sample: ('(CNN Student News) -- April 5, 2013 . Media Literacy Question of the Day . Before you were to travel to another country, where might you go to get credible information on what you could and could not do there? * . * . Know Your News -- The following questions relate to events that were covered this week on CNN Student News. Write your answers in the space provided. Click here for a PDF version of this Newsquiz. 1. What African island nation is being overrun by swarms of locusts? * . * . 2. What nation is led by Kim Jong Un? * . * . 3. What holy day, which took place on Sunday, is one of the most important days to Christians? * . * . 4. What is the general term for a group of developmental disorders that affect social interaction, behavior and language? * . * . 5. Which professional sport launched its season this week with players wearing patches in honor of school shooting victims in Newtown, Connecticut? * . * . 6. What gun rights organization supports a school safety plan that includes arming adults inside schools? * . * . 7. What U.S. state passed a strict law that bans more than 100 additional kinds of guns? * . * . 8. In what U.S. state did an oil pipeline break, forcing some people out of their homes? * . * . 9. President Obama announced a $100 million proposal to research what human organ? * . * . 10. To what U.S. island territory in the Pacific Ocean has America sent defense missiles to guard against a possible North Korean attack? * . * .', "Use the weekly Newsquiz to test your knowledge of stories you saw on CNN Student News .\nWrite your answers in the space provided .\nToday's Newsquiz includes the Media Literacy Question of the Day .")


Collecting items from                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      article  \
0    (CNN)  -- Globalization washes like a flood over the world's cultures and economies. Floods can be destructive; however, they can also bring blessings, as the annual floods of the Nile did for ancient Egypt. The world's great universities can be crucial instruments in shaping, in a positive way, humankind's reaction to globalization and the development of humankind itself. Traditionally, universities have been defined and limited by location, creating an academic community and drawing students and scholars to that place. Eventually, some universities began to encourage students to study el...   
1    (CNN) -- Armenian President Robert Kocharian declared a state of emergency Saturday night after a day of clashes between police and protesters, a spokeswoman for the Armenian Foreign Ministry said. Opposition supporters wave an Armenian flag during a protest rally in Yerevan, Armenia, on Saturday. The protesters claim last month's presidential election was rigged. The state of emergency will "hopefully bring some order" to the capital, Yerevan, said Salpi Ghazarian, assistant to the Armenian foreign minister, who spoke to CNN early Sunday. The state of emergency could last until March 20, ...   
2    (Mental Floss) -- President Barack Obama turns 48 on Tuesday. While the first family encourages you to send contributions to your favorite charity in lieu of the White House, if you insist on doing some last-minute birthday shopping for 44, you might consider a pair of jeans or a case of Bud Light. For some historical precedent, here's a look back at some of the more interesting presidential gifts. Future president Barack Obama and his family blow out the candles on his birthday cake in 2004. George W. Bush: Raw lamb . President Bush and his family received about 1,000 gifts per month duri...   
3    (CNN) -- Renee Pernice, a 35-year-old mother of two young children, vanished from her home in Kansas City, Missouri, shortly after New Year's this year. She hasn't been heard from since. Renee Pernice is pictured here with her two sons and husband, Shon. Police believe foul play is involved, yet they have not found her body. No one has been arrested in the case. Police have not named her husband, Shon Pernice, as a person of interest or a suspect in the case. However, "he's the last known person to see her alive," said Doug Niemeier, a sergeant with the Kansas City Police Department. Six m...   
4    Michelle Crumrine was out of town when a tornado tore through her neighborhood. She returned to Washington, Illinois, to find pieces of her life strewn about where her house once stood. "A lot of people have a pile of rubble still," she said, "and I don't have anything. ... It's gone. I don't know where it went." Nearby, rescuers with flashlights trudged through the neighborhood in the dark of night, searching for signs of life in the wreckage. As a severe weather system slammed the Midwest on Sunday, spawning dozens of tornadoes, flash floods and hail, this town of 10,000 people was among...   
..                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ...   
995  New Delhi (CNN) -- In New Delhi's upscale diplomatic district, Ram Dhan lives in a parallel world. For years, his home has been a rickety shanty that he shares with his ailing wife, a young son, a daughter-in-law and two grandchildren. Now 62, Dhan has lived through India's journey as an independent nation. He finds little reason to rejoice as the country celebrates Monday, the 64th anniversary of freedom from British rule. "The poor have hardly benefited," he says. Sitting on a cot in his shack huddled in a squalid slum in one of the richest neighborhoods of the Indian capital, he bitterl...   
996  (CNN) -- We've seen deaths, weddings, dramatic costume changes, surprise hookups and more deaths. And that's just in the past five years or so. The world of superhero comics has seen a lot of changes recently, with the demise -- and in some cases, resurrection - of Robin, Captain America, Peter Parker, Professor Charles Xavier and the Human Torch. Clark Kent walked out on his job and dated Wonder Woman. There have been revelations that multiple characters were gay, along with a same-sex wedding or two. (There also was a complete reboot in 2011 for DC Comics, which like CNN is owned by Time...   
997  (CNN) -- Sarin gas has been used several times in the Syrian civil war, including at least once by the Assad regime, France's foreign minister said Tuesday, citing results from test samples in France's possession. Laurent Fabius announced that conclusion after meeting with the head of a United Nations mission set up to establish the facts about the alleged use of chemical weapons in Syria. "I gave him the results of tests carried out by our lab appointed by the Organization for the Prohibition of Chemical Weapons to identify chemical warfare," Fabius said in a statement, referring to the S...   
998  (CNN) -- When the 2,455th star on the Hollywood Walk of Fame is laid on Monday for actor Steve Guttenberg, Ana Martinez will once again be working behind the scenes as curator-in-chief to the iconic attraction. The Walk of Fame is one of showbiz's most visible landmarks,and Martinez has been its inconspicuous producer for almost half the attraction's 51 years, making sure the constellation of stars is perfectly aligned on the Hollywood sidewalks. For 24 years, she has been the person deciding where celebrities will receive their coveted symbol of fame in the heart of Tinseltown. When a sta...   
999  President Barack Obama's announcement that he now supports same-sex marriage reflects a dramatic shift taking place across the country. Last year, for the first time, polls found a majority of Americans share that stance. Surveys show the country's position has undergone a rapid change over the past 15 years -- one not seen on other issues. On climate change, abortion and the death penalty, "we're not seeing Americans necessarily becoming more liberal," said Frank Newport, editor in chief of Gallup. "This one stands a little alone." Views of same-sex marriage around the world . In 1996, Ga...   

                                                                                                                                                                                                                                                                                                                                highlights  \
0    John Sexton: Traditionally, universities have been defined and limited by location .\nGlobal campuses form a network of thought, innovation, he writes .\nFaculty can teach, Sexton says, students can team up in many cities at once .\nSexton: Research, scholarship can be shared and cultural ties made in "century of knowledge"   
1                                                                                            NEW: Protest moves after crackdown at Freedom Square .\nOrder sought after protests over last month's election turn violent .\nDemonstrators say the election was fraudulent .\nState of emergency could last until March 20, official says .   
2                                                                                                                             The president of Argentina gave George W. Bush 300 pounds of lamb meat .\nJFK received a carved peach pit in his likeness .\nA bowling alley was installed in the White House as a birthday gift to Truman .   
3                    Renee Pernice disappeared from her home shortly after New Year's this year .\nPolice believe foul play is involved, although they have not found a body .\nPolice say her husband accessed a HazMat building shortly after she disappeared .\nAttorneys for husband, Shon Pernice, declined comment for this report .   
4                                                                           A third person died in Massac County, Illinois, bringing the state total death toll to 6 .\nHigh winds knock out power to 390,000 in Michigan .\n67 tornadoes were reported across the region .\n"I don't have anything. ... It's gone," a storm victim says .   
..                                                                                                                                                                                                                                                                                                                                     ...   
995                                                   India is marking its 64th anniversary of freedom from British rule .\n"The poor have hardly benefited," says one man .\nPrime Minister Manmohan Singh admits much needs to be done for the "common man"\nPolicymakers have agreed over the years that corruption is a major factor .   
996                                                                                                  From the deaths of Robin and Peter Parker to major reboots, comics see a lot of change .\nComics creators say that the most important element is a well-told story .\nThree comic book experts share their views of this phenomenon .   
997     NEW: Fabius says the Assad regime is culpable in at least one instance .\nFrance is certain sarin gas was used in Syria "several times," Fabius says .\nThe announcement comes after a meeting with the head of a fact-finding mission .\nHuman Rights Council report: "Reasonable grounds" to believe chemical agents were used .   
998                                                                                                                            Ana Martinez is the curator-in-chief for the Hollywood Walk of Fame .\nSo far Martinez has produced 586 star ceremonies .\nOn Monday she will be at the helm for the unveiling of Steve Guttenberg's star .   
999                                                                                                                  Gallup and CNN polls in 2011 found the majority of Americans supported same-sex marriage .\nUsually such changes take place over longer blocks of time, a pollster says .\nYoung adults are helping drive the shift .   

    ds_type  
0     train  
1     train  
2     train  
3     train  
4     train  
..      ...  
995   train  
996   train  
997   train  
998   train  
999   train  

[1000 rows x 3 columns]
Found 1000 items
2 datasets of sizes 800,200
Setting up Pipeline: ColReader -- {'cols': 'article', 'pref': '', 'suff': '', 'label_delim': None}
Setting up Pipeline: ColReader -- {'cols': 'highlights', 'pref': '', 'suff': '', 'label_delim': None}
Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline: HF_Seq2SeqBeforeBatchTransform
Setting up after_batch: Pipeline: HF_Seq2SeqAfterBatchTransform

Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToTensor
    starting from
      ((CNN Student News) -- April 5, 2013 . Media Literacy Question of the Day . Before you were to travel to another country, where might you go to get credible information on what you could and could not do there? * . * . Know Your News -- The following questions relate to events that were covered this week on CNN Student News. Write your answers in the space provided. Click here for a PDF version of this Newsquiz. 1. What African island nation is being overrun by swarms of locusts? * . * . 2. What nation is led by Kim Jong Un? * . * . 3. What holy day, which took place on Sunday, is one of the most important days to Christians? * . * . 4. What is the general term for a group of developmental disorders that affect social interaction, behavior and language? * . * . 5. Which professional sport launched its season this week with players wearing patches in honor of school shooting victims in Newtown, Connecticut? * . * . 6. What gun rights organization supports a school safety plan that includes arming adults inside schools? * . * . 7. What U.S. state passed a strict law that bans more than 100 additional kinds of guns? * . * . 8. In what U.S. state did an oil pipeline break, forcing some people out of their homes? * . * . 9. President Obama announced a $100 million proposal to research what human organ? * . * . 10. To what U.S. island territory in the Pacific Ocean has America sent defense missiles to guard against a possible North Korean attack? * . * ., Use the weekly Newsquiz to test your knowledge of stories you saw on CNN Student News .
Write your answers in the space provided .
Today's Newsquiz includes the Media Literacy Question of the Day .)
    applying ToTensor gives
      ((CNN Student News) -- April 5, 2013 . Media Literacy Question of the Day . Before you were to travel to another country, where might you go to get credible information on what you could and could not do there? * . * . Know Your News -- The following questions relate to events that were covered this week on CNN Student News. Write your answers in the space provided. Click here for a PDF version of this Newsquiz. 1. What African island nation is being overrun by swarms of locusts? * . * . 2. What nation is led by Kim Jong Un? * . * . 3. What holy day, which took place on Sunday, is one of the most important days to Christians? * . * . 4. What is the general term for a group of developmental disorders that affect social interaction, behavior and language? * . * . 5. Which professional sport launched its season this week with players wearing patches in honor of school shooting victims in Newtown, Connecticut? * . * . 6. What gun rights organization supports a school safety plan that includes arming adults inside schools? * . * . 7. What U.S. state passed a strict law that bans more than 100 additional kinds of guns? * . * . 8. In what U.S. state did an oil pipeline break, forcing some people out of their homes? * . * . 9. President Obama announced a $100 million proposal to research what human organ? * . * . 10. To what U.S. island territory in the Pacific Ocean has America sent defense missiles to guard against a possible North Korean attack? * . * ., Use the weekly Newsquiz to test your knowledge of stories you saw on CNN Student News .
Write your answers in the space provided .
Today's Newsquiz includes the Media Literacy Question of the Day .)

Adding the next 3 samples

Applying before_batch to the list of samples
  Pipeline: HF_Seq2SeqBeforeBatchTransform
    starting from
      [((CNN Student News) -- April 5, 2013 . Media Literacy Question of the Day . Before you were to travel to another country, where might you go to get credible information on what you could and could not do there? * . * . Know Your News -- The following questions relate to events that were covered this week on CNN Student News. Write your answers in the space provided. Click here for a PDF version of this Newsquiz. 1. What African island nation is being overrun by swarms of locusts? * . * . 2. What nation is led by Kim Jong Un? * . * . 3. What holy day, which took place on Sunday, is one of the most important days to Christians? * . * . 4. What is the general term for a group of developmental disorders that affect social interaction, behavior and language? * . * . 5. Which professional sport launched its season this week with players wearing patches in honor of school shooting victims in Newtown, Connecticut? * . * . 6. What gun rights organization supports a school safety plan that includes arming adults inside schools? * . * . 7. What U.S. state passed a strict law that bans more than 100 additional kinds of guns? * . * . 8. In what U.S. state did an oil pipeline break, forcing some people out of their homes? * . * . 9. President Obama announced a $100 million proposal to research what human organ? * . * . 10. To what U.S. island territory in the Pacific Ocean has America sent defense missiles to guard against a possible North Korean attack? * . * ., Use the weekly Newsquiz to test your knowledge of stories you saw on CNN Student News .
Write your answers in the space provided .
Today's Newsquiz includes the Media Literacy Question of the Day .), (London (CNN) -- A lawyer for numerous alleged victims of phone hacking by journalists has confirmed that he plans to launch legal action in the United States against media mogul Rupert Murdoch's News Corp. Mark Lewis told CNN Friday he expects an initial hearing to be held in New York in about two to three months, as he seeks to begin legal claims in the United States. This summer's scandal around claims of phone hacking by journalists working for the now-defunct News of the World -- run by News International, the British arm of News Corp. -- rocked public confidence in the media, police and politicians. Police in London are now investigating the hacking claims as well as allegations of bribery of police. Senior News Corp. executives have denied that wrongdoing among the company's staff was widespread. Separately, former News of the World editor Andy Coulson has filed a lawsuit against News Corp., his lawyer, Jo Rickards, confirmed to CNN Friday. Rickards did not give elaborate on the subject of the lawsuit. Coulson, who resigned as editor when two News of the World employees were jailed for hacking royal voice mail in 2007, went on to work as Prime Minister David Cameron's spokesman, but resigned when the police launched their new phone-hacking investigation in January this year. He denies knowledge of wrongdoing while he was editor. He is among a dozen people to have been arrested and released on bail by police investigating claims that many celebrities, politicians and victims of crime had their phones hacked. Lewis' clients include the family of Milly Dowler, a missing teenager whose voice mail was allegedly hacked before she was found murdered. Public outrage over the allegations led News International to shut down News of the World in July. "We have been speaking to U.S. lawyers to make applications to U.S. 
courts in order to assist the investigation of this matter and are looking to pursue legal action on the basis of the Foreign Corrupt Practices Act in the United States, whereby a holding company can be liable for practices outside the jurisdiction where the offence is said to have taken place," Lewis said. "Proceedings will be issued in the U.S. where we will seek information from the company's directors about those issues and about corporate governance." This action was being pursued "because it is in our clients' interests to pursue their cases as thoroughly and properly as possible," Lewis said. He added that "compensation is likely to be higher from U.S. courts than courts in the UK." John Kelly, a lawyer at Schillings law firm, which represents Steve Coogan and other suspected celebrity hacking victims, told CNN its clients have not yet joined the lawsuit against News Corp. in the United States but are interested in the idea. Another lawyer for a number of alleged hacking victims, Mark Thomson, told CNN his clients were pursuing claims in the English civil courts but did not intend to take action in U.S. courts. Earlier this week, a News International statement confirmed it was involved in "advanced negotiations with the Dowler family regarding their compensation settlement." CNN's Jonathan Wald and Laura Smith-Spark contributed to this report., NEW: Andy Coulson is taking legal action against News Corp., his lawyer says .
Lawyer Mark Lewis says an initial hearing may take place in New York this fall .
News Corp is the parent company of the publishers of News of the World .
Outrage over claims of phone hacking by the newspaper's journalists led to its closure .), (Beijing (CNN) -- Chinese communists need to fight corruption if China's ruling party is to survive, China's president said Friday. "The Party is soberly aware of the gravity and danger of corruption," said Hu Jintao, who as president also serves as general secretary of the Communist Party of China. He warned that fighting corruption remains "a major political task the (communist) party must attend to at all times" to ensure survival. Hu spoke in a meeting marking the 90th year anniversary of the CPC. Over 6,000 invited guests, mostly top communist and government leaders and prominent party members from all walks of life, gathered in the cavernous Great Hall of the People for the gala event. For the occasion, the podium was bedecked with a giant CPC emblem, hammer and sickle, and several red flags. Hu delivered the keynote speech that lasted more than one hour. He ticked off the CPC's achievements over the past 90 years. "China has developed rapidly in the past 30 plus years thanks to reform and opening up, and the country must promote its future development by continuing to carry out reform and opening up," he intoned. But he also warned that the party is facing "long-term, complicated and severe tests in governing the country." The major challenges for the party, he pointed out, includes "lacking in drive, incompetence, divorce from people, lacking in initiative, and corruption." To cope with these "growing dangers," Hu urged the party to "police itself and impose strict discipline on its members." He stressed that the party's survival largely depends on "cracking hard on and effectively preventing corruption." If not handled properly, he warned, corruption will "cost the party the trust and support of the people." In 2010, according to Central Commission for Discipline Inspection, the CPC's anti-corruption body, 146,517 party members were punished for corruption. 
Among them, 5,373 were criminally prosecuted. High-profile corruption cases involving senior officials, experts say, have dented the CPC's reputation and popularity. Still, the CPC remains the biggest ruling political party in the world. It boasts of more than 80 million members, or 16% of China's 1.3 billion population. Among them 24.3% are below 35 years old, and 37.1% have college or higher educations, according to Qinfeng Wang, the vice minister of Organization Department of the Central Committee of the CPC. Hu called on "outstanding individuals in all fields", especially among the youth, to join as party members. One of CPC's challenge, experts say, is how to attract young Chinese, especially those who were born after 1980, many of who perceive the CPC as irrelevant to their daily lives. "The whole party must care about young people, listen to what they have to say," Hu said, because the young people "represent the future and the hope of the party." Hu also acknowledged the zigs and zags the CPC took in its 90-year evolution. "We made mistakes and even suffered severe setbacks in some historical periods," he said. Hu didn't elaborate on the CPC's setbacks. Instead, he said the cause of those mistakes is that "the guiding principles of the party at the time were divorced from the real conditions in China." The CPC's top brass attended the meeting, including Premier Wen Jiabao, Vice President Xi Jinping, who is expected to succeed Hu as communist partry chief next year, and Vice Premier Li Keqiang, who is expected to replace Wen as premier in 2013. CPC was founded in 1921. It took power in China in 1949 after toppling the Kuomintang (Nationalist Party) regime and founding the People's Republic of China. It has since ruled as the single dominant part in the country. 
Under CPC rule, the Chinese government in recent years has been confronting a slew of intractable problems, including corruption, a growing gap between the rich and the poor and environmental degradation. In recent months, the Communist Party has tightened political control after anonymous calls on cyberspace for a Middle-East inspired "jasmine revolution" triggered jitters in the Chinese leadership. Hu called on the party members to unite and seek social stability. "Without stability, nothing could be done, and even the achievements already made could be lost," he said., China's president says party must fight corruption to survive .
His speech comes on the 90th anniversary of the Communist Part of China .
In 2010, 146,517 party members were punished for corruption .), ((CNN) -- Rita Moreno is one of the rare performers to have won an Oscar, Emmy, Tony and Grammy award. But it took her nearly a lifetime to feel comfortable playing herself. The 81-year-old Puerto Rican legend, who will be awarded a Screen Actors Guild Life Achievement Award in January, credits her ability to adapt quickly as key to her survival in life and the limelight. She was 5 years old when she and her mother braved a perilous ocean voyage from Puerto Rico to New York, before boarding a bus to the Bronx to stay with relatives. "My mami, Rosa Maria Marcano Alverio, was looking for a new start, a new husband," Moreno wrote in her memoir. "She was seeking love and fortune. ..." Her mother's girlfriend suggested putting little Rosita into dance classes, and soon, she was learning Spanish dance from Rita Hayworth's uncle, who was also her dance teacher. Moreno found a home on the stage, and as her talent grew, she was able to contribute to her own, and her mother's dreams. In her memoir, now available in Spanish, she shares her journey. Moreno went from dancing in bars to performing for bar mitzvahs and independent movies. She was invited to a fateful "go-see" with Louis B. Mayer of the Metro-Goldwyn-Mayer film studio. The 16-year-old did her best to dress as her inspiration to impress the studio head. It worked. "'She looks like a Spanish Elizabeth Taylor!'" Moreno recalls Mayer saying at their meeting. "'How does a seven-year contract sound to you, young lady?'" Rosita Dolores Alverio was born in Juncos, Puerto Rico, but Rita Moreno was hatched in Hollywood, California. An MGM studio executive christened her in honor of Hayworth. "Your name has to go," he told her. "Too Italian." Over a decades-long run, she survived a contentious affair with Marlon Brando, a suicide attempt and unpredictable employment to build one of the most storied careers in entertainment. 
Now, as she prepares to play a grandmother on NBC's "Welcome to the Family," Moreno shared with CNN the struggle to find her own identity, when she stopped trying to be her idols and why she rejected playing George Lopez's mother on his hit TV show. An edited transcript of the conversation is below. CNN: What incredible accomplishments: an Oscar, a Tony, an Emmy, a Grammy, and now the SAG Award. How does all that fit with how you perceive yourself? Moreno: I don't know how to answer that question -- it's all pretty fabulous. You know I'm 81 now. I'm going to be 82 in December and I stand here absolutely astonished at what's happened! Inevitably, when something very prestigious and meaningful like the SAG awards happens, I immediately go back to Puerto Rico and my little hometown and I see this little girl and I go back to being that little girl! And I'm saying, "Is it really possible?" "Is this really happening?" I can't tell you how stunned I am. I'm thrilled to pieces. Really thrilled. CNN: One of the things you describe in the book is playing the role of the "spitfire" and various stereotypical roles of people of color in your desire to work, but you describe your personality as being very different: "quite prudent and conservative" when you first came to Los Angeles. Moreno: You know, I spent a good part of my life looking for an identity that was safe. And, in retrospect, we all know that that is simply not possible, it's not feasible. It doesn't work. I didn't want to be this "Latina girl." I didn't want to be this "sexpot." I had no role models, so I chose one: Elizabeth Taylor. But that doesn't work. And what happens as a result is you live a very muddled life with respect to identity. And when you try to do that, you lose something extremely valuable and important: and that is, self-respect. And the struggle was very painful. CNN: When do you feel like you stopped "trying to be Elizabeth Taylor" and began embracing Rita Moreno? 
Moreno: Well, you know it really didn't happen until "West Side Story." It took, unhappily, a very long time. That happened with the Oscar, and then of course I also got the Golden Globe (for best supporting actress in "West Side Story"), which was pretty fabulous. I said to myself "I must be worth something. This is pretty terrific." And then I had the heartbreak of my life, practically, because I didn't do a movie again for seven years. I was offered a couple of gang-type movies, and you know I wasn't going to do that anymore. Once I had that little gold man under my arm, I said to myself: "That's it. That day is over." CNN: So "West Side Story" really affirmed your talents and you, personally. For you as an actress, what determines what roles are stereotypical and which ones are true to life? Moreno: Well, gee, that's not a hard question to answer. I mean it's the way it's written. You're either a coffee pourer, or you play someone that's real. (Before the George Lopez) show went on the air, (Lopez) wanted me desperately to play his mother and I just kept saying that is the most disrespectful Hispanic woman I have ever seen and I can't do that. I mean, he really courted me, he sent me bottles of wine and stuff, "I'm here because of you," that kind of thing. And, I know he meant it, he was very sincere and respectful but I just thought that she gave all of us a very bad name. I know she was funny, but I just couldn't see myself doing that part, it really bothered me. And you know, by the way, the lady who played it was wonderful! And what a darling woman. God, what a neat gal she was. But I couldn't do that. I had been there and I had done that, in a manner of speaking, and I just couldn't do that again. CNN: Has there been a difference in how audiences have responded to your memoir in Spanish versus English? Moreno: It's surprising to the (English) speaking people and Hispanic people because both of them have always seen me as a very strong personality. 
You know the Latinos (have seen me as) el orgullo de Puerto Rico (the pride of Puerto Rico). El orgullo de los Latinos, de la comunidad Latina. And Americans have always seen me that way, too. So (my) struggle is a huge surprise to everybody. They just think you got this way overnight. And I keep telling them, you know what? It took me 81 years to reach this Rita Moreno!, Actress Rita Moreno's new memoir is out in Spanish .
She was born Rosita Dolores Alverio in Puerto Rico, raised in New York .
Moreno: "I spent a good part of my life looking for an identity that was safe")]
    applying HF_Seq2SeqBeforeBatchTransform gives
      [({'input_ids': tensor([    0,    36, 16256,  ...,     1,     1,     1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([    0,  7627,     5,  4114,   491,  2253,  1210,     7,  1296,   110,
         2655,     9,  1652,    47,   794,    15,  3480,  9067,   491,   479,
        50118, 45714,   110,  5274,    11,     5,   980,  1286,   479, 50118,
         5625,    18,   491,  2253,  1210,  1171,     5,  2454, 34130,  5073,
        15680,     9,     5,  1053,   479,     2,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1])}, Tensor of size 71), ({'input_ids': tensor([  0, 928,  36,  ...,   1,   1,   1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([    0,  5178,    35,  5095, 33018,  1478,    16,   602,  1030,   814,
          136,   491,  1913,   482,    39,  2470,   161,   479, 50118, 22532,
         6426,  1190,  3577,   161,    41,  2557,  1576,   189,   185,   317,
           11,   188,   469,    42,  1136,   479, 50118,  5532,  1913,    16,
            5,  4095,   138,     9,     5, 14419,     9,   491,     9,     5,
          623,   479, 50118, 14944, 17952,    81,  1449,     9,  1028, 11597,
           30,     5,  2924,    18,  4225,   669,     7,    63,  6803,   479,
            2])}, Tensor of size 71), ({'input_ids': tensor([   0, 3332,   36,  ...,    1,    1,    1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([    0,   436,    18,   394,   161,   537,   531,  1032,  3198,     7,
         6008,   479, 50118,  9962,  1901,   606,    15,     5,  1814,   212,
         4038,     9,     5, 12416,  4657,     9,   436,   479, 50118,  1121,
         1824,     6, 24543,     6, 39356,   537,   453,    58, 14459,    13,
         3198,   479,     2,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1])}, Tensor of size 71), ({'input_ids': tensor([    0,    36, 16256,  ...,    35,   407,     2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([    0, 13505, 18431, 20591,    18,    92, 15860,    16,    66,    11,
         3453,   479, 50118,  2515,    21,  2421,  4168,  3119, 13520,  4765,
          726,  2802,  1020,    11,  5821,  6511,     6,  1179,    11,   188,
          469,   479, 50118,  9690,  2362,    35,    22,   100,  1240,    10,
          205,   233,     9,   127,   301,   546,    13,    41,  3599,    14,
           21,  1522,   113,     2,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1])}, Tensor of size 71)]

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: HF_Seq2SeqAfterBatchTransform
    starting from
      ({'input_ids': tensor([[    0,    36, 16256,  ...,     1,     1,     1],
        [    0,   928,    36,  ...,     1,     1,     1],
        [    0,  3332,    36,  ...,     1,     1,     1],
        [    0,    36, 16256,  ...,    35,   407,     2]], device='cuda:1'), 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 1, 1, 1]], device='cuda:1'), 'labels': tensor([[    0,  7627,     5,  4114,   491,  2253,  1210,     7,  1296,   110,
          2655,     9,  1652,    47,   794,    15,  3480,  9067,   491,   479,
         50118, 45714,   110,  5274,    11,     5,   980,  1286,   479, 50118,
          5625,    18,   491,  2253,  1210,  1171,     5,  2454, 34130,  5073,
         15680,     9,     5,  1053,   479,     2,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1],
        [    0,  5178,    35,  5095, 33018,  1478,    16,   602,  1030,   814,
           136,   491,  1913,   482,    39,  2470,   161,   479, 50118, 22532,
          6426,  1190,  3577,   161,    41,  2557,  1576,   189,   185,   317,
            11,   188,   469,    42,  1136,   479, 50118,  5532,  1913,    16,
             5,  4095,   138,     9,     5, 14419,     9,   491,     9,     5,
           623,   479, 50118, 14944, 17952,    81,  1449,     9,  1028, 11597,
            30,     5,  2924,    18,  4225,   669,     7,    63,  6803,   479,
             2],
        [    0,   436,    18,   394,   161,   537,   531,  1032,  3198,     7,
          6008,   479, 50118,  9962,  1901,   606,    15,     5,  1814,   212,
          4038,     9,     5, 12416,  4657,     9,   436,   479, 50118,  1121,
          1824,     6, 24543,     6, 39356,   537,   453,    58, 14459,    13,
          3198,   479,     2,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1],
        [    0, 13505, 18431, 20591,    18,    92, 15860,    16,    66,    11,
          3453,   479, 50118,  2515,    21,  2421,  4168,  3119, 13520,  4765,
           726,  2802,  1020,    11,  5821,  6511,     6,  1179,    11,   188,
           469,   479, 50118,  9690,  2362,    35,    22,   100,  1240,    10,
           205,   233,     9,   127,   301,   546,    13,    41,  3599,    14,
            21,  1522,   113,     2,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1]], device='cuda:1')}, Tensor of size 4x71)
    applying HF_Seq2SeqAfterBatchTransform gives
      ({'input_ids': tensor([[    0,    36, 16256,  ...,     1,     1,     1],
        [    0,   928,    36,  ...,     1,     1,     1],
        [    0,  3332,    36,  ...,     1,     1,     1],
        [    0,    36, 16256,  ...,    35,   407,     2]], device='cuda:1'), 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 1, 1, 1]], device='cuda:1'), 'labels': tensor([[    0,  7627,     5,  4114,   491,  2253,  1210,     7,  1296,   110,
          2655,     9,  1652,    47,   794,    15,  3480,  9067,   491,   479,
         50118, 45714,   110,  5274,    11,     5,   980,  1286,   479, 50118,
          5625,    18,   491,  2253,  1210,  1171,     5,  2454, 34130,  5073,
         15680,     9,     5,  1053,   479,     2,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1],
        [    0,  5178,    35,  5095, 33018,  1478,    16,   602,  1030,   814,
           136,   491,  1913,   482,    39,  2470,   161,   479, 50118, 22532,
          6426,  1190,  3577,   161,    41,  2557,  1576,   189,   185,   317,
            11,   188,   469,    42,  1136,   479, 50118,  5532,  1913,    16,
             5,  4095,   138,     9,     5, 14419,     9,   491,     9,     5,
           623,   479, 50118, 14944, 17952,    81,  1449,     9,  1028, 11597,
            30,     5,  2924,    18,  4225,   669,     7,    63,  6803,   479,
             2],
        [    0,   436,    18,   394,   161,   537,   531,  1032,  3198,     7,
          6008,   479, 50118,  9962,  1901,   606,    15,     5,  1814,   212,
          4038,     9,     5, 12416,  4657,     9,   436,   479, 50118,  1121,
          1824,     6, 24543,     6, 39356,   537,   453,    58, 14459,    13,
          3198,   479,     2,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1],
        [    0, 13505, 18431, 20591,    18,    92, 15860,    16,    66,    11,
          3453,   479, 50118,  2515,    21,  2421,  4168,  3119, 13520,  4765,
           726,  2802,  1020,    11,  5821,  6511,     6,  1179,    11,   188,
           469,   479, 50118,  9690,  2362,    35,    22,   100,  1240,    10,
           205,   233,     9,   127,   301,   546,    13,    41,  3599,    14,
            21,  1522,   113,     2,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1]], device='cuda:1')}, Tensor of size 4x71)
{% endraw %} {% raw %}
dls = dblock.dataloaders(cnndm_df, bs=4)
{% endraw %} {% raw %}
b = dls.one_batch()
{% endraw %} {% raw %}
len(b), b[0]['input_ids'].shape, b[0]['labels'].shape, b[1].shape
(2, torch.Size([4, 1024]), torch.Size([4, 79]), torch.Size([4, 79]))
{% endraw %} {% raw %}
dls.show_batch(dataloaders=dls, max_n=2, input_trunc_at=1000, target_trunc_at=250)
| | text | target |
|---|---|---|
| 0 | (CNN) -- Home to up to 10 percent of all known species, Mexico is recognized as one of the most biodiverse regions on the planet. The twin threats of climate change and human encroachment on natural environments are, however, threatening the existence of the country's rich wildlife. And there is a great deal to lose. In the United Nations Environment Program (UNEP) World Conservation Monitoring Centre's list of megadiverse countries Mexico ranks 11th. The list represents a group of 17 countries that harbor the majority of the Earth's species and are therefore considered extremely biodiverse. From its coral reefs in the Caribbean Sea to its tropical jungles in Chiapas and the Yucatan peninsula and its deserts and prairies in the north, Mexico boasts an incredibly rich variety of flora and fauna. Some 574 out of 717 reptile species found in Mexico -- the most in any country -- can only be encountered within its borders. It is home to 502 types of mammals, 290 species of birds, 1,150 var | Mexico hosts to up to 10 percent of all known species on Earth.\nIt is home to 502 types of mammals, 290 bird species and 26,000 types of plants.\nHuman development and climate change is placing a big strain on its biodiversity.\nThe Golden Eagle is un |
| 1 | (CNN) -- It's a congested, sprawling transport hub surrounded by 1950s architecture and predominantly used by commuters or tourists to cross the city of Istanbul. But proposed changes to Taksim Square have seen it become the flashpoint for protests that have swept through Turkey in the past week, leaving thousands injured and focusing the world's attention on the government of Prime Minister Recep Tayyip Erdogan. Taksim has been no stranger to violence. In 1977, at least 34 protesters died during May Day clashes with police. May 1 rallies in the square were banned in 1980 and were only allowed to legally resume in 2010. On May Day this year, there were riots after city authorities again refused to grant trade unions and youth groups permission to demonstrate in Taksim, blaming construction work being carried out in the square. Professor Ersin Kalaycioglu, professor of political science at Istanbul's Sabanci University, said significantly, Taksim Square was also known as "republic squa | Taksim Square was where Istanbul's water was distributed -- Taksim means divide.\nThe site is seen as symbolizing the seclar Turkish republic founded by Ataturk.\nErdogan's government's plans to alter Taksim's Gezi Park prompted protests.\nThe police's |
{% endraw %}

Tests

The following tests verify, as far as is practical, that the core DataBlock code above works with the pretrained summarization models listed below. They are excluded from the CI workflow because they take a long time to run and require downloading a large amount of data.

Note: Feel free to modify the code below to test whatever pretrained summarization models you are working with. If any of them fail, please submit a GitHub issue (or a PR if you'd like to fix it yourself).
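
When adding your own models to the list, keep in mind that some checkpoints expect a task prefix on the input; the test loop below prepends `summarize: ` for T5. A minimal sketch of that idea as a standalone helper follows. The `TASK_PREFIXES` mapping and the `add_task_prefix` name are hypothetical and not part of BLURR.

```python
# Hypothetical helper (not part of BLURR): map an architecture name to the
# task prefix its checkpoints expect. Only the T5 entry comes from the test
# loop in this notebook; extend the mapping for your own models as needed.
TASK_PREFIXES = {'t5': 'summarize: '}

def add_task_prefix(text, arch):
    """Prepend the architecture's task prefix, if any, to the raw input text."""
    return f"{TASK_PREFIXES.get(arch, '')}{text}"

print(add_task_prefix('Some long article ...', 't5'))    # summarize: Some long article ...
print(add_task_prefix('Some long article ...', 'bart'))  # Some long article ...
```

Keeping the prefixes in a dictionary means a new architecture only needs a new entry rather than another branch in the function.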

{% raw %}
[ model_type for model_type in BLURR.get_models(task='ConditionalGeneration') 
 if (not model_type.__name__.startswith('TF')) ]
[transformers.models.bart.modeling_bart.BartForConditionalGeneration,
 transformers.models.blenderbot.modeling_blenderbot.BlenderbotForConditionalGeneration,
 transformers.models.blenderbot_small.modeling_blenderbot_small.BlenderbotSmallForConditionalGeneration,
 transformers.models.fsmt.modeling_fsmt.FSMTForConditionalGeneration,
 transformers.models.led.modeling_led.LEDForConditionalGeneration,
 transformers.models.m2m_100.modeling_m2m_100.M2M100ForConditionalGeneration,
 transformers.models.mbart.modeling_mbart.MBartForConditionalGeneration,
 transformers.models.mt5.modeling_mt5.MT5ForConditionalGeneration,
 transformers.models.pegasus.modeling_pegasus.PegasusForConditionalGeneration,
 transformers.models.prophetnet.modeling_prophetnet.ProphetNetForConditionalGeneration,
 transformers.models.speech_to_text.modeling_speech_to_text.Speech2TextForConditionalGeneration,
 transformers.models.t5.modeling_t5.T5ForConditionalGeneration,
 transformers.models.xlm_prophetnet.modeling_xlm_prophetnet.XLMProphetNetForConditionalGeneration]
{% endraw %} {% raw %}
pretrained_model_names = [
    'facebook/bart-base',
    'facebook/blenderbot_small-90M',
    'allenai/led-base-16384',
    'google/mt5-small',
    'google/pegasus-cnn_dailymail',
    't5-small', 
    'microsoft/prophetnet-large-uncased',
    'microsoft/xprophetnet-large-wiki100-cased', # XLMProphetNet
]
{% endraw %} {% raw %}
path = Path('./')
cnndm_df = pd.read_csv(path/'cnndm_sample.csv')
{% endraw %} {% raw %}
#hide_output
model_cls = AutoModelForSeq2SeqLM
bsz = 2
seq_sz = 256
trg_seq_sz = 40

test_results = []
for model_name in pretrained_model_names:
    error=None
    
    print(f'=== {model_name} ===\n')
    
    hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(model_name, model_cls=model_cls)
    print(f'architecture:\t{hf_arch}\ntokenizer:\t{type(hf_tokenizer).__name__}\n')
    
    # not all architectures include a native pad_token (e.g., gpt2, ctrl), so we add one here
    if (hf_tokenizer.pad_token is None): 
        hf_tokenizer.add_special_tokens({'pad_token': '<pad>'})  
        hf_config.pad_token_id = hf_tokenizer.get_vocab()['<pad>']
        hf_model.resize_token_embeddings(len(hf_tokenizer))   
    
    before_batch_tfm = HF_Seq2SeqBeforeBatchTransform(hf_arch, hf_config, hf_tokenizer, hf_model,
                                                      padding='max_length', 
                                                      max_length=seq_sz, 
                                                      max_target_length=trg_seq_sz)
    
    def add_t5_prefix(inp): return f'summarize: {inp}' if (hf_arch == 't5') else inp
    
    blocks = (HF_Seq2SeqBlock(before_batch_tfm=before_batch_tfm), noop)
    dblock = DataBlock(blocks=blocks, 
                   get_x=Pipeline([ColReader('article'), add_t5_prefix]), 
                   get_y=ColReader('highlights'), 
                   splitter=RandomSplitter())

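    try:
        print('*** TESTING DataLoaders ***\n')
        # build the DataLoaders inside the try so a failure is recorded as FAILED instead of stopping the loop
        dls = dblock.dataloaders(cnndm_df, bs=bsz)
        b = dls.one_batch()
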
        test_eq(len(b), 2)
        test_eq(len(b[0]['input_ids']), bsz)
        test_eq(b[0]['input_ids'].shape, torch.Size([bsz, seq_sz]))
        test_eq(len(b[1]), bsz)
        test_eq(b[1].shape, torch.Size([bsz, trg_seq_sz]))

        if (hasattr(hf_tokenizer, 'add_prefix_space') and hf_arch != 'led'):
            test_eq(hf_tokenizer.add_prefix_space, True)
            
        test_results.append((hf_arch, type(hf_tokenizer).__name__, model_name, 'PASSED', ''))
        dls.show_batch(dataloaders=dls, max_n=2, input_trunc_at=1000)
        
    except Exception as err:
        test_results.append((hf_arch, type(hf_tokenizer).__name__, model_name, 'FAILED', err))
{% endraw %} {% raw %}
| | arch | tokenizer | model_name | result | error |
|---|---|---|---|---|---|
| 0 | bart | BartTokenizerFast | facebook/bart-base | PASSED | |
| 1 | blenderbot_small | BlenderbotSmallTokenizer | facebook/blenderbot_small-90M | PASSED | |
| 2 | led | LEDTokenizerFast | allenai/led-base-16384 | PASSED | |
| 3 | mt5 | T5TokenizerFast | google/mt5-small | PASSED | |
| 4 | pegasus | PegasusTokenizerFast | google/pegasus-cnn_dailymail | PASSED | |
| 5 | t5 | T5TokenizerFast | t5-small | PASSED | |
| 6 | prophetnet | ProphetNetTokenizer | microsoft/prophetnet-large-uncased | PASSED | |
| 7 | xlm_prophetnet | XLMProphetNetTokenizer | microsoft/xprophetnet-large-wiki100-cased | PASSED | |
{% endraw %}
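
If any rows report FAILED, the details needed for a GitHub issue can be pulled out of `test_results`. The sketch below assumes the tuple layout used in the test loop, `(arch, tokenizer, model_name, result, error)`. The `failed_results` helper and the sample rows are made up for illustration and are not part of BLURR.

```python
def failed_results(test_results):
    """Return (model_name, error message) pairs for every FAILED entry."""
    return [(model_name, str(error))
            for _arch, _tokenizer, model_name, result, error in test_results
            if result == 'FAILED']

# Illustrative rows only; the second one is an invented failure, not a real result.
sample_results = [
    ('bart', 'BartTokenizerFast', 'facebook/bart-base', 'PASSED', ''),
    ('my_arch', 'MyTokenizer', 'my-org/my-model', 'FAILED', ValueError('shape mismatch')),
]

for model_name, message in failed_results(sample_results):
    print(f'{model_name}: {message}')
```

Converting the stored exception with `str` keeps the report readable when it is pasted into an issue.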

Cleanup