Named Entity Recognition

Modeling classes for named entity recognition (NER)

class pororo.tasks.named_entity_recognition.PororoNerFactory(task: str, lang: str, model: Optional[str])

Bases: pororo.tasks.utils.base.PororoFactoryBase

Conduct named entity recognition

English (roberta.base.en.ner)

  • dataset: OntoNotes 5.0

  • metric: F1 (91.63)

Korean (charbert.base.ko.ner)

Japanese (jaberta.base.ja.ner)

Chinese (zhberta.base.zh.ner)

  • dataset: OntoNotes 5.0

  • metric: F1 (79.06)

Parameters

sent – (str) sentence to be labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]

Examples

>>> from pororo import Pororo
>>> ner = Pororo(task="ner", lang="en")
>>> ner("It was in midfield where Arsenal took control of the game, and that was mainly down to Thomas Partey and Mohamed Elneny.")
[('It', 'O'), ('was', 'O'), ('in', 'O'), ('midfield', 'O'), ('where', 'O'), ('Arsenal', 'ORG'), ('took', 'O'), ('control', 'O'), ('of', 'O'), ('the', 'O'), ('game', 'O'), (',', 'O'), ('and', 'O'), ('that', 'O'), ('was', 'O'), ('mainly', 'O'), ('down', 'O'), ('to', 'O'), ('Thomas Partey', 'PERSON'), ('and', 'O'), ('Mohamed Elneny', 'PERSON'), ('.', 'O')]
>>> ner = Pororo(task="ner", lang="ko")
>>> ner("안녕하세요. 제 이름은 카터입니다.")
[("안녕하세요.", "O"), (" ", "O"), ("제", "O"), ("이름은", "O"), ("카터", "PS"), ("입니다.", "O")]
>>> ner = Pororo(task="ner", lang="zh")
>>> ner("毛泽东(1893年12月26日-1976年9月9日),字润之,湖南湘潭人。中华民国大陆时期、中国共产党和中华人民共和国的重要政治家、经济家、军事家、战略家、外交家和诗人。")
[('毛泽东', 'PERSON'), ('(', 'O'), ('1893年12月26日-1976年9月9日', 'DATE'), (')', 'O'), (',', 'O'), ('字润之', 'O'), (',', 'O'), ('湖南', 'GPE'), ('湘潭', 'GPE'), ('人', 'O'), ('。', 'O'), ('中华民国大陆时期', 'GPE'), ('、', 'O'), ('中国共产党', 'ORG'), ('和', 'O'), ('中华人民共和国', 'GPE'), ('的', 'O'), ('重', 'O'), ('要', 'O'), ('政', 'O'), ('治', 'O'), ('家', 'O'), ('、', 'O'), ('经', 'O'), ('济', 'O'), ('家', 'O'), ('、', 'O'), ('军', 'O'), ('事', 'O'), ('家', 'O'), ('、', 'O'), ('战', 'O'), ('略', 'O'), ('家', 'O'), ('、', 'O'), ('外', 'O'), ('交', 'O'), ('家', 'O'), ('和', 'O'), ('诗', 'O'), ('人', 'O'), ('。', 'O')]
>>> ner = Pororo(task="ner", lang="ja")
>>> ner("豊臣 秀吉、または羽柴 秀吉は、戦国時代から安土桃山時代にかけての武将、大名。天下人、武家関白、太閤。三英傑の一人。")
[('豊臣秀吉', 'PERSON'), ('、', 'O'), ('または', 'O'), ('羽柴秀吉', 'PERSON'), ('は', 'O'), ('、', 'O'), ('戦国時代', 'DATE'), ('から', 'O'), ('安土桃山時代', 'DATE'), ('にかけて', 'O'), ('の', 'O'), ('武将', 'O'), ('、', 'O'), ('大名', 'O'), ('。', 'O'), ('天下', 'O'), ('人', 'O'), ('、', 'O'), ('武家', 'O'), ('関白', 'O'), ('、太閤', 'O'), ('。', 'O'), ('三', 'O'), ('英', 'O'), ('傑', 'O'), ('の', 'O'), ('一', 'O'), ('人', 'O'), ('。', 'O')]
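
Since every call returns a flat list of (token, tag) tuples, with 'O' marking tokens outside any entity, the entities can be pulled out with a plain filter. The following is a minimal sketch that works on the returned list only and does not call pororo; `extract_entities` is a hypothetical helper name, not part of the library.

```python
from typing import List, Tuple


def extract_entities(
    tagged: List[Tuple[str, str]],
    outside_tag: str = "O",
) -> List[Tuple[str, str]]:
    """Keep only the (token, tag) pairs whose tag is not the outside tag."""
    return [(token, tag) for token, tag in tagged if tag != outside_tag]


# Copy of the Korean example output shown above.
tagged = [
    ("안녕하세요.", "O"),
    (" ", "O"),
    ("제", "O"),
    ("이름은", "O"),
    ("카터", "PS"),
    ("입니다.", "O"),
]

print(extract_entities(tagged))
```

The same helper applies unchanged to the English, Chinese, and Japanese outputs, because all four share the List[Tuple[str, str]] return type.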
static get_available_langs()

static get_available_models()

load(device)

Load user-selected task-specific model

Parameters

device (str) – device on which to load the model (for example "cpu" or "cuda")

Returns

User-selected task-specific model

Return type

object
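
The per-language model names listed for this factory suggest a simple lookup from language code to model name. The sketch below illustrates that idea in isolation and is not pororo's implementation: `AVAILABLE_MODELS` and `resolve_model` are hypothetical names, the dictionary layout is an assumption, and so is the fallback to the first listed model when none is given.

```python
from typing import Dict, List, Optional

# Model names are copied from the listing in this section.
# The dict layout and language keys are assumptions made for illustration.
AVAILABLE_MODELS: Dict[str, List[str]] = {
    "en": ["roberta.base.en.ner"],
    "ko": ["charbert.base.ko.ner"],
    "ja": ["jaberta.base.ja.ner"],
    "zh": ["zhberta.base.zh.ner"],
}


def resolve_model(lang: str, model: Optional[str] = None) -> str:
    """Hypothetical helper: return the requested model name, or the first
    model listed for the language when no model is given."""
    if lang not in AVAILABLE_MODELS:
        raise ValueError(f"unsupported language: {lang!r}")
    candidates = AVAILABLE_MODELS[lang]
    if model is None:
        return candidates[0]
    if model not in candidates:
        raise ValueError(f"model {model!r} is not listed for {lang!r}")
    return model


print(resolve_model("ko"))
```

Validating the language and model name up front keeps a typo from surfacing later as a failed model download or load.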

class pororo.tasks.named_entity_recognition.PororoBertNerEn(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str)

Conduct named entity recognition with English RoBERTa

Parameters

sent – (str) sentence to be labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]

class pororo.tasks.named_entity_recognition.PororoBertCharNer(model, sent_tokenizer, device, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(text: str, ignore_labels: List[str] = [])

Conduct named entity recognition with character BERT

Parameters

text – (str) sentence to be labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]
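
The `ignore_labels` parameter is not described here. One plausible reading, stated purely as an assumption, is that tokens carrying a listed label are reported as 'O' instead. The sketch below shows that reading as a standalone post-processing step; `apply_ignore_labels` is a hypothetical helper and may not match what `predict` actually does.

```python
from typing import List, Optional, Tuple


def apply_ignore_labels(
    tagged: List[Tuple[str, str]],
    ignore_labels: Optional[List[str]] = None,
    outside_tag: str = "O",
) -> List[Tuple[str, str]]:
    """Assumed behaviour: relabel tokens whose tag is ignored as outside_tag."""
    ignored = set(ignore_labels or [])
    return [
        (token, outside_tag if tag in ignored else tag)
        for token, tag in tagged
    ]


# Copy of the Korean example output from the factory documentation.
tagged = [
    ("안녕하세요.", "O"),
    (" ", "O"),
    ("제", "O"),
    ("이름은", "O"),
    ("카터", "PS"),
    ("입니다.", "O"),
]

print(apply_ignore_labels(tagged, ignore_labels=["PS"]))
```

The helper uses `None` as its default instead of a mutable `[]`, which avoids sharing one list object across calls.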

class pororo.tasks.named_entity_recognition.PororoBertNerZh(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str)

Conduct named entity recognition with Chinese RoBERTa

Parameters

sent – (str) sentence to be labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]
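
In the Chinese example output, text outside any entity comes back as many single-character 'O' tokens. If longer spans are easier to work with, adjacent 'O' tokens can be collapsed after the call. This is a minimal sketch of such a step, assuming plain concatenation is acceptable; `merge_outside_runs` is a hypothetical helper, not part of pororo.

```python
from typing import List, Tuple


def merge_outside_runs(
    tagged: List[Tuple[str, str]],
    outside_tag: str = "O",
) -> List[Tuple[str, str]]:
    """Concatenate consecutive tokens tagged outside_tag into a single token."""
    merged: List[Tuple[str, str]] = []
    for token, tag in tagged:
        if merged and tag == outside_tag and merged[-1][1] == outside_tag:
            # Extend the previous outside run instead of starting a new item.
            previous_token, _ = merged[-1]
            merged[-1] = (previous_token + token, outside_tag)
        else:
            merged.append((token, tag))
    return merged


# Excerpt of the Chinese example output from the factory documentation.
tagged = [
    ("中华人民共和国", "GPE"),
    ("的", "O"),
    ("重", "O"),
    ("要", "O"),
    ("政", "O"),
    ("治", "O"),
    ("家", "O"),
]

print(merge_outside_runs(tagged))
# [('中华人民共和国', 'GPE'), ('的重要政治家', 'O')]
```

Joining without a separator suits Chinese and Japanese text. For space-delimited languages the tokens would need a separator, so the helper is not a drop-in for the English output.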

class pororo.tasks.named_entity_recognition.PororoBertNerJa(model, config)

Bases: pororo.tasks.utils.base.PororoSimpleBase

predict(sent: str)

Conduct named entity recognition with Japanese RoBERTa

Parameters

sent – (str) sentence to be labeled

Returns

list of (token, predicted tag) tuples

Return type

List[Tuple[str, str]]