nlp_architect.solutions.set_expansion package
Subpackages
Submodules
nlp_architect.solutions.set_expansion.expand_server module
nlp_architect.solutions.set_expansion.prepare_data module
Script that prepares the input corpus for np2vec training: it runs an NP extractor on the corpus and marks the extracted NPs.
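As a rough illustration of what "marking" extracted noun phrases might look like, the sketch below joins the words of each NP with an underscore and appends a trailing underscore, so that a later embedding step can treat the phrase as a single token. The function name `mark_noun_phrases`, the `mark_char` parameter, and the character-offset span format are assumptions made for this example; they are not the script's actual interface.

```python
def mark_noun_phrases(text, np_spans, mark_char="_"):
    """Return text with each noun phrase collapsed into one marked token.

    np_spans is a list of (start, end) character offsets, assumed to be
    non-overlapping. Hypothetical helper, not part of nlp_architect.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(np_spans):
        # Copy the text between the previous NP and this one unchanged.
        pieces.append(text[cursor:start])
        phrase = text[start:end]
        # Join the NP words with the mark character and append one more.
        pieces.append(mark_char.join(phrase.split()) + mark_char)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sentence = "the deep learning model beats a strong baseline"
# In the real script the spans would come from an NP extractor;
# here they are located by hand for the toy sentence.
phrases = ("deep learning model", "strong baseline")
spans = [(sentence.find(p), sentence.find(p) + len(p)) for p in phrases]
marked = mark_noun_phrases(sentence, spans)
print(marked)
```

Collapsing each NP into one token is what lets a word-level embedding model learn a single vector per phrase.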
-
nlp_architect.solutions.set_expansion.prepare_data.extract_noun_phrases(docs, nlp_parser, chunker)
-
nlp_architect.solutions.set_expansion.prepare_data.get_group_norm(spacy_span)
Given a span, determine its group and return the normalized text representing the group.
Parameters: spacy_span (spacy.tokens.Span) – the span whose group is determined
nlp_architect.solutions.set_expansion.set_expand module
-
class nlp_architect.solutions.set_expansion.set_expand.SetExpand(np2vec_model_file, binary=False, word_ngrams=False, grouping=False, light_grouping=False, grouping_map_dir=None)
Bases: object
Set expansion module, given a trained np2vec model.
-
expand(seed, topn=500)
Given a seed of terms, return the expanded set of terms.
Parameters: - seed – seed terms
- topn – maximal number of expanded terms to return
Returns: up to topn expanded terms and their probabilities
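The general idea of set expansion over an embedding model can be sketched with a toy vocabulary: score every non-seed term by its similarity to the seed terms and keep the best `topn`. This is a minimal sketch, not the `SetExpand` implementation; the function `expand_sketch`, the dictionary of toy vectors, and the choice of mean cosine similarity as the score are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_sketch(vectors, seed, topn=500):
    """Rank non-seed terms by mean cosine similarity to the seed terms.

    Hypothetical stand-in for illustration; the library may score
    candidates differently.
    """
    scored = []
    for term, vec in vectors.items():
        if term in seed:
            continue  # seed terms are not returned as expansions
        score = sum(cosine(vec, vectors[s]) for s in seed) / len(seed)
        scored.append((term, score))
    # Highest score first, then cut to at most topn results.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 3-dimensional "embeddings": fruits point one way, tools another.
toy_vectors = {
    "apple": [1.0, 0.1, 0.0],
    "banana": [0.9, 0.2, 0.0],
    "cherry": [0.8, 0.3, 0.1],
    "hammer": [0.0, 0.1, 1.0],
    "wrench": [0.1, 0.0, 0.9],
}
expanded = expand_sketch(toy_vectors, seed=["apple", "banana"], topn=2)
for term, score in expanded:
    print(term, round(score, 3))
```

With a fruit seed, the fruit-like term ranks first and the tool terms trail far behind, which is the behavior a set expansion is meant to expose.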
-
seed2term_similarity(seed_id, term_id)
Compute the cosine similarity between the seed terms and a term.
Parameters: - seed_id – seed term ids
- term_id – the term id
Returns: similarity between the seed terms and the term
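One plausible way to compare several seed terms with a single term is to average the unit-normalized seed vectors and take the cosine of that centroid with the term vector. The sketch below assumes this centroid approach and a plain list of vectors indexed by integer id; neither is confirmed by the documentation above, and `seed2term_similarity_sketch` is a hypothetical name.

```python
import math


def unit(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def seed2term_similarity_sketch(table, seed_ids, term_id):
    """Cosine similarity between the seed centroid and one term.

    table is a list of vectors indexed by integer id (an assumption).
    Zero vectors and a zero centroid are not handled in this sketch.
    """
    seed_units = [unit(table[i]) for i in seed_ids]
    dim = len(table[term_id])
    # Component-wise mean of the normalized seed vectors.
    centroid = [
        sum(vec[d] for vec in seed_units) / len(seed_units)
        for d in range(dim)
    ]
    c = unit(centroid)
    t = unit(table[term_id])
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(a * b for a, b in zip(c, t))


table = [
    [1.0, 0.0],    # id 0
    [0.0, 1.0],    # id 1
    [1.0, 1.0],    # id 2: same direction as the centroid of ids 0 and 1
    [-1.0, -1.0],  # id 3: opposite direction
]
sim_same = seed2term_similarity_sketch(table, [0, 1], 2)
sim_opposite = seed2term_similarity_sketch(table, [0, 1], 3)
print(sim_same, sim_opposite)
```

Normalizing each seed vector before averaging keeps one long vector from dominating the centroid.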
-