nlp_architect.models.gnmt.utils package
Submodules
nlp_architect.models.gnmt.utils.evaluation_utils module
Utility for evaluating various tasks, e.g., translation & summarization.
nlp_architect.models.gnmt.utils.iterator_utils module
For loading data into NMT models.
class nlp_architect.models.gnmt.utils.iterator_utils.BatchedInput
Bases: nlp_architect.models.gnmt.utils.iterator_utils.BatchedInput
nlp_architect.models.gnmt.utils.iterator_utils.get_iterator(src_dataset, tgt_dataset, src_vocab_table, tgt_vocab_table, batch_size, sos, eos, random_seed, num_buckets, src_max_len=None, tgt_max_len=None, num_parallel_calls=4, output_buffer_size=None, skip_count=None, num_shards=1, shard_index=0, reshuffle_each_iteration=True, use_char_encode=False)
nlp_architect.models.gnmt.utils.misc_utils module
Generally useful utility functions.
nlp_architect.models.gnmt.utils.misc_utils.add_summary(summary_writer, global_step, tag, value)
Add a new summary to the current summary_writer. Useful to log things that are not part of the training graph, e.g., tag=BLEU.
nlp_architect.models.gnmt.utils.misc_utils.debug_tensor(s, msg=None, summarize=10)
Print the shape and value of a tensor at test time. Return a new tensor.
nlp_architect.models.gnmt.utils.misc_utils.format_bpe_text(symbols, delimiter=b'@@')
Convert a sequence of BPE words into a sentence.
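To illustrate what merging BPE words with the `@@` delimiter involves, here is a minimal standalone sketch. `merge_bpe_tokens` is a hypothetical stand-in written for this example, not the library function, and it assumes that a token ending in the delimiter continues into the next token.

```python
def merge_bpe_tokens(symbols, delimiter=b"@@"):
    """Join BPE sub-word tokens (bytes) into a space-separated sentence.

    Hypothetical helper for illustration only; assumes a token that ends
    with `delimiter` is continued by the token that follows it.
    """
    words = []
    current = b""
    for symbol in symbols:
        if symbol.endswith(delimiter):
            # Unfinished word: strip the delimiter and keep accumulating.
            current += symbol[: -len(delimiter)]
        else:
            # This token closes the current word.
            current += symbol
            words.append(current)
            current = b""
    # A trailing unfinished piece (ending in the delimiter) is dropped here.
    return b" ".join(words)


tokens = [b"trans@@", b"lation", b"is", b"fu@@", b"n"]
sentence = merge_bpe_tokens(tokens)
print(sentence)  # b'translation is fun'
```

The sketch works on bytes because the documented default delimiter is `b'@@'`; whether the real function also accepts `str` input is not stated here.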
nlp_architect.models.gnmt.utils.misc_utils.format_spm_text(symbols)
Decode text in SPM (https://github.com/google/sentencepiece) format.
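SentencePiece marks word boundaries with the meta symbol "▁" (U+2581) instead of spaces, so decoding amounts to concatenating the pieces and turning that symbol back into a space. The following is a rough sketch of that idea; `decode_spm_pieces` is a hypothetical name, and it works on `str` pieces, which may differ from the types the library function expects.

```python
SPM_SPACE = "\u2581"  # the "▁" meta symbol SentencePiece uses for whitespace


def decode_spm_pieces(pieces):
    """Turn a list of SentencePiece pieces (str) back into plain text.

    Hypothetical helper for illustration only.
    """
    # Pieces are concatenated without separators; word boundaries are
    # carried by the meta symbol itself.
    text = "".join(pieces)
    return text.replace(SPM_SPACE, " ").strip()


pieces = ["\u2581Hello", "\u2581wor", "ld", "\u2581!"]
decoded = decode_spm_pieces(pieces)
print(decoded)  # Hello world !
```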
nlp_architect.models.gnmt.utils.misc_utils.format_text(words)
Convert a sequence of words into a sentence.
nlp_architect.models.gnmt.utils.misc_utils.get_config_proto(log_device_placement=False, allow_soft_placement=True, num_intra_threads=0, num_inter_threads=0)
nlp_architect.models.gnmt.utils.misc_utils.load_hparams(model_dir)
Load hparams from an existing model directory.
nlp_architect.models.gnmt.utils.misc_utils.maybe_parse_standard_hparams(hparams, hparams_path)
Override hparams values with existing standard hparams config.
nlp_architect.models.gnmt.utils.misc_utils.print_hparams(hparams, skip_patterns=None, header=None)
Print hparams; keys can be skipped based on a pattern.
nlp_architect.models.gnmt.utils.misc_utils.print_out(s, f=None, new_line=True)
Similar to print, but with support for flushing and for writing the output to a file.
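A print-like helper of this kind can be sketched as follows. `echo` is a hypothetical name, and the sketch assumes the message goes to standard output and, when a file object is given, to that file as well; the library function may behave differently in detail.

```python
import io
import sys


def echo(s, f=None, new_line=True):
    """Write `s` to stdout (flushed) and optionally to the file object `f`.

    Hypothetical helper for illustration only.
    """
    end = "\n" if new_line else ""
    if f is not None:
        # Mirror the message into the log file.
        f.write(s + end)
    sys.stdout.write(s + end)
    # Flush so the message appears immediately, even when stdout is piped.
    sys.stdout.flush()


log = io.StringIO()  # stands in for an open log file
echo("step 100", f=log)
echo("done", f=log, new_line=False)
```

Flushing after every call matters mostly for long-running training jobs, where buffered output can otherwise lag far behind the actual progress.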
nlp_architect.models.gnmt.utils.misc_utils.print_time(s, start_time)
Take a start time, print elapsed duration, and return a new time.
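Returning a new time makes it easy to chain timing calls across successive stages. A minimal sketch of that pattern, using a hypothetical `report_elapsed` helper rather than the library function, and an output format chosen only for this example:

```python
import time


def report_elapsed(label, start_time):
    """Print the seconds elapsed since `start_time` and return the current time.

    Hypothetical helper for illustration only.
    """
    now = time.time()
    print("%s, time %.2fs" % (label, now - start_time))
    return now


start = time.time()
start = report_elapsed("loaded data", start)   # time since the script began
start = report_elapsed("built model", start)   # time since the previous report
```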
nlp_architect.models.gnmt.utils.nmt_utils module
Utility functions specifically for NMT.
nlp_architect.models.gnmt.utils.standard_hparams_utils module
Standard hparams utilities.
nlp_architect.models.gnmt.utils.vocab_utils module
Utility to handle vocabularies.
nlp_architect.models.gnmt.utils.vocab_utils.check_vocab(vocab_file, out_dir, check_special_token=True, sos=None, eos=None, unk=None)
Check whether vocab_file exists; if it does not, create it from corpus_file.
nlp_architect.models.gnmt.utils.vocab_utils.create_vocab_tables(src_vocab_file, tgt_vocab_file, share_vocab)
Creates vocab tables for src_vocab_file and tgt_vocab_file.
nlp_architect.models.gnmt.utils.vocab_utils.load_embed_txt(embed_file)
Load embed_file into a Python dictionary.
Note: embed_file should be a GloVe/word2vec-formatted text file. Here is an example assuming embed_size=5:
the -0.071549 0.093459 0.023738 -0.090339 0.056123
to 0.57346 0.5417 -0.23477 -0.3624 0.4037
and 0.20327 0.47348 0.050877 0.002103 0.060547
For word2vec format, the first line will be: <num_words> <emb_size>.
Parameters: embed_file – file path to the embedding file.
Returns: a dictionary that maps each word to its vector, and the size of the embedding dimension.
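The file format described above can be parsed with a few lines of plain Python. The sketch below is not the library implementation: `read_embeddings` is a hypothetical name, and it assumes space-separated fields and that a two-field first line is the word2vec header.

```python
import os
import tempfile


def read_embeddings(path):
    """Parse a GloVe/word2vec text file into ({word: vector}, emb_size).

    Hypothetical helper for illustration only.
    """
    vectors = {}
    emb_size = None
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            fields = line.rstrip().split(" ")
            if fields == [""]:
                continue  # skip blank lines
            if line_number == 0 and len(fields) == 2:
                # Assumed to be the word2vec header "<num_words> <emb_size>".
                continue
            word = fields[0]
            values = [float(x) for x in fields[1:]]
            if emb_size is None:
                emb_size = len(values)
            elif len(values) != emb_size:
                raise ValueError("inconsistent vector size for %r" % word)
            vectors[word] = values
    return vectors, emb_size


# Write the three example rows to a temporary file and read them back.
sample = (
    "the -0.071549 0.093459 0.023738 -0.090339 0.056123\n"
    "to 0.57346 0.5417 -0.23477 -0.3624 0.4037\n"
    "and 0.20327 0.47348 0.050877 0.002103 0.060547\n"
)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "embed.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    vectors, emb_size = read_embeddings(path)

print(emb_size, sorted(vectors))  # 5 ['and', 'the', 'to']
```

Checking that every row has the same length catches truncated or corrupted files early, before the vectors are turned into an embedding matrix.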