nlp_architect.models package¶
Subpackages¶
- nlp_architect.models.absa package
- Subpackages
- nlp_architect.models.absa.inference package
- nlp_architect.models.absa.train package
- Submodules
- nlp_architect.models.absa.train.acquire_terms module
- nlp_architect.models.absa.train.data_types module
- nlp_architect.models.absa.train.generate_lexicons module
- nlp_architect.models.absa.train.rerank_terms module
- nlp_architect.models.absa.train.rules module
- nlp_architect.models.absa.train.train module
- Module contents
- Submodules
- nlp_architect.models.absa.utils module
- Module contents
- Subpackages
- nlp_architect.models.bist package
- nlp_architect.models.cross_doc_coref package
- Subpackages
- Submodules
- nlp_architect.models.cross_doc_coref.sieves_config module
- nlp_architect.models.cross_doc_coref.sieves_resource module
- Module contents
- nlp_architect.models.gnmt package
- Subpackages
- nlp_architect.models.gnmt.scripts package
- nlp_architect.models.gnmt.utils package
- Submodules
- nlp_architect.models.gnmt.utils.evaluation_utils module
- nlp_architect.models.gnmt.utils.iterator_utils module
- nlp_architect.models.gnmt.utils.misc_utils module
- nlp_architect.models.gnmt.utils.nmt_utils module
- nlp_architect.models.gnmt.utils.standard_hparams_utils module
- nlp_architect.models.gnmt.utils.vocab_utils module
- Module contents
- Submodules
- nlp_architect.models.gnmt.attention_model module
- nlp_architect.models.gnmt.model module
- nlp_architect.models.gnmt.model_helper module
- Module contents
- Subpackages
- nlp_architect.models.transformers package
Submodules¶
nlp_architect.models.bist_parser module¶
- class nlp_architect.models.bist_parser.BISTModel(activation='tanh', lstm_layers=2, lstm_dims=125, pos_dims=25)[source]¶ Bases: object
BIST parser model class. This class handles training, prediction, loading and saving of a BIST parser model. After the model is initialized, it accepts a CoNLL formatted dataset as input, and learns to output dependencies for new input.
Parameters: - activation (str, optional) – Activation function to use.
- lstm_layers (int, optional) – Number of LSTM layers to use.
- lstm_dims (int, optional) – Number of LSTM dimensions to use.
- pos_dims (int, optional) – Number of part-of-speech embedding dimensions to use.
- model¶ The underlying LSTM model. Type: MSTParserLSTM
- params¶ Additional parameters and resources for the model. Type: tuple
- options¶ User model options. Type: dict
- fit(dataset, epochs=10, dev=None)[source]¶ Trains a BIST model on an annotated dataset in CoNLL file format.
Parameters: - dataset (str) – Path to input dataset for training, formatted in CoNLL/U format.
- epochs (int, optional) – Number of learning iterations.
- dev (str, optional) – Path to development dataset for conducting evaluations.
- predict(dataset, evaluate=False)[source]¶ Runs inference with the BIST model on a dataset in CoNLL file format.
Parameters: - dataset (str) – Path to input CoNLL file.
- evaluate (bool, optional) – Write prediction and evaluation files to dataset’s folder.
Returns: The list of input sentences with predicted dependencies attached.
Return type: res (list of list of ConllEntry)
- predict_conll(dataset)[source]¶ Runs inference with the BIST model on a dataset in CoNLL object format.
Parameters: dataset (list of list of ConllEntry) – Input in the form of ConllEntry objects.
Returns: The list of input sentences with predicted dependencies attached.
Return type: res (list of list of ConllEntry)
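The following is a minimal, hypothetical usage sketch of BISTModel based on the signatures above; the CoNLL-U file paths are placeholders, not files shipped with the library.

```python
from nlp_architect.models.bist_parser import BISTModel

# Placeholder paths to CoNLL-U formatted files -- substitute your own data.
train_conll = "data/en-train.conllu"
dev_conll = "data/en-dev.conllu"
test_conll = "data/en-test.conllu"

parser = BISTModel(activation='tanh', lstm_layers=2, lstm_dims=125, pos_dims=25)
parser.fit(train_conll, epochs=10, dev=dev_conll)

# Returns the input sentences (lists of ConllEntry) with predicted dependencies attached.
parsed = parser.predict(test_conll, evaluate=True)
```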
nlp_architect.models.chunker module¶
- class nlp_architect.models.chunker.SequenceChunker(use_cudnn=False)[source]¶ Bases: nlp_architect.models.chunker.SequenceTagger
A sequence chunker model written in TensorFlow (and Keras), based on the SequenceTagger model. The model uses only the chunking output of the model.
- class nlp_architect.models.chunker.SequencePOSTagger(use_cudnn=False)[source]¶ Bases: nlp_architect.models.chunker.SequenceTagger
A sequence POS tagger model written in TensorFlow (and Keras), based on the SequenceTagger model. The model uses only the POS tagging output of the model.
- class nlp_architect.models.chunker.SequenceTagger(use_cudnn=False)[source]¶ Bases: object
A sequence tagging model for POS and chunk tags, written in TensorFlow (and Keras), based on the paper ‘Deep multi-task learning with low level tasks supervised at lower layers’. The model has 3 Bi-LSTM layers and outputs POS and chunk tags.
Parameters: use_cudnn (bool, optional) – use a GPU-based model (CuDNN cells)
- build(vocabulary_size, num_pos_labels, num_chunk_labels, char_vocab_size=None, max_word_len=25, feature_size=100, dropout=0.5, classifier='softmax', optimizer=None)[source]¶ Build a chunker/POS model
Parameters: - vocabulary_size (int) – the size of the input vocabulary
- num_pos_labels (int) – the number of POS labels
- num_chunk_labels (int) – the number of chunk labels
- char_vocab_size (int, optional) – character vocabulary size
- max_word_len (int, optional) – max characters in a word
- feature_size (int, optional) – feature size - determines the embedding/LSTM layer hidden state size
- dropout (float, optional) – dropout rate
- classifier (str, optional) – classifier layer, ‘softmax’ for softmax or ‘crf’ for conditional random fields classifier. default is ‘softmax’.
- optimizer (tensorflow.python.training.optimizer.Optimizer, optional) – optimizer, if None will use default SGD (paper setup)
- fit(x, y, batch_size=1, epochs=1, validation_data=None, callbacks=None)[source]¶ Fit the provided x and y samples on the built model
Parameters: - x – x samples
- y – y samples
- batch_size (int, optional) – batch size
- epochs (int, optional) – number of epochs to run before ending training process
- validation_data (optional) – x and y samples to validate at the end of the epoch
- callbacks (optional) – additional callbacks to run with fitting
- load_embedding_weights(weights)[source]¶ Load word embedding weights into the model embedding layer
Parameters: weights (numpy.ndarray) – 2D matrix of word weights
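A hypothetical build-and-train sketch for SequenceTagger using the methods above; vocabulary and label sizes are placeholders, and x/y are assumed to be the padded arrays produced by your own preprocessing.

```python
from nlp_architect.models.chunker import SequenceTagger

tagger = SequenceTagger(use_cudnn=False)
tagger.build(
    vocabulary_size=20000,   # placeholder word vocabulary size
    num_pos_labels=45,       # placeholder number of POS labels
    num_chunk_labels=20,     # placeholder number of chunk labels
    classifier='crf',        # or 'softmax'
)

# Optionally initialize the embedding layer from pretrained word vectors
# (a 2D numpy array, one row per vocabulary entry).
# tagger.load_embedding_weights(pretrained_embedding_matrix)

# x, y are assumed to be padded word-id inputs and the matching POS/chunk label arrays.
tagger.fit(x, y, batch_size=32, epochs=5)
```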
nlp_architect.models.cross_doc_sieves module¶
- nlp_architect.models.cross_doc_sieves.run_entity_coref(topics: nlp_architect.common.cdc.topics.Topics, resources: nlp_architect.models.cross_doc_coref.system.sieves_container_init.SievesContainerInitialization) → List[nlp_architect.common.cdc.cluster.Clusters][source]¶ Run cross-document coreference on entity mentions
Parameters: - topics (Topics) – The topics (with mentions) to evaluate
- resources (SievesContainerInitialization) – resources for running the evaluation
Returns: List of topics and mentions with predicted cross-doc coref within each topic
Return type: Clusters
- nlp_architect.models.cross_doc_sieves.run_event_coref(topics: nlp_architect.common.cdc.topics.Topics, resources: nlp_architect.models.cross_doc_coref.system.sieves_container_init.SievesContainerInitialization) → List[nlp_architect.common.cdc.cluster.Clusters][source]¶ Run cross-document coreference on event mentions
Parameters: - topics (Topics) – The topics (with mentions) to evaluate
- resources (SievesContainerInitialization) – resources for running the evaluation
Returns: List of clusters and mentions with predicted cross-doc coref within each topic
Return type: Clusters
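A minimal, hypothetical calling sketch for the two sieve runners; topics and resources are assumed to be prepared elsewhere with the cross_doc_coref data loaders and a SievesContainerInitialization configuration.

```python
from nlp_architect.models.cross_doc_sieves import run_entity_coref, run_event_coref

# `topics` (Topics with mentions) and `resources` (SievesContainerInitialization)
# are assumed to be built by the surrounding cross-document coreference setup.
entity_clusters = run_entity_coref(topics, resources)  # List[Clusters], one per topic
event_clusters = run_event_coref(topics, resources)
```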
nlp_architect.models.crossling_emb module¶
- class nlp_architect.models.crossling_emb.Generator(src_ten, tgt_ten, emb_dim, batch_size, smooth_val, lr_ph, beta, vocab_size)[source]¶ Bases: object
- class nlp_architect.models.crossling_emb.WordTranslator(hparams, src_vec, tgt_vec, vocab_size)[source]¶ Bases: object
Main network that performs cross-lingual embedding training
- apply_procrustes(sess, final_pairs)[source]¶ Applies the Procrustes solution to the W matrix for a better mapping
Parameters: - sess (tf.Session) – TensorFlow session
- final_pairs (ndarray) – array of pairs which are mutual neighbors
- generate_xling_embed(sess, src_dict, tgt_dict, tgt_vec)[source]¶ Generates cross-lingual embeddings
Parameters: sess (tf.Session) – TensorFlow session
- static report_metrics(iters, n_words_proc, disc_cost_acc, tic)[source]¶ Reports metrics on how training is progressing
- run(sess, local_lr)[source]¶ Runs the whole GAN
Parameters: - sess (tf.Session) – TensorFlow session
- local_lr (float) – learning rate
- run_discriminator(sess, local_lr)[source]¶ Runs the discriminator part of the GAN
Parameters: - sess (tf.Session) – TensorFlow session
- local_lr (float) – learning rate
- run_generator(sess, local_lr)[source]¶ Runs the generator part of the GAN
Parameters: - sess (tf.Session) – TensorFlow session
- local_lr (float) – learning rate
Returns: Number of words processed
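A hypothetical training-loop sketch for WordTranslator; hparams, the monolingual embeddings, the mutual-nearest-neighbor pairs and the dictionaries are assumed to be prepared by the surrounding script, and the step count and learning rate are illustrative only.

```python
import tensorflow as tf
from nlp_architect.models.crossling_emb import WordTranslator

# `hparams`, `src_vec`, `tgt_vec` (pretrained monolingual embeddings) and
# `vocab_size` are assumed to be prepared by the surrounding training script.
model = WordTranslator(hparams, src_vec, tgt_vec, vocab_size)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    local_lr = 0.1                   # illustrative starting learning rate
    for _ in range(10000):           # adversarial (GAN) training steps
        model.run(sess, local_lr)    # discriminator + generator updates

    # Refinement and export; `final_pairs`, `src_dict`, `tgt_dict` are assumed
    # to come from the mutual-nearest-neighbor extraction step.
    model.apply_procrustes(sess, final_pairs)
    model.generate_xling_embed(sess, src_dict, tgt_dict, tgt_vec)
```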
nlp_architect.models.gnmt_model module¶
GNMT attention sequence-to-sequence model with dynamic RNN support.
- class nlp_architect.models.gnmt_model.GNMTModel(hparams, mode, iterator, source_vocab_table, target_vocab_table, reverse_target_vocab_table=None, scope=None, extra_args=None)[source]¶ Bases: nlp_architect.models.gnmt.attention_model.AttentionModel
Sequence-to-sequence dynamic model with GNMT attention architecture and sparsity policy support.
nlp_architect.models.intent_extraction module¶
- class nlp_architect.models.intent_extraction.IntentExtractionModel[source]¶ Bases: object
Intent Extraction model base class (using tf.keras)
- fit(x, y, epochs=1, batch_size=1, callbacks=None, validation=None)[source]¶ Train a model given input samples and target labels.
Parameters: - x – input samples
- y – input sample labels
- epochs (int, optional) – number of epochs to train
- batch_size (int, optional) – batch size
- callbacks (Callback, optional) – Keras compatible callbacks
- validation (list of numpy.ndarray, optional) – optional validation data to be evaluated when training
- input_shape¶ Get input shape. Type: tuple
- load_embedding_weights(weights)[source]¶ Load word embedding weights into the model embedding layer
Parameters: weights (numpy.ndarray) – 2D matrix of word weights
- class nlp_architect.models.intent_extraction.MultiTaskIntentModel(use_cudnn=False)[source]¶ Bases: nlp_architect.models.intent_extraction.IntentExtractionModel
Multi-Task Intent and Slot tagging model (using tf.keras)
Parameters: use_cudnn (bool, optional) – use a GPU-based model (CuDNN cells)
- build(word_length, num_labels, num_intent_labels, word_vocab_size, char_vocab_size, word_emb_dims=100, char_emb_dims=30, char_lstm_dims=30, tagger_lstm_dims=100, dropout=0.2)[source]¶ Build a model
Parameters: - word_length (int) – max word length (in characters)
- num_labels (int) – number of slot labels
- num_intent_labels (int) – number of intent classes
- word_vocab_size (int) – word vocabulary size
- char_vocab_size (int) – character vocabulary size
- word_emb_dims (int, optional) – word embedding dimensions
- char_emb_dims (int, optional) – character embedding dimensions
- char_lstm_dims (int, optional) – character feature LSTM hidden size
- tagger_lstm_dims (int, optional) – tagger LSTM hidden size
- dropout (float, optional) – dropout rate
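A hypothetical end-to-end sketch for MultiTaskIntentModel based on build() and the inherited fit(); vocabulary sizes, label counts, and the encoded x/y arrays are placeholders.

```python
from nlp_architect.models.intent_extraction import MultiTaskIntentModel

model = MultiTaskIntentModel(use_cudnn=False)
model.build(
    word_length=12,          # max characters per word (placeholder)
    num_labels=20,           # number of slot labels (placeholder)
    num_intent_labels=7,     # number of intent classes (placeholder)
    word_vocab_size=10000,   # placeholder vocabulary sizes
    char_vocab_size=80,
)

# x, y are assumed to be the padded tensors produced by the library's
# intent/slot dataset loaders; fit() is inherited from IntentExtractionModel.
model.fit(x, y, epochs=5, batch_size=32)
```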
- class nlp_architect.models.intent_extraction.Seq2SeqIntentModel[source]¶ Bases: nlp_architect.models.intent_extraction.IntentExtractionModel
Encoder-Decoder deep LSTM tagger model (using tf.keras)
- build(vocab_size, tag_labels, token_emb_size=100, encoder_depth=1, decoder_depth=1, lstm_hidden_size=100, encoder_dropout=0.5, decoder_dropout=0.5)[source]¶ Build the model
Parameters: - vocab_size (int) – vocabulary size
- tag_labels (int) – number of tag labels
- token_emb_size (int, optional) – token embedding vector size
- encoder_depth (int, optional) – number of encoder LSTM layers
- decoder_depth (int, optional) – number of decoder LSTM layers
- lstm_hidden_size (int, optional) – LSTM layers hidden size
- encoder_dropout (float, optional) – encoder dropout
- decoder_dropout (float, optional) – decoder dropout
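A short, hypothetical construction sketch for Seq2SeqIntentModel; the sizes are placeholders, and training goes through the inherited fit() method.

```python
from nlp_architect.models.intent_extraction import Seq2SeqIntentModel

model = Seq2SeqIntentModel()
model.build(
    vocab_size=10000,     # placeholder token vocabulary size
    tag_labels=20,        # placeholder number of tag labels
    encoder_depth=1,
    decoder_depth=1,
    lstm_hidden_size=100,
)
```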
nlp_architect.models.matchlstm_ansptr module¶
- class nlp_architect.models.matchlstm_ansptr.MatchLSTMAnswerPointer(params_dict, embeddings)[source]¶ Bases: object
Defines the end-to-end MatchLSTM and Answer-Pointer network for reading comprehension
- answer_pointer_pass()[source]¶ Function to run the answer pointer pass
Parameters: None
Returns: List of logits for start and end indices of the answer
- cal_f1_score(ground_truths, predictions)[source]¶ Function to calculate F-1 and EM scores
Parameters: - ground_truths – labels given in the dataset
- predictions – logits predicted by the network
Returns: F1 score and Exact-Match score
- get_dynamic_feed_params(question_str, vocab_reverse)[source]¶ Function to get the required feed_dict format for user-entered questions. Used mainly in the demo mode.
Parameters: - question_str – question string
- vocab_reverse – vocab dictionary with words as keys and indices as values
Returns: question_idx (list of indices representing the question, padded to max length), question_len (actual length of the question), ques_mask (mask for question_idx)
Return type: tuple
- inference_mode(session, valid, vocab_tuple, num_examples, dropout=1.0, dynamic_question_mode=False, dynamic_usr_question='', dynamic_question_index=0)[source]¶ Function to run inference_mode for reading comprehension
Parameters: - session – TensorFlow session
- valid – data dictionary for the validation set
- vocab_tuple – a tuple containing vocab dictionaries in forward and reverse directions
- num_examples – number of samples to run for inference
- dropout – float value, which is always 1.0 for inference
- dynamic_question_mode – boolean indicating whether to accept questions from the user (used in the demo mode)
- static obtain_indices(preds_start, preds_end)[source]¶ Function to get answer indices given the predictions
Parameters: - preds_start – predicted start indices
- preds_end – predicted end indices
Returns: final start and end indices for the answer
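For intuition, a generic sketch of how exact-match and token-level F1 are typically computed for reading-comprehension answers; this illustrates the metrics only and is not the library's cal_f1_score implementation.

```python
from collections import Counter

def exact_match(prediction: str, ground_truth: str) -> bool:
    """True when the predicted answer string equals the reference exactly."""
    return prediction.strip().lower() == ground_truth.strip().lower()

def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the brown fox", "a brown fox"))  # 0.666...
```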
nlp_architect.models.memn2n_dialogue module¶
- class nlp_architect.models.memn2n_dialogue.MemN2N_Dialog(batch_size, vocab_size, sentence_size, memory_size, embedding_size, num_cands, max_cand_len, hops=3, max_grad_norm=40.0, nonlin=None, initializer=<tensorflow.python.ops.init_ops.RandomNormal object>, optimizer=<tensorflow.python.training.adam.AdamOptimizer object>, session=<tensorflow.python.client.session.Session object>, name='MemN2N_Dialog')[source]¶ Bases: object
End-To-End Memory Network.
- batch_fit(stories, queries, answers, cands)[source]¶ Runs the training algorithm over the passed batch
Parameters: - stories – Tensor (None, memory_size, sentence_size)
- queries – Tensor (None, sentence_size)
- answers – Tensor (None, vocab_size)
Returns: floating-point number, the loss computed for the batch
Return type: loss
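A hypothetical single-batch training sketch for MemN2N_Dialog; the array shapes follow the parameter descriptions above (the shape of cands is an assumption), and all sizes and data are placeholders.

```python
import numpy as np
from nlp_architect.models.memn2n_dialogue import MemN2N_Dialog

# Placeholder sizes; in practice these come from the dialogue dataset.
batch_size, vocab_size, sentence_size = 32, 500, 20
memory_size, embedding_size = 50, 32
num_cands, max_cand_len = 10, 15

model = MemN2N_Dialog(batch_size, vocab_size, sentence_size, memory_size,
                      embedding_size, num_cands, max_cand_len, hops=3)

# Random integer word ids standing in for an encoded batch of dialogues.
stories = np.random.randint(vocab_size, size=(batch_size, memory_size, sentence_size))
queries = np.random.randint(vocab_size, size=(batch_size, sentence_size))
answers = np.zeros((batch_size, vocab_size), dtype=np.int64)
answers[np.arange(batch_size), np.random.randint(vocab_size, size=batch_size)] = 1  # one-hot, per the shape above
cands = np.random.randint(vocab_size, size=(batch_size, num_cands, max_cand_len))   # shape assumed

loss = model.batch_fit(stories, queries, answers, cands)  # returns the batch loss
```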
nlp_architect.models.most_common_word_sense module¶
nlp_architect.models.ner_crf module¶
- class nlp_architect.models.ner_crf.NERCRF(use_cudnn=False)[source]¶ Bases: object
Bi-LSTM NER model with CRF classification layer (tf.keras model)
Parameters: use_cudnn (bool, optional) – use CuDNN LSTM cells
- build(word_length, target_label_dims, word_vocab_size, char_vocab_size, word_embedding_dims=100, char_embedding_dims=16, tagger_lstm_dims=200, dropout=0.5)[source]¶ Build a NERCRF model
Parameters: - word_length (int) – max word length in characters
- target_label_dims (int) – number of entity labels (for classification)
- word_vocab_size (int) – word vocabulary size
- char_vocab_size (int) – character vocabulary size
- word_embedding_dims (int) – word embedding dimensions
- char_embedding_dims (int) – character embedding dimensions
- tagger_lstm_dims (int) – word tagger LSTM output dimensions
- dropout (float) – dropout rate
- fit(x, y, epochs=1, batch_size=1, callbacks=None, validation=None)[source]¶ Train a model given input samples and target labels.
Parameters: - x (numpy.ndarray) – input samples
- y (numpy.ndarray) – input sample labels
- epochs (int, optional) – number of epochs to train
- batch_size (int, optional) – batch size
- callbacks (Callback, optional) – Keras compatible callbacks
- validation (list of numpy.ndarray, optional) – optional validation data to be evaluated when training
- load_embedding_weights(weights)[source]¶ Load word embedding weights into the model embedding layer
Parameters: weights (numpy.ndarray) – 2D matrix of word weights
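A hypothetical sketch of building and training NERCRF from the signatures above; vocabulary sizes, label counts, and the pre-encoded x/y arrays are placeholders supplied by your data pipeline.

```python
from nlp_architect.models.ner_crf import NERCRF

ner = NERCRF(use_cudnn=False)
ner.build(
    word_length=12,          # max word length in characters (placeholder)
    target_label_dims=9,     # number of entity labels (placeholder)
    word_vocab_size=20000,   # placeholder vocabulary sizes
    char_vocab_size=100,
)

# Optionally initialize the word embedding layer from pretrained vectors
# (2D numpy array with one row per vocabulary entry).
# ner.load_embedding_weights(pretrained_embedding_matrix)

# x, y are assumed to be the padded word/char id arrays and one-hot labels
# produced by the library's sequential tagging data loaders.
ner.fit(x, y, epochs=3, batch_size=32)
```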
nlp_architect.models.np2vec module¶
- class nlp_architect.models.np2vec.NP2vec(corpus, corpus_format='txt', mark_char='_', word_embedding_type='word2vec', sg=0, size=100, window=10, alpha=0.025, min_alpha=0.0001, min_count=5, sample=1e-05, workers=20, hs=0, negative=25, cbow_mean=1, iterations=15, min_n=3, max_n=6, word_ngrams=1, prune_non_np=True)[source]¶ Bases: object
Initialize the np2vec model, train it, save it and load it.
- classmethod load(np2vec_model_file, binary=False, word_ngrams=0, word2vec_format=True)[source]¶ Load the np2vec model.
Parameters: - np2vec_model_file (str) – the file containing the np2vec model to load
- binary (bool) – whether the np2vec model to load is in binary format
- word_ngrams (int {1,0}) – if 1, the np2vec model to load uses word vectors with subword (ngrams) information
- word2vec_format (bool) – whether the model to load has been stored in original word2vec format
Returns: the loaded np2vec model
- save(np2vec_model_file='np2vec.model', binary=False, word2vec_format=True)[source]¶ Save the np2vec model.
Parameters: - np2vec_model_file (str) – the file to which the np2vec model is saved
- binary (bool) – whether to save the np2vec model in binary format
- word2vec_format (bool) – whether to save the model in original word2vec format
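A hypothetical train/save/load sketch for NP2vec; the corpus path is a placeholder for a text corpus in which noun-phrase tokens are already joined with mark_char.

```python
from nlp_architect.models.np2vec import NP2vec

# Placeholder corpus: plain-text file where multi-word noun phrases have
# already been concatenated with the marker character (e.g. "New_York_").
np2vec = NP2vec("corpus_marked.txt", corpus_format='txt', mark_char='_',
                word_embedding_type='word2vec', size=100, iterations=15)

np2vec.save("np2vec.model", binary=False, word2vec_format=True)

# Reload the trained vectors later with the classmethod described above.
model = NP2vec.load("np2vec.model", binary=False, word2vec_format=True)
```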
nlp_architect.models.np_semantic_segmentation module¶
- class nlp_architect.models.np_semantic_segmentation.NpSemanticSegClassifier(num_epochs, callback_args, loss='binary_crossentropy', optimizer='adam', batch_size=128)[source]¶ Bases: object
NP Semantic Segmentation classifier model (based on the tf.keras framework).
Parameters: - num_epochs (int) – number of epochs to train the model
- **callback_args (dict) – callback args keyword arguments to init a Callback for the model
- loss – the model’s cost function. Default is ‘tf.keras.losses.binary_crossentropy’ loss
- optimizer (tf.keras.optimizers) – the model’s optimizer. Default is ‘adam’
- build(input_dim)[source]¶ Build the model’s layers
Parameters: input_dim (int) – the first layer’s input_dim
- eval(test_set)[source]¶ Evaluate the model on the test set, reporting error rate, accuracy and precision/recall
Parameters: test_set (numpy.ndarray) – The test set
Returns: loss, binary_accuracy, precision, recall and f1 measures
Return type: tuple(float)
- fit(train_set)[source]¶ Train and fit the model on the datasets
Parameters: - train_set (numpy.ndarray) – The train set
- args – callback_args and epochs from ArgParser input
- get_outputs(test_set)[source]¶ Run the model’s classification on the given dataset
Parameters: test_set (numpy.ndarray) – The test set
Returns: model’s predictions
Return type: list(numpy.ndarray)
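A hypothetical train-and-evaluate sketch for NpSemanticSegClassifier; the input dimension and the train/test arrays are placeholders for the feature vectors produced by the module's data preparation step.

```python
from nlp_architect.models.np_semantic_segmentation import NpSemanticSegClassifier

clf = NpSemanticSegClassifier(num_epochs=200, callback_args={}, batch_size=128)
clf.build(input_dim=603)   # placeholder feature-vector size

# train_set / test_set are assumed to be the numpy arrays produced by the
# module's feature-extraction step (features plus binary labels).
clf.fit(train_set)
loss, accuracy, precision, recall, f1_score = clf.eval(test_set)
predictions = clf.get_outputs(test_set)
```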
- nlp_architect.models.np_semantic_segmentation.f1(y_true, y_pred)[source]¶ F1 metric
Parameters: - y_true – ground-truth labels
- y_pred – predicted labels
Returns: F1 score
nlp_architect.models.supervised_sentiment module¶
- nlp_architect.models.supervised_sentiment.one_hot_cnn(dense_out, max_len=300, frame='small')[source]¶ Temporal CNN Model
As defined in “Text Understanding from Scratch” by Zhang and LeCun (2015), https://arxiv.org/pdf/1502.01710v4.pdf. This model is a series of 1D CNNs with max-pooling and fully connected layers. The frame size may be either large or small.
Parameters: - dense_out (int) – size of the output dense layer; this is the number of classes
- max_len (int) – length of the input text
- frame (str) – frame size, either large or small
Returns: temporal CNN model
Return type: model (model)
- nlp_architect.models.supervised_sentiment.simple_lstm(max_features, dense_out, input_length, embed_dim=256, lstm_out=140, dropout=0.5)[source]¶ Simple bi-directional LSTM model in Keras
Single-layer bi-directional LSTM with recurrent dropout and a fully connected layer
Parameters: - max_features (int) – vocabulary size
- dense_out (int) – size of the output dense layer; this is the number of classes
- input_length (int) – length of the input text
- embed_dim (int) – internal embedding size used in the lstm
- lstm_out (int) – size of the bi-directional output layer
- dropout (float) – value for recurrent dropout, between 0 and 1
Returns: LSTM model
Return type: model (model)
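A hypothetical sketch of building and training the simple_lstm model; the vocabulary size, sequence length, and the encoded x_train/y_train arrays are placeholders.

```python
from nlp_architect.models.supervised_sentiment import simple_lstm

model = simple_lstm(
    max_features=20000,  # placeholder vocabulary size
    dense_out=2,         # number of sentiment classes
    input_length=300,    # placeholder max document length
)

# x_train: integer-encoded, padded documents; y_train: one-hot class labels
# (both assumed to be prepared by your preprocessing pipeline). If the returned
# Keras model is not already compiled, compile it first (assumption).
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=3)
```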
nlp_architect.models.tagging module¶
- class nlp_architect.models.tagging.InputFeatures(input_ids, char_ids, mask=None, label_id=None)[source]¶ Bases: object
A single set of features of data.
- class nlp_architect.models.tagging.NeuralTagger(embedder_model, word_vocab: nlp_architect.utils.text.Vocabulary, labels: List[str] = None, use_crf: bool = False, device: str = 'cpu', n_gpus=0)[source]¶ Bases: nlp_architect.models.TrainableModel
Simple neural tagging model. Supports PyTorch embedder models, multi-GPU training, and knowledge distillation (KD) from teacher models.
Parameters: - embedder_model – PyTorch embedder model (valid nn.Module model)
- word_vocab (Vocabulary) – word vocabulary
- labels (List, optional) – list of labels. Defaults to None.
- use_crf (bool, optional) – use CRF as the classifier (instead of Softmax). Defaults to False.
- device (str, optional) – device backend. Defaults to ‘cpu’.
- n_gpus (int, optional) – number of GPUs. Defaults to 0.
- convert_to_tensors(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], max_seq_length: int = 128, max_word_length: int = 12, pad_id: int = 0, labels_pad_id: int = 0, include_labels: bool = True) → torch.utils.data.dataset.TensorDataset[source]¶ Convert examples to valid tagger dataset
Parameters: - examples (List[TokenClsInputExample]) – List of examples
- max_seq_length (int, optional) – max words per sentence. Defaults to 128.
- max_word_length (int, optional) – max characters in a word. Defaults to 12.
- pad_id (int, optional) – padding int id. Defaults to 0.
- labels_pad_id (int, optional) – labels padding id. Defaults to 0.
- include_labels (bool, optional) – include labels in dataset. Defaults to True.
Returns: TensorDataset for given examples
Return type: TensorDataset
- evaluate(data_set: torch.utils.data.dataloader.DataLoader)[source]¶ Run evaluation on given dataloader
Parameters: data_set (DataLoader) – a data loader to run evaluation on
Returns: logits, labels (if labels are given)
- evaluate_predictions(logits, label_ids)[source]¶ Evaluate given logits on truth labels
Parameters: - logits – logits of model
- label_ids – truth label ids
Returns: dictionary containing P/R/F1 metrics
Return type: dict
- get_optimizer(lr: int = 0.001)[source]¶ Get default optimizer
Parameters: lr (int, optional) – learning rate. Defaults to 0.001.
Returns: optimizer
Return type: torch.optim.Optimizer
- inference(examples: List[nlp_architect.data.sequential_tagging.TokenClsInputExample], batch_size: int = 64)[source]¶ Do inference on given examples
Parameters: - examples (List[TokenClsInputExample]) – examples
- batch_size (int, optional) – batch size. Defaults to 64.
Returns: a list of tuples of tokens, tags predicted by model
Return type: List(tuple)
- classmethod load_model(model_path: str)[source]¶ Load a tagger model from given path
Parameters: model_path (str) – model path
Returns: NeuralTagger – tagger model loaded from path
- save_model(output_dir: str)[source]¶ Save model to path
Parameters: output_dir (str) – output directory
- to(device='cpu', n_gpus=0)[source]¶ Put model on given device
Parameters: - device (str, optional) – device backend. Defaults to ‘cpu’.
- n_gpus (int, optional) – number of gpus. Defaults to 0.
- train(train_data_set: torch.utils.data.dataloader.DataLoader, dev_data_set: torch.utils.data.dataloader.DataLoader = None, test_data_set: torch.utils.data.dataloader.DataLoader = None, epochs: int = 3, batch_size: int = 8, optimizer=None, max_grad_norm: float = 5.0, logging_steps: int = 50, save_steps: int = 100, save_path: str = None, distiller: nlp_architect.nn.torch.distillation.TeacherStudentDistill = None)[source]¶ Train a tagging model
Parameters: - train_data_set (DataLoader) – train examples dataloader. If a distiller object is provided, train examples should contain a tuple of student/teacher data examples.
- dev_data_set (DataLoader, optional) – dev examples dataloader. Defaults to None.
- test_data_set (DataLoader, optional) – test examples dataloader. Defaults to None.
- epochs (int, optional) – num of epochs to train. Defaults to 3.
- batch_size (int, optional) – batch size. Defaults to 8.
- optimizer ([type], optional) – optimizer. Defaults to default model optimizer.
- max_grad_norm (float, optional) – max gradient norm. Defaults to 5.0.
- logging_steps (int, optional) – number of steps between logging. Defaults to 50.
- save_steps (int, optional) – number of steps between model saves. Defaults to 100.
- save_path (str, optional) – model output path. Defaults to None.
- distiller (TeacherStudentDistill, optional) – KD model for training the model using a teacher model. Defaults to None.
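A hypothetical end-to-end sketch for NeuralTagger following the methods above; the embedder module, vocabulary, label list, and example lists are placeholders that would normally come from the library's data utilities.

```python
from torch.utils.data import DataLoader
from nlp_architect.models.tagging import NeuralTagger

# `embedder`, `word_vocab`, `label_list`, `train_examples`, and `test_examples`
# are assumed to be created with the library's embedder modules and
# sequential-tagging data loaders.
tagger = NeuralTagger(embedder, word_vocab, labels=label_list, use_crf=True, device='cpu')

train_dataset = tagger.convert_to_tensors(train_examples, max_seq_length=128)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)

tagger.train(train_loader, epochs=3, batch_size=8,
             logging_steps=50, save_steps=100, save_path="tagger_out")

# Inference on raw examples, then saving and reloading the model.
predictions = tagger.inference(test_examples, batch_size=64)
tagger.save_model("tagger_out")
reloaded = NeuralTagger.load_model("tagger_out")
```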
nlp_architect.models.temporal_convolutional_network module¶
- class nlp_architect.models.temporal_convolutional_network.CommonLayers[source]¶ Bases: object
Class that contains the common layers for language modeling – word embeddings and projection layer
- define_input_layer(input_placeholder_tokens, word_embeddings, embeddings_trainable=True)[source]¶ Define the input word embedding layer
Parameters: - input_placeholder_tokens (tf.placeholder) – input to the model
- word_embeddings (numpy array, optional) – array to initialize the embeddings with
- embeddings_trainable (boolean) – whether or not to train the embedding table
Returns: Embeddings corresponding to the data in the input placeholder
- define_projection_layer(prediction, tied_weights=True)[source]¶ Define the output word embedding layer
Parameters: - prediction (tf.tensor) – the prediction from the model
- tied_weights (boolean) – whether or not to tie weights from the input embedding layer
Returns: Probability distribution over the vocabulary
- class nlp_architect.models.temporal_convolutional_network.TCN(max_len, n_features_in, hidden_sizes, kernel_size=7, dropout=0.2)[source]¶ Bases: object
This class defines the core TCN architecture. This is only the base class; the training strategy is not implemented.
- class nlp_architect.models.temporal_convolutional_network.WeightNorm(layer, data_init=False, **kwargs)[source]¶ Bases: tensorflow.python.keras.layers.wrappers.Wrapper
This wrapper reparameterizes a layer by decoupling the weight’s magnitude and direction. This speeds up convergence by improving the conditioning of the optimization problem.
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks: https://arxiv.org/abs/1602.07868 Tim Salimans, Diederik P. Kingma (2016)
WeightNorm wrapper works for keras and tf layers.
```python
net = WeightNorm(tf.keras.layers.Conv2D(2, 2, activation='relu'),
                 input_shape=(32, 32, 3), data_init=True)(x)
net = WeightNorm(tf.keras.layers.Conv2D(16, 5, activation='relu'),
                 data_init=True)(net)
net = WeightNorm(tf.keras.layers.Dense(120, activation='relu'),
                 data_init=True)(net)
net = WeightNorm(tf.keras.layers.Dense(n_classes),
                 data_init=True)(net)
```
Parameters: - layer – a layer instance.
- data_init – If True use data dependent variable initialization
Raises:
- ValueError – If not initialized with a Layer instance.
- ValueError – If Layer does not contain a kernel of weights
- NotImplementedError – If data_init is True and running graph execution
- compute_output_shape(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match the provided input shape.
Parameters: input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
Returns: An output shape tuple.