Openspeech Model

class openspeech.models.openspeech_model.OpenspeechModel(configs: omegaconf.dictconfig.DictConfig, vocab: openspeech.vocabs.vocab.Vocabulary)[source]

Super class of openspeech models.

Note

Do not use this class directly, use one of the sub classes.

Parameters
  • configs (DictConfig) – configuration set.

  • vocab (Vocabulary) – the class of vocabulary

Inputs:
inputs (torch.FloatTensor): An input sequence passed to encoders. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

input_lengths (torch.LongTensor): The length of each input sequence, of size (batch).

Returns

Result of model predictions.

Return type

  • y_hats (torch.FloatTensor)
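The padded input layout described above can be illustrated with a minimal sketch in plain Python lists. The helper name pad_batch and the pad value 0.0 are assumptions made for this illustration and are not part of openspeech; in practice the two results would be converted to a torch.FloatTensor and a torch.LongTensor.

```python
from typing import List, Tuple

Frame = List[float]        # one feature vector of size `dimension`
Utterance = List[Frame]    # variable-length sequence of frames


def pad_batch(utterances: List[Utterance], pad_value: float = 0.0) -> Tuple[List[Utterance], List[int]]:
    """Pad utterances to a common length (hypothetical helper).

    Returns (inputs, input_lengths), where inputs has the layout
    (batch, seq_length, dimension) and input_lengths has size (batch).
    """
    if not utterances:
        raise ValueError("expected at least one utterance")
    dimension = len(utterances[0][0])
    input_lengths = [len(utt) for utt in utterances]
    seq_length = max(input_lengths)
    inputs = []
    for utt in utterances:
        # Append all-pad frames until the utterance reaches seq_length.
        padding = [[pad_value] * dimension for _ in range(seq_length - len(utt))]
        inputs.append([list(frame) for frame in utt] + padding)
    return inputs, input_lengths


# Two utterances with 3 and 1 frames, each frame of dimension 2.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
inputs, input_lengths = pad_batch(batch)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # 2 3 2
print(input_lengths)  # [3, 1]
```

Keeping input_lengths alongside the padded tensor lets downstream code tell real frames from padding.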

configure_criterion(criterion_name: str) → torch.nn.modules.module.Module[source]

Configure criterion for training.

Parameters

criterion_name (str) – name of criterion

Returns

criterion for training

Return type

criterion (nn.Module)

configure_optimizers()[source]

Choose what optimizers and learning-rate schedulers to use in your optimization.

Returns

  • Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_dict).

forward(inputs: torch.FloatTensor, input_lengths: torch.LongTensor) → Dict[str, torch.Tensor][source]

Forward propagate inputs for inference.

Inputs:
inputs (torch.FloatTensor): An input sequence passed to encoders. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

input_lengths (torch.LongTensor): The length of each input sequence, of size (batch).

Returns

Result of model predictions.

Return type

  • outputs (dict)

log_steps(stage: str, wer: float, cer: float, loss: Optional[float] = None, cross_entropy_loss: Optional[float] = None, ctc_loss: Optional[float] = None) → None[source]

Provides log dictionary.

Parameters
  • stage (str) – current stage (train, valid, test)

  • wer (float) – word error rate

  • cer (float) – character error rate

  • loss (Optional, float) – loss of model’s prediction

  • cross_entropy_loss (Optional, float) – cross entropy loss of model’s prediction

  • ctc_loss (Optional, float) – ctc loss of model’s prediction
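As a rough illustration of the quantities listed above, the sketch below computes word and character error rates with a Levenshtein distance and gathers them into a stage-prefixed dictionary. The function names, the "{stage}_{name}" key format, and the choice to skip None values are assumptions made for this example; this page does not describe how openspeech computes or names these metrics.

```python
from typing import Dict, Optional, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalised by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def build_log_dict(stage: str, wer: float, cer: float,
                   loss: Optional[float] = None,
                   cross_entropy_loss: Optional[float] = None,
                   ctc_loss: Optional[float] = None) -> Dict[str, float]:
    """Hypothetical helper: collect metrics under stage-prefixed keys."""
    logs = {f"{stage}_wer": wer, f"{stage}_cer": cer}
    optional = {"loss": loss, "cross_entropy_loss": cross_entropy_loss, "ctc_loss": ctc_loss}
    for name, value in optional.items():
        if value is not None:  # optional losses are logged only when provided
            logs[f"{stage}_{name}"] = value
    return logs


reference = "the cat sat"
hypothesis = "the cat sit"
wer = error_rate(reference.split(), hypothesis.split())  # 1 substitution over 3 words
cer = error_rate(reference, hypothesis)                  # 1 substitution over 11 characters
logs = build_log_dict("valid", wer, cer, loss=0.5)
print(sorted(logs))  # ['valid_cer', 'valid_loss', 'valid_wer']
```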

test_step(batch: tuple, batch_idx: int)[source]

Forward propagate a pair of inputs and targets for testing.

Inputs:

batch (tuple): A test batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for the test step

Return type

loss (torch.Tensor)

training_step(batch: tuple, batch_idx: int)[source]

Forward propagate a pair of inputs and targets for training.

Inputs:

batch (tuple): A train batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for training

Return type

loss (torch.Tensor)

validation_epoch_end(outputs: dict) → dict[source]

Called at the end of the validation epoch with the outputs of all validation steps.

# the pseudocode for these calls
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    val_outs.append(out)
validation_epoch_end(val_outs)

Parameters

outputs – List of outputs you defined in validation_step(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.

Returns

None

Note

If you didn’t define a validation_step(), this won’t be called.

Examples

With a single dataloader:

def validation_epoch_end(self, val_step_outputs):
    for out in val_step_outputs:
        ...  # do something with each step output

With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

def validation_epoch_end(self, outputs):
    for dataloader_outputs in outputs:
        for out in dataloader_outputs:
            ...  # do something with each step output of this dataloader

    final_value = 0.0  # placeholder: compute the aggregated metric here
    self.log('final_metric', final_value)

validation_step(batch: tuple, batch_idx: int)[source]

Forward propagate a pair of inputs and targets for validation.

Inputs:

batch (tuple): A validation batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for the validation step

Return type

loss (torch.Tensor)

Openspeech Encoder Decoder Model

class openspeech.models.openspeech_encoder_decoder_model.OpenspeechEncoderDecoderModel(configs: omegaconf.dictconfig.DictConfig, vocab: openspeech.vocabs.vocab.Vocabulary)[source]

Base class for OpenSpeech’s encoder-decoder models.

Parameters
  • configs (DictConfig) – configuration set.

  • vocab (Vocabulary) – the class of vocabulary

Inputs:
  • inputs (torch.FloatTensor): An input sequence passed to encoders. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor): The length of each input sequence, of size (batch).

Returns

Result of model predictions.

Return type

  • y_hats (torch.FloatTensor)

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Dict[str, torch.Tensor][source]

Forward propagate inputs for inference.

Inputs:
inputs (torch.FloatTensor): An input sequence passed to encoders. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

input_lengths (torch.LongTensor): The length of each input sequence, of size (batch).

Returns

Result of model predictions that contains predictions, logits, encoder_outputs, encoder_logits, encoder_output_lengths.

Return type

  • dict (dict)

test_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for testing.

Inputs:

batch (tuple): A test batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for the test step

Return type

loss (torch.Tensor)

training_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for training.

Inputs:

batch (tuple): A train batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for training

Return type

loss (torch.Tensor)

validation_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for validation.

Inputs:

batch (tuple): A validation batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for the validation step

Return type

loss (torch.Tensor)

Openspeech CTC Model

class openspeech.models.openspeech_ctc_model.OpenspeechCTCModel(configs: omegaconf.dictconfig.DictConfig, vocab: openspeech.vocabs.vocab.Vocabulary)[source]

Base class for OpenSpeech’s encoder-only models (ctc-model).

Parameters
  • configs (DictConfig) – configuration set.

  • vocab (Vocabulary) – the class of vocabulary

Inputs:
inputs (torch.FloatTensor): An input sequence passed to encoders. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

input_lengths (torch.LongTensor): The length of each input sequence, of size (batch).

Returns

Result of model predictions.

Return type

  • y_hats (torch.FloatTensor)

forward(inputs: torch.FloatTensor, input_lengths: torch.IntTensor) → Dict[str, torch.Tensor][source]

Forward propagate inputs for inference.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to encoders. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.IntTensor) – The length of each input sequence, of size (batch).

Returns

Result of model predictions that contains y_hats, logits, output_lengths

Return type

  • dict (dict)
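A CTC model emits one label per frame, so a frame-level prediction is usually collapsed before it is read as a transcript. The sketch below shows the standard greedy CTC collapse (merge repeated labels, then drop blanks). It assumes that y_hats holds per-frame label ids and that the blank id is 0; neither is stated on this page, and the function name is hypothetical.

```python
from typing import List


def ctc_greedy_collapse(frame_ids: List[int], length: int, blank_id: int = 0) -> List[int]:
    """Collapse per-frame label ids into a label sequence.

    Only the first `length` frames are used (the rest is padding).
    Consecutive repeats are merged and blank labels are removed.
    """
    tokens: List[int] = []
    previous = None
    for token in frame_ids[:length]:
        if token != previous and token != blank_id:
            tokens.append(token)
        previous = token
    return tokens


# Ten frames, of which only the first eight are valid (output_lengths entry = 8).
y_hat = [0, 3, 3, 0, 3, 5, 5, 0, 0, 9]
print(ctc_greedy_collapse(y_hat, 8))  # [3, 3, 5]
```

The blank between the two runs of label 3 is what allows a repeated label to survive the collapse.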

set_beam_decoder(beam_size: int = 3)[source]

Sets the beam search decoder.
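To give a sense of what beam_size controls, the following is a minimal sketch of a textbook CTC prefix beam search over per-frame log-probabilities. This page does not describe how openspeech implements its beam decoder, so the function names, the blank id of 0, and the absence of a language model are all assumptions of this example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs: List[List[float]], beam_size: int = 3,
                           blank_id: int = 0) -> Tuple[List[int], float]:
    """Return the most probable collapsed label sequence and its log-probability."""
    # prefix -> (log-prob of paths ending in blank, log-prob of paths ending in non-blank)
    beams: Dict[tuple, Tuple[float, float]] = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for token, lp in enumerate(frame):
                if token == blank_id:
                    # A blank keeps the prefix unchanged.
                    nb, nnb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(nb, p_b + lp, p_nb + lp), nnb)
                    continue
                new_prefix = prefix + (token,)
                nb, nnb = next_beams[new_prefix]
                if prefix and prefix[-1] == token:
                    # A repeated label extends the prefix only after a blank ...
                    next_beams[new_prefix] = (nb, logsumexp(nnb, p_b + lp))
                    # ... otherwise it merges into the same prefix.
                    sb, snb = next_beams[prefix]
                    next_beams[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    next_beams[new_prefix] = (nb, logsumexp(nnb, p_b + lp, p_nb + lp))
        # Keep only the beam_size most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, best_scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*best_scores)


# Two frames over the labels {0: blank, 1: "a"}; blank is the likelier label in each frame.
frames = [[math.log(0.6), math.log(0.4)], [math.log(0.6), math.log(0.4)]]
prefix, score = ctc_prefix_beam_search(frames, beam_size=3)
print(prefix)  # [1]
```

In this toy input a greedy per-frame choice would pick blank twice and decode to an empty sequence, while the beam search sums over all alignments of "a" and prefers it.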

test_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for testing.

Inputs:

batch (tuple): A test batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for the test step

Return type

loss (torch.Tensor)

training_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for training.

Inputs:

batch (tuple): A train batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for training

Return type

loss (torch.Tensor)

validation_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for validation.

Inputs:

batch (tuple): A validation batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for the validation step

Return type

loss (torch.Tensor)

Openspeech Transducer Model

class openspeech.models.openspeech_transducer_model.OpenspeechTransducerModel(configs: omegaconf.dictconfig.DictConfig, vocab: openspeech.vocabs.vocab.Vocabulary)[source]

Base class for OpenSpeech’s transducer models.

Parameters
  • configs (DictConfig) – configuration set.

  • vocab (Vocabulary) – the class of vocabulary

Inputs:
  • inputs (torch.FloatTensor): An input sequence passed to encoders. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor): The length of each input sequence, of size (batch).

Returns

Result of model predictions.

Return type

  • y_hats (torch.FloatTensor)

decode(encoder_output: torch.Tensor, max_length: int) → torch.Tensor[source]

Decode encoder_outputs.

Parameters
  • encoder_output (torch.FloatTensor) – An output sequence of encoders. FloatTensor of size (seq_length, dimension)

  • max_length (int) – max decoding time step

Returns

Log probability of model predictions.

Return type

  • logits (torch.FloatTensor)
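The role of max_length can be illustrated with a simplified greedy decoding loop. This is a sketch under stated assumptions, not the openspeech implementation: it emits exactly one token per encoder frame, does not treat a blank symbol specially, and uses hypothetical stand-ins (toy_decoder_step, toy_joint, sos_id) in place of a real prediction network and joint network. It returns both the chosen token ids and the per-step scores they were picked from.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_decode(encoder_output: Sequence[Sequence[float]],
                  max_length: int,
                  decoder_step: Callable,
                  joint: Callable,
                  sos_id: int = 1) -> Tuple[List[int], List[List[float]]]:
    """Greedy loop: one decoder step and one joint call per encoder frame."""
    pred_tokens: List[int] = []
    step_scores: List[List[float]] = []
    token, state = sos_id, None
    # Never read past the available encoder frames.
    num_steps = min(max_length, len(encoder_output))
    for t in range(num_steps):
        decoder_output, state = decoder_step(token, state)
        scores = joint(encoder_output[t], decoder_output)
        token = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        pred_tokens.append(token)
        step_scores.append(scores)
    return pred_tokens, step_scores


def toy_decoder_step(token: int, state):
    """Hypothetical prediction network: a zero vector plus a step counter as state."""
    return [0.0, 0.0, 0.0], (state or 0) + 1


def toy_joint(enc_frame: Sequence[float], dec_out: Sequence[float]) -> List[float]:
    """Hypothetical joint network: element-wise sum of the two vectors."""
    return [e + d for e, d in zip(enc_frame, dec_out)]


frames = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.0, 0.2, 0.7]]  # (seq_length=3, dimension=3)
tokens, scores = greedy_decode(frames, max_length=3, decoder_step=toy_decoder_step, joint=toy_joint)
print(tokens)  # [1, 0, 2]
```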

forward(inputs: torch.Tensor, input_lengths: torch.Tensor) → Dict[str, torch.Tensor][source]

Forward propagate inputs for inference.

Parameters
  • inputs (torch.FloatTensor) – An input sequence passed to encoders. Typically this will be a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor) – The length of each input sequence, of size (batch).

Returns

Result of model predictions that contains predictions, logits, encoder_outputs, encoder_output_lengths

Return type

  • dict (dict)

joint(encoder_outputs: torch.Tensor, decoder_outputs: torch.Tensor) → torch.Tensor[source]

Join encoder_outputs and decoder_outputs.

Parameters
  • encoder_outputs (torch.FloatTensor) – An output sequence of encoders. FloatTensor of size (batch, seq_length, dimension)

  • decoder_outputs (torch.FloatTensor) – An output sequence of decoders. FloatTensor of size (batch, seq_length, dimension)

Returns

Outputs of joining encoder_outputs and decoder_outputs.

Return type

  • outputs (torch.FloatTensor)
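One common way to join the two sequences in a transducer is to pair every encoder frame with every decoder step and concatenate their feature vectors, giving a 4-D layout. The sketch below shows only that pairing on plain nested lists. Whether openspeech joins by concatenation is an assumption of this example, and the projection to vocabulary size and the log-softmax that would normally follow are omitted.

```python
from typing import List

Vector = List[float]


def joint_concat(encoder_outputs: List[List[Vector]],
                 decoder_outputs: List[List[Vector]]) -> List[List[List[Vector]]]:
    """Concatenate every encoder frame with every decoder step (hypothetical helper).

    Shapes: (batch, T, D_enc) and (batch, U, D_dec) -> (batch, T, U, D_enc + D_dec).
    """
    if len(encoder_outputs) != len(decoder_outputs):
        raise ValueError("batch sizes differ")
    return [
        [[enc_frame + dec_step for dec_step in dec_seq] for enc_frame in enc_seq]
        for enc_seq, dec_seq in zip(encoder_outputs, decoder_outputs)
    ]


enc = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]  # (batch=1, T=3, D_enc=2)
dec = [[[10.0], [20.0]]]                      # (batch=1, U=2, D_dec=1)
joined = joint_concat(enc, dec)
print(len(joined), len(joined[0]), len(joined[0][0]), len(joined[0][0][0]))  # 1 3 2 3
print(joined[0][1][0])  # [3.0, 4.0, 10.0]
```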

test_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for testing.

Inputs:

batch (tuple): A test batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for the test step

Return type

loss (torch.Tensor)

training_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for training.

Inputs:

batch (tuple): A train batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for training

Return type

loss (torch.Tensor)

validation_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]

Forward propagate a pair of inputs and targets for validation.

Inputs:

batch (tuple): A validation batch containing inputs, targets, input_lengths, target_lengths

batch_idx (int): The index of the batch

Returns

loss for the validation step

Return type

loss (torch.Tensor)