Search

Base

class openspeech.search.base.OpenspeechBeamSearchBase(decoder, beam_size: int, batch_size: int)

Base class for Openspeech's beam search decoders.
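
All of the searchers below share the expansion step that defines beam search: at each decoding step, every hypothesis on the beam is extended by every candidate token, and only the beam_size highest-scoring extensions survive. The following is a minimal, illustrative sketch of that step in plain PyTorch; it is not Openspeech's implementation, and the function name beam_step is hypothetical.

    import torch

    def beam_step(log_probs: torch.Tensor, beam_scores: torch.Tensor, beam_size: int):
        # log_probs:   (beam_size, vocab_size) next-token log-probabilities
        #              for each hypothesis currently on the beam.
        # beam_scores: (beam_size,) cumulative log-probability per hypothesis.
        vocab_size = log_probs.size(1)
        # Score every (hypothesis, token) pair ...
        scores = beam_scores.unsqueeze(1) + log_probs
        # ... and keep only the beam_size best pairs overall.
        top_scores, flat_idx = scores.view(-1).topk(beam_size)
        parent_idx = torch.div(flat_idx, vocab_size, rounding_mode="floor")
        token_idx = flat_idx % vocab_size
        return parent_idx, token_idx, top_scores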

Beam Search CTC

class openspeech.search.beam_search_ctc.BeamSearchCTC(labels: list, lm_path: str = None, alpha: int = 0, beta: int = 0, cutoff_top_n: int = 40, cutoff_prob: float = 1.0, beam_size: int = 3, num_processes: int = 4, blank_id: int = 0)

Decodes probability output using the ctcdecode package.

Parameters
  • labels (list) – the set of tokens used to train your model.

  • lm_path (str) – path to your external KenLM language model (LM).

  • alpha (int) – weight given to the language model probabilities.

  • beta (int) – weight given to the number of words in the beam.

  • cutoff_top_n (int) – pruning cutoff: only the top cutoff_top_n characters with the highest probability in the vocabulary are considered during beam search.

  • cutoff_prob (float) – pruning cutoff probability; 1.0 means no pruning.

  • beam_size (int) – width of the beam search.

  • num_processes (int) – number of workers used to parallelize over the batch.

  • blank_id (int) – index of the CTC blank token.

Inputs:
  predicted_probs: Tensor of character probabilities, where predicted_probs[c, t] is the probability of character c at time t.
  sizes: Size of each sequence in the mini-batch.

Returns

outputs – sequences of the model's best predictions

forward(logits, sizes=None)

Decodes probability output using the ctcdecode package.

Inputs:
  logits: Tensor of character probabilities, where logits[c, t] is the probability of character c at time t.
  sizes: Size of each sequence in the mini-batch.

Returns

outputs – sequences of the model's best predictions
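
A minimal usage sketch, assuming the ctcdecode package is installed and that BeamSearchCTC can be called like a torch.nn.Module; the vocabulary, tensor layout, and values below are illustrative only.

    import torch
    from openspeech.search.beam_search_ctc import BeamSearchCTC

    # Illustrative vocabulary; index 0 is the CTC blank, matching blank_id=0.
    labels = ["_", " ", "a", "b", "c"]
    decoder = BeamSearchCTC(labels=labels, beam_size=3, blank_id=0)

    # Dummy per-frame probabilities; the (batch, time, vocab) layout is an
    # assumption based on ctcdecode's conventions, not confirmed by these docs.
    logits = torch.randn(2, 50, len(labels)).softmax(dim=-1)
    sizes = torch.LongTensor([50, 42])  # valid frames per utterance

    outputs = decoder(logits, sizes)  # best token-index sequence per utterance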

Beam Search LSTM

class openspeech.search.beam_search_lstm.BeamSearchLSTM(decoder: openspeech.decoders.lstm_decoder.LSTMDecoder, beam_size: int, batch_size: int)

LSTM Beam Search Decoder

Args: decoder, beam_size, batch_size
  decoder (LSTMDecoder): base decoder of the LSTM model.
  beam_size (int): size of the beam.
  batch_size (int): size of the batch.

Inputs: encoder_outputs, targets, encoder_output_lengths, teacher_forcing_ratio
  encoder_outputs (torch.FloatTensor): output sequence of the encoders; FloatTensor of size (batch, seq_length, dimension).
  targets (torch.LongTensor): target sequence passed to the decoders; LongTensor of size (batch, seq_length).
  encoder_output_lengths (torch.LongTensor): lengths of the encoder outputs; LongTensor of size (batch).
  teacher_forcing_ratio (float): ratio of teacher forcing.

Returns

logits (torch.FloatTensor) – log probabilities of the model's predictions

forward(encoder_outputs: torch.Tensor, encoder_output_lengths: torch.Tensor) → torch.Tensor

Beam search decoding.

Inputs: encoder_outputs, encoder_output_lengths
  encoder_outputs (torch.FloatTensor): output sequence of the encoders; FloatTensor of size (batch, seq_length, dimension).
  encoder_output_lengths (torch.LongTensor): lengths of the encoder outputs; LongTensor of size (batch).

Returns

logits (torch.FloatTensor) – log probabilities of the model's predictions
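
A usage sketch under stated assumptions: the LSTMDecoder construction is elided because its arguments are model-specific, and the tensor shapes follow the (batch, seq_length, dimension) layout documented above, with dimension 512 as an arbitrary example value.

    import torch
    from openspeech.decoders.lstm_decoder import LSTMDecoder
    from openspeech.search.beam_search_lstm import BeamSearchLSTM

    # A trained decoder is required; construction is elided here because its
    # arguments depend on the model configuration.
    lstm_decoder: LSTMDecoder = ...

    search = BeamSearchLSTM(decoder=lstm_decoder, beam_size=3, batch_size=2)

    # Encoder outputs of size (batch, seq_length, dimension).
    encoder_outputs = torch.randn(2, 100, 512)
    encoder_output_lengths = torch.LongTensor([100, 87])

    predictions = search(encoder_outputs, encoder_output_lengths)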

Beam Search Transformer

class openspeech.search.beam_search_transformer.BeamSearchTransformer(decoder: openspeech.decoders.transformer_decoder.TransformerDecoder, batch_size: int, beam_size: int = 3)

Transformer Beam Search Decoder

Args: decoder, batch_size, beam_size
  decoder (TransformerDecoder): base decoder of the Transformer model.
  batch_size (int): size of the batch.
  beam_size (int): size of the beam (default: 3).
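
Construction mirrors the LSTM searcher; this sketch shows instantiation only, since the forward signature is not documented on this page.

    from openspeech.decoders.transformer_decoder import TransformerDecoder
    from openspeech.search.beam_search_transformer import BeamSearchTransformer

    # Trained decoder; construction elided as model-specific.
    transformer_decoder: TransformerDecoder = ...

    search = BeamSearchTransformer(decoder=transformer_decoder, batch_size=2, beam_size=3)
    # Presumably invoked like the LSTM searcher on encoder outputs; consult
    # the source for the exact forward signature before calling it.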
