Data¶
Dataset¶
class openspeech.data.dataset.SpeechToTextDataset(configs: omegaconf.dictconfig.DictConfig, dataset_path: str, audio_paths: list, transcripts: list, sos_id: int = 1, eos_id: int = 2, del_silence: bool = False, apply_spec_augment: bool = False, apply_noise_augment: bool = False, apply_time_stretch_augment: bool = False, apply_joining_augment: bool = False)[source]¶
Dataset for audio & transcript matching.
Note
Do not use this class directly, use one of the sub classes.
- Parameters
dataset_path (str) – path of the LibriSpeech dataset
audio_paths (list) – list of audio paths
transcripts (list) – list of transcripts
sos_id (int) – identification of <|startofsentence|>
eos_id (int) – identification of <|endofsentence|>
del_silence (bool) – flag indicating whether to delete silence or not
apply_spec_augment (bool) – flag indicating whether to apply SpecAugment or not
apply_noise_augment (bool) – flag indicating whether to apply noise augmentation or not
apply_time_stretch_augment (bool) – flag indicating whether to apply time-stretch augmentation or not
apply_joining_augment (bool) – flag indicating whether to apply audio-joining augmentation or not
Data Loader¶
class openspeech.data.data_loader.AudioDataLoader(dataset: torch.utils.data.dataset.Dataset, num_workers: int, batch_sampler: torch.utils.data.sampler.Sampler, **kwargs)[source]¶
Audio Data Loader.
- Parameters
dataset (torch.utils.data.Dataset) – dataset from which to load the data.
num_workers (int) – how many subprocesses to use for data loading.
batch_sampler (torch.utils.data.sampler.Sampler) – defines the strategy to draw samples from the dataset.
class openspeech.data.data_loader.BucketingSampler(data_source, batch_size: int = 32, drop_last: bool = False)[source]¶
Samples batches assuming they are in order of size, to batch similarly sized samples together.
- Parameters
data_source (torch.utils.data.Dataset) – dataset to sample from
batch_size (int) – size of batch
drop_last (bool) – flag indicating whether to drop the last batch or not
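The idea behind bucketing — sort utterance indices by length so each batch holds similarly sized samples and padding is minimized — can be sketched in a few lines. The function and variable names here are illustrative, not the library's internals; in openspeech the resulting batch index lists would be what a `BucketingSampler` yields to `AudioDataLoader` via its `batch_sampler` argument.

```python
# Sketch of the bucketing strategy: sort by length, then chunk into
# fixed-size batches so each batch needs little padding.
def bucket_batches(lengths, batch_size, drop_last=False):
    # Indices sorted by utterance length (shortest first).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    # Chunk the sorted indices into batches of batch_size.
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Optionally drop a trailing partial batch, like drop_last=True.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches
```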
Spectrogram Feature Transform¶
class openspeech.data.audio.spectrogram.spectrogram.SpectrogramFeatureTransform(configs: omegaconf.dictconfig.DictConfig)[source]¶
Create a spectrogram from an audio signal.
- Configurations:
name (str): name of the feature transform (default: spectrogram)
sample_rate (int): sampling rate of the audio (default: 16000)
frame_length (float): frame length for the spectrogram, in milliseconds (default: 20.0)
frame_shift (float): length of hop between STFT windows, in milliseconds (default: 10.0)
del_silence (bool): flag indicating whether to delete silence or not (default: False)
num_mels (int): the number of frequency bins to retain (default: 161)
- Parameters
configs (DictConfig) – configuration set
- Returns
A spectrogram feature of shape (seq_length, num_mels).
- Return type
Tensor
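The default of 161 falls out of the other defaults, assuming frame_length and frame_shift are in milliseconds: a 20.0 ms window at 16 kHz is 320 samples, and a real-valued STFT of an n-point window produces n // 2 + 1 frequency bins.

```python
# How the default frame parameters translate to STFT sizes
# (assuming frame_length / frame_shift are in milliseconds).
sample_rate = 16000
frame_length_ms = 20.0
frame_shift_ms = 10.0

n_fft = int(sample_rate * frame_length_ms / 1000)       # samples per window
hop_length = int(sample_rate * frame_shift_ms / 1000)   # samples per hop
num_freq_bins = n_fft // 2 + 1                          # bins per frame
```

With the defaults this yields a 320-sample window, a 160-sample hop, and 161 frequency bins per frame, matching the documented output shape.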
Spectrogram Feature Transform Configuration¶
class openspeech.data.audio.spectrogram.configuration.SpectrogramConfigs(name: str = 'spectrogram', sample_rate: int = 16000, frame_length: float = 20.0, frame_shift: float = 10.0, del_silence: bool = False, num_mels: int = 161)[source]¶
This is the configuration class to store the configuration of a SpectrogramFeatureTransform. It is used to initialize a SpectrogramFeatureTransform feature transform. Configuration objects inherit from :class:`~openspeech.dataclass.OpenspeechDataclass`.
- Configurations:
name (str): name of the feature transform (default: spectrogram)
sample_rate (int): sampling rate of the audio (default: 16000)
frame_length (float): frame length for the spectrogram, in milliseconds (default: 20.0)
frame_shift (float): length of hop between STFT windows, in milliseconds (default: 10.0)
del_silence (bool): flag indicating whether to delete silence or not (default: False)
num_mels (int): the number of frequency bins to retain (default: 161)
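A plain-dataclass sketch of the documented fields and defaults gives a sense of how such a configuration object behaves. The real class inherits from openspeech's `OpenspeechDataclass`; this stand-in only mirrors the documented names and default values.

```python
from dataclasses import dataclass

# Stand-in mirroring the fields SpectrogramConfigs documents.
# Not the library's class; for illustration only.
@dataclass
class SpectrogramConfigsSketch:
    name: str = "spectrogram"
    sample_rate: int = 16000
    frame_length: float = 20.0
    frame_shift: float = 10.0
    del_silence: bool = False
    num_mels: int = 161
```

Any field can be overridden at construction time, e.g. `SpectrogramConfigsSketch(sample_rate=8000)`, while the rest keep their defaults.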
Mel-Spectrogram Feature Transform¶
class openspeech.data.audio.melspectrogram.melspectrogram.MelSpectrogramFeatureTransform(configs: omegaconf.dictconfig.DictConfig)[source]¶
Create a mel-spectrogram from a raw audio signal. This is a composition of Spectrogram and MelScale.
- Configurations:
name (str): name of the feature transform (default: melspectrogram)
sample_rate (int): sampling rate of the audio (default: 16000)
frame_length (float): frame length for the spectrogram, in milliseconds (default: 20.0)
frame_shift (float): length of hop between STFT windows, in milliseconds (default: 10.0)
del_silence (bool): flag indicating whether to delete silence or not (default: False)
num_mels (int): the number of mel filterbank channels to retain (default: 80)
- Parameters
configs (DictConfig) – configuration set
- Returns
A mel-spectrogram feature of shape (seq_length, num_mels).
- Return type
Tensor
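The MelScale step in the composition maps linear frequency in Hz onto the perceptual mel scale before pooling STFT bins into `num_mels` filterbank channels. The standard HTK-style conversion, shown here for illustration, is what makes the 80 output channels "mel" rather than linear bins.

```python
import math

# HTK-style mel-scale conversion used by typical MelScale
# implementations; shown for illustration.
def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

The mel scale is roughly linear below 1 kHz and logarithmic above it, so the 80 channels allocate finer resolution to the low frequencies where speech carries most of its information.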
Mel-Spectrogram Feature Transform Configuration¶
class openspeech.data.audio.melspectrogram.configuration.MelSpectrogramConfigs(name: str = 'melspectrogram', sample_rate: int = 16000, frame_length: float = 20.0, frame_shift: float = 10.0, del_silence: bool = False, num_mels: int = 80)[source]¶
This is the configuration class to store the configuration of a MelSpectrogramFeatureTransform. It is used to initialize a MelSpectrogramFeatureTransform feature transform. Configuration objects inherit from :class:`~openspeech.dataclass.OpenspeechDataclass`.
- Configurations:
name (str): name of the feature transform (default: melspectrogram)
sample_rate (int): sampling rate of the audio (default: 16000)
frame_length (float): frame length for the spectrogram, in milliseconds (default: 20.0)
frame_shift (float): length of hop between STFT windows, in milliseconds (default: 10.0)
del_silence (bool): flag indicating whether to delete silence or not (default: False)
num_mels (int): the number of mel filterbank channels to retain (default: 80)