Automatic Speech Recognition (ASR)
spokestack.asr.spokestack.cloud_client module
This module contains the websocket logic used to communicate with Spokestack’s cloud-based ASR service.
exception spokestack.asr.spokestack.cloud_client.APIError(response)
Passes through an error reported by the Spokestack API.
Parameters
- response (dict) – message from the API service
class spokestack.asr.spokestack.cloud_client.CloudClient(key_id, key_secret, socket_url='wss://api.spokestack.io', audio_format='PCM16LE', sample_rate=16000, language='en', limit=10, idle_timeout=None)
Spokestack client for cloud-based speech-to-text.
Parameters
- key_id (str) – identity from Spokestack API credentials
- key_secret (str) – secret key from Spokestack API credentials
- socket_url (str) – URL for the websocket connection
- audio_format (str) – format of the input audio
- sample_rate (int) – audio sample rate (Hz)
- language (str) – language for recognition
- limit (int) – limit of messages per API response
- idle_timeout (Any) – time before the client times out; defaults to None
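With the default audio_format='PCM16LE', audio reaches the service as 16-bit signed little-endian PCM. The following minimal sketch shows that encoding using the standard library's struct module. It assumes the source audio consists of floats in the range [-1.0, 1.0]. The helpers to_pcm16le and from_pcm16le are hypothetical and are not part of spokestack.

```python
import struct
from typing import List, Sequence


def to_pcm16le(samples: Sequence[float]) -> bytes:
    """Hypothetical helper: encode floats in [-1.0, 1.0] as PCM16LE bytes."""
    ints = []
    for s in samples:
        # Clip to the valid range, then scale into the signed 16-bit range.
        s = max(-1.0, min(1.0, s))
        ints.append(int(round(s * 32767)))
    # '<' selects little-endian byte order; 'h' is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def from_pcm16le(data: bytes) -> List[int]:
    """Hypothetical helper: decode PCM16LE bytes back into integer samples."""
    count = len(data) // 2  # two bytes per sample
    return list(struct.unpack(f"<{count}h", data))
```

Each sample occupies two bytes. A 20 ms frame at 16000 Hz (320 samples) therefore encodes to 640 bytes.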
property idle_count
Current counter of idle time.
Return type: int
property idle_timeout
Maximum idle time.
Return type: Any
property is_connected
Status of the socket connection.
Return type: bool
property is_final
Status of the most recent server response (whether it is final).
Return type: bool
property response
Current response message.
Return type: dict
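The idle_count and idle_timeout properties suggest that an idle counter is compared against a timeout, where idle_timeout=None disables the check. The following sketch shows one way such bookkeeping can work. IdleTracker, update, and timed_out are hypothetical names. The reset-on-response rule and the ">=" comparison are also assumptions; this is not spokestack's actual logic.

```python
from typing import Any, Optional


class IdleTracker:
    """Hypothetical stand-in mirroring the idle_count / idle_timeout names.

    Not spokestack's implementation: it only illustrates an idle counter
    that drives a timeout.
    """

    def __init__(self, idle_timeout: Optional[int] = None) -> None:
        self._idle_timeout = idle_timeout
        self._idle_count = 0

    @property
    def idle_count(self) -> int:
        return self._idle_count

    @property
    def idle_timeout(self) -> Any:
        return self._idle_timeout

    def update(self, received_response: bool) -> None:
        # Assumed rule: activity resets the counter; otherwise count one idle step.
        if received_response:
            self._idle_count = 0
        else:
            self._idle_count += 1

    @property
    def timed_out(self) -> bool:
        # A timeout of None disables the check entirely.
        return self._idle_timeout is not None and self._idle_count >= self._idle_timeout
```

For example, with idle_timeout=3, the update sequence False, False, True, False, False, False ends with idle_count == 3. At that point timed_out is True.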
spokestack.asr.spokestack.speech_recognizer module
This module contains the recognizer for cloud-based ASR in the speech pipeline.
class spokestack.asr.spokestack.speech_recognizer.CloudSpeechRecognizer(spokestack_id='', spokestack_secret='', language='en', sample_rate=16000, frame_width=20, idle_timeout=5000, **kwargs)
Speech recognizer for use in the speech pipeline.
Parameters
- spokestack_id (str) – identity from Spokestack API credentials
- spokestack_secret (str) – secret key from Spokestack API credentials
- language (str) – language to recognize
- sample_rate (int) – audio sample rate (Hz)
- frame_width (int) – frame width of the audio (ms)
- idle_timeout (int) – number of iterations before the connection times out
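The sample_rate (Hz) and frame_width (ms) parameters together determine how much audio each frame carries. The following worked example makes that arithmetic explicit. The function names are hypothetical, and the byte count assumes mono 16-bit samples such as PCM16LE.

```python
def samples_per_frame(sample_rate: int = 16000, frame_width: int = 20) -> int:
    """Hypothetical helper: samples in one frame of frame_width milliseconds."""
    # sample_rate is samples per second; frame_width is in milliseconds.
    return sample_rate * frame_width // 1000


def bytes_per_frame(sample_rate: int = 16000, frame_width: int = 20,
                    sample_bytes: int = 2) -> int:
    """Hypothetical helper: frame size in bytes, assuming mono 2-byte samples."""
    return samples_per_frame(sample_rate, frame_width) * sample_bytes
```

With the defaults, each 20 ms frame holds 16000 × 20 / 1000 = 320 samples, or 640 bytes of 16-bit audio.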
spokestack.asr.google.speech_recognizer module
This module contains the Google ASR speech recognizer.
class spokestack.asr.google.speech_recognizer.GoogleSpeechRecognizer(language, credentials=None, sample_rate=16000, **kwargs)
Transforms speech into text using Google’s ASR.
Parameters
- language (str) – the language of the given audio as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag, for example “en-US”
- credentials (Union[None, str, dict]) – dictionary of Google API credentials or path to a credentials file. If set to None, credentials are pulled from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- sample_rate (int) – sample rate of the input audio (Hz)
- **kwargs (optional) – additional keyword arguments
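The credentials parameter accepts three forms. The following sketch shows that branching using only the standard library. resolve_credentials is a hypothetical helper. The real recognizer presumably hands these values to Google's client libraries rather than parsing JSON itself, so treat this strictly as an illustration of the documented options.

```python
import json
import os
from typing import Optional, Union


def resolve_credentials(credentials: Union[None, str, dict] = None) -> Optional[dict]:
    """Hypothetical helper mirroring the documented credentials options.

    - dict: already-parsed credentials, returned unchanged
    - str: path to a JSON credentials file, loaded from disk
    - None: fall back to the GOOGLE_APPLICATION_CREDENTIALS environment
      variable; returns None if that variable is unset
    """
    if isinstance(credentials, dict):
        return credentials
    if credentials is None:
        credentials = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
        if credentials is None:
            return None
    with open(credentials, "r", encoding="utf-8") as f:
        return json.load(f)
```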
spokestack.asr.keyword.tflite module
This module contains the Spokestack KeywordRecognizer, which identifies multiple keywords in an audio stream.
class spokestack.asr.keyword.tflite.KeywordRecognizer(classes, pre_emphasis=0.97, sample_rate=16000, fft_window_type='hann', fft_hop_length=10, model_dir='', posterior_threshold=0.5, **kwargs)
Recognizes keywords in an audio stream.
Parameters
- classes (List[str]) – keyword labels
- pre_emphasis (float) – coefficient of the pre-emphasis filter
- sample_rate (int) – number of audio samples per second (Hz)
- fft_window_type (str) – type of FFT window (only "hann" is supported)
- fft_hop_length (int) – hop length of the sliding window used for the STFT calculation (ms)
- model_dir (str) – path to the directory containing the .tflite models
- posterior_threshold (float) – probability threshold for detection
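Several KeywordRecognizer parameters name standard signal-processing steps. The following pure-Python sketch illustrates them:
- pre-emphasis, using the filter y[n] = x[n] − coef · x[n−1]
- the hop length converted from milliseconds to samples
- a periodic Hann window
- a simple detection rule

The rule picks the most probable class and accepts it only if its posterior exceeds posterior_threshold; that rule is an assumption about how the threshold is applied. All function names are hypothetical. The actual recognizer runs its processing through the TFLite models in model_dir.

```python
import math
from typing import List, Optional, Sequence


def pre_emphasis(signal: Sequence[float], coef: float = 0.97) -> List[float]:
    """First-order pre-emphasis filter: y[n] = x[n] - coef * x[n - 1]."""
    out = []
    prev = 0.0  # x[-1] is treated as zero
    for x in signal:
        out.append(x - coef * prev)
        prev = x
    return out


def hop_samples(sample_rate: int = 16000, hop_ms: int = 10) -> int:
    """Convert an STFT hop length from milliseconds to samples."""
    return sample_rate * hop_ms // 1000


def hann_window(n: int) -> List[float]:
    """Periodic Hann window of length n, a common choice for STFT frames."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def detect(posteriors: Sequence[float], classes: Sequence[str],
           threshold: float = 0.5) -> Optional[str]:
    """Assumed rule: return the top class if its posterior exceeds threshold."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return classes[best] if posteriors[best] > threshold else None
```

With the defaults, the 10 ms hop corresponds to 160 samples at 16000 Hz. The 0.97 pre-emphasis coefficient boosts high frequencies by subtracting most of the previous sample from each new one.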