Speech Sample#

Sample class to handle content for Google Cloud Speech API.

class google.cloud.speech.sample.Sample(content=None, source_uri=None, stream=None, encoding=None, sample_rate_hertz=None, client=None)[source]#

Bases: object

Representation of an audio sample to be used with Google Speech API.

Parameters:
  • content (bytes) – (Optional) Bytes containing audio data.
  • source_uri (str) – (Optional) URI that points to a file that contains audio data bytes as specified in RecognitionConfig. Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: gs://bucket_name/object_name.
  • stream (file) – (Optional) File like object to stream.
  • encoding (str) – encoding of audio data sent in all RecognitionAudio messages, can be one of: LINEAR16, FLAC, MULAW, AMR, AMR_WB
  • sample_rate_hertz (int) – Sample rate in Hertz of the audio data sent in all requests. Valid values are: 8000-48000. For best results, set the sampling rate of the audio source to 16000 Hz. If that’s not possible, use the native sample rate of the audio source (instead of re-sampling).
  • client (Client) – (Optional) The client that owns this instance of sample.
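
A Sample is usually obtained from the client's sample() factory rather than constructed directly, and it accepts exactly one audio source among content, source_uri, and stream. A minimal sketch of that mutual-exclusion rule (the helper below is illustrative, not the library's actual code):

```python
def pick_audio_source(content=None, source_uri=None, stream=None):
    """Return the single audio source supplied, mirroring the rule that a
    Sample accepts exactly one of content, source_uri, or stream."""
    sources = [s for s in (content, source_uri, stream) if s is not None]
    if len(sources) != 1:
        raise ValueError('Supply exactly one of content, source_uri, stream.')
    return sources[0]

# Exactly one source is accepted:
uri = pick_audio_source(source_uri='gs://bucket_name/object_name')
```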
chunk_size#

Chunk size to send over gRPC; roughly 100 ms of audio.

Return type:int
Returns:Optimized chunk size.
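
As a rough sanity check, 100 ms of 16-bit LINEAR16 audio at a given sample rate works out as below. This is a back-of-the-envelope sketch, not the library's actual computation of the optimized value:

```python
def approx_chunk_size(sample_rate_hertz, bytes_per_sample=2, chunk_ms=100):
    """Approximate size in bytes of ~100 ms of raw PCM audio."""
    return int(sample_rate_hertz * bytes_per_sample * chunk_ms / 1000)

print(approx_chunk_size(16000))  # 3200 bytes per ~100 ms chunk at 16 kHz
```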
content#

Bytes of audio content.

Return type:bytes
Returns:Byte stream of audio content.
default_encoding = 'FLAC'#
encoding#

Audio encoding type

Return type:str
Returns:String value of Encoding type.
long_running_recognize(language_code, max_alternatives=None, profanity_filter=None, speech_contexts=())[source]#

Asynchronous Recognize request to the Google Speech API.

See long_running_recognize.

Parameters:
  • language_code (str) – The language of the supplied audio as BCP-47 language tag. Example: 'en-US'.
  • max_alternatives (int) – (Optional) Maximum number of recognition hypotheses to be returned. The server may return fewer than maxAlternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of 1. Defaults to 1.
  • profanity_filter (bool) – If True, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. 'f***'. If False or omitted, profanities won’t be filtered out.
  • speech_contexts (list) – A list of strings (max 50) containing words and phrases “hints” so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases. This can also be used to add new words to the vocabulary of the recognizer.
Return type:

Operation

Returns:

Operation for asynchronous request to Google Speech API.
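
Callers typically poll the returned Operation until it completes. The loop below is sketched against a stand-in object; the poll()/complete/results names follow the google-cloud Operation interface but should be treated as assumptions here:

```python
import time

class _StubOperation:
    """Stand-in for the Operation returned by long_running_recognize."""
    def __init__(self):
        self._polls = 0
        self.complete = False
        self.results = None

    def poll(self):
        # Pretend the server finishes after the second poll.
        self._polls += 1
        if self._polls >= 2:
            self.complete = True
            self.results = [{'transcript': 'hello world', 'confidence': 0.9}]

def wait_for_results(operation, interval=0.0):
    """Poll until the operation completes, then return its results."""
    while not operation.complete:
        time.sleep(interval)
        operation.poll()
    return operation.results

results = wait_for_results(_StubOperation())
```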

recognize(language_code, max_alternatives=None, profanity_filter=None, speech_contexts=())[source]#

Synchronous Speech Recognition.

See recognize.

Parameters:
  • language_code (str) – The language of the supplied audio as BCP-47 language tag. Example: 'en-US'.
  • max_alternatives (int) – (Optional) Maximum number of recognition hypotheses to be returned. The server may return fewer than maxAlternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of 1. Defaults to 1.
  • profanity_filter (bool) – If True, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. 'f***'. If False or omitted, profanities won’t be filtered out.
  • speech_contexts (list) – A list of strings (max 50) containing words and phrases “hints” so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases. This can also be used to add new words to the vocabulary of the recognizer.
Return type:

list

Returns:

A list of dictionaries, one per alternative. Each dictionary typically contains the following keys, though not every key is present in every case:

  • transcript: The detected text from the audio recording.
  • confidence: The confidence estimate for the transcript, a float between 0 and 1.
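
Given that shape, picking the highest-confidence alternative can be done with a plain max(). The sample data below is made up, and since confidence can be absent it is defaulted to 0:

```python
alternatives = [
    {'transcript': 'how old is the Brooklyn Bridge', 'confidence': 0.98},
    {'transcript': 'how old is the Brooklyn Fridge'},  # confidence absent
]

# Missing confidence values sort last via the 0.0 default.
best = max(alternatives, key=lambda alt: alt.get('confidence', 0.0))
print(best['transcript'])  # how old is the Brooklyn Bridge
```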

sample_rate_hertz#

Sample rate integer.

Return type:int
Returns:Integer between 8000 and 48000.
source_uri#

Google Cloud Storage URI of audio source.

Return type:str
Returns:Google Cloud Storage URI string.
stream#

Stream the content when it is a file-like object.

Return type:file
Returns:File like object to stream.
streaming_recognize(language_code, max_alternatives=None, profanity_filter=None, speech_contexts=(), single_utterance=False, interim_results=False)[source]#

Streaming speech recognition.

Note

Streaming recognition requests are limited to 1 minute of audio. See: https://cloud.google.com/speech/limits#content

Yields: Instances of StreamingSpeechResult containing results and metadata from the streaming request.
Parameters:
  • language_code (str) – The language of the supplied audio as BCP-47 language tag. Example: 'en-US'.
  • max_alternatives (int) – (Optional) Maximum number of recognition hypotheses to be returned. The server may return fewer than maxAlternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of 1. Defaults to 1.
  • profanity_filter (bool) – If True, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. 'f***'. If False or omitted, profanities won’t be filtered out.
  • speech_contexts (list) – A list of strings (max 50) containing words and phrases “hints” so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases. This can also be used to add new words to the vocabulary of the recognizer.
  • single_utterance (bool) – (Optional) If false or omitted, the recognizer will perform continuous recognition (continuing to process audio even if the user pauses speaking) until the client closes the output stream (gRPC API) or when the maximum time limit has been reached. Multiple SpeechRecognitionResults with the is_final flag set to true may be returned. If true, the recognizer will detect a single spoken utterance. When it detects that the user has paused or stopped speaking, it will return an END_OF_UTTERANCE event and cease recognition. It will return no more than one SpeechRecognitionResult with the is_final flag set to true.
  • interim_results (bool) – (Optional) If true, interim results (tentative hypotheses) may be returned as they become available (these interim results are indicated with the is_final=False flag). If false or omitted, only is_final=true result(s) are returned.
Raises:

EnvironmentError if gRPC is not available.
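
A consuming loop usually filters the yielded results on their final flag. This is sketched with stand-in result objects; the transcript/is_final attribute names are assumptions based on the descriptions above, not the library's confirmed API:

```python
class _StubStreamingResult:
    """Stand-in for a StreamingSpeechResult yielded by streaming_recognize."""
    def __init__(self, transcript, is_final):
        self.transcript = transcript
        self.is_final = is_final

def _fake_streaming_recognize():
    # With interim_results=True, tentative hypotheses arrive first.
    yield _StubStreamingResult('hel', is_final=False)
    yield _StubStreamingResult('hello world', is_final=True)

# Keep only the final result(s), discarding interim hypotheses.
final_transcripts = [r.transcript
                     for r in _fake_streaming_recognize() if r.is_final]
```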